Department of Information Systems, Statistics & Management Science
Browsing Department of Information Systems, Statistics & Management Science by Author "Addy, Samuel N."
Item
GA-Boost: a genetic algorithm for robust boosting
(University of Alabama Libraries, 2012) Oh, Dong-Yop; Gray, J. Brian; University of Alabama Tuscaloosa
Many simple and complex methods have been developed to solve the classification problem. Boosting is one of the best-known techniques for improving the prediction accuracy of classification methods, but it is sometimes prone to overfitting, and the final model is difficult to interpret. Some boosting methods, including AdaBoost, are very sensitive to outliers. Many researchers have worked on these problems, but they remain open issues. We introduce a new boosting algorithm, "GA-Boost", which directly optimizes weak learners and their associated weights using a genetic algorithm, along with three extended versions of GA-Boost. The genetic algorithm uses a new penalized fitness function with three parameters (a, b, and p), which limits the number of weak classifiers (through b) and controls the effects of outliers (through a) while maximizing an appropriately chosen p-th percentile of margins. We evaluate GA-Boost performance with an experimental design and compare it to AdaBoost using several artificial and real-world data sets from the UC Irvine Machine Learning Repository. In experiments, GA-Boost was more resistant to outliers and produced simpler predictive models than AdaBoost. GA-Boost can be applied to data sets with three different weak classifier options. The three extended versions of GA-Boost performed very well on two simulation data sets and three real-world data sets.

Item
Multivariate time series clustering using kernel variant multi-way principal component analysis
(University of Alabama Libraries, 2010) Choi, Hwanseok; Hardin, J. Michael; University of Alabama Tuscaloosa
Clustering multivariate time series data has been a challenging task for researchers, since the data have multiple dimensions to consider, such as auto-correlations and cross-correlations, even though such data have been prevalent in diverse areas for decades. For short-period time series data, however, conventional time series modeling may not yield a valid model. Multi-way Principal Component Analysis (MPCA) can be used in this case, but its normality assumption can limit its ability to handle nonlinear data such as multivariate time series with high-order interactions. Kernel variant MPCA (KMPCA) is proposed as an alternative solution for this case. To test whether KMPCA can cluster trivariate time series data into two groups, two simulation studies were conducted. The first study used groups with the same mean structure and error structures that are combinations of three different auto-correlation levels and three different cross-correlation levels. For the second study, two groups with different mean structures and nine error structures were generated. To check that the proposed method works well on real-world data, a study of the obesity-depression relationship was carried out. The simulation studies showed that KMPCA clustered two groups with different mean structures with success rates over 90% when an appropriate kernel function with a proper parameter was applied. Similar error structures will obstruct clustering performance: strong cross-correlation, weak auto-correlation, and a larger number of temporal points. Considering racial effect, obesity and obesity-related variables, especially the use of addictive substances over 15 years, can predict depressed cohorts at year 20 at rates up to 76% for the Caucasian group and 95% for the African-American group.

Item
Workforce supply and facility location
(University of Alabama Libraries, 2009) Palmer, Nathan Curtis; Sox, Charles R.; Mittenthal, John; University of Alabama Tuscaloosa
It is important for every company to minimize its costs, including labor costs. We develop mathematical models that allow a company to minimize its labor costs by deciding where to hire workers from and how much to pay those workers within a similar region. These decisions are particularly important when the company has multiple facilities that compete among themselves for labor resources. In areas that are experiencing economic growth, or in developing countries, labor resources are limited and labor decisions are critical. With this motivation, this work investigates the labor and facility location decisions of a company that has decided to build many new facilities in close proximity to each other. One example is a large manufacturing firm that seeks to locate a new assembly plant and supplier facilities simultaneously. Concentrating all, or much, of the supply chain together will deplete already limited labor resources even further. As a result, the company pays higher wages and incurs higher labor costs. On the other hand, greater transportation costs are incurred as the distances between the plant and its suppliers increase. For each of the supply chain facilities, the location of the facility, the labor markets from which to hire workers, and the wages offered must be determined. Alongside these decisions, another potential factor in choosing the location for each facility is the cost of the site. This dissertation introduces this real-world problem, formulates it mathematically, and provides managerial insights for companies faced with these decisions.