Theses and Dissertations - Department of Information Systems, Statistics & Management Science



Recent Submissions

  • Item
    The Development of Statistical Monitoring Scheme and Simulation Model for the Autocorrelated Process
    (University of Alabama Libraries, 2020) Wang, Zhi; Perry, Marcus; University of Alabama Tuscaloosa
    Modern developments in data acquisition and storage technologies have allowed for rapid data collection. One representative example is data collected by high-sample-rate sensors that record hundreds or more samples per second. The proximity between observations can induce high autocorrelation into data sequences. Consequently, developing statistical tools for autocorrelated processes is of paramount value in modern data analysis. For this reason, the dissertation focuses on developing appropriate monitoring schemes and simulation models for autocorrelated processes. In addition, the complexity of modern processes precludes the use of some conventional statistical approaches that rely on rigid distributional assumptions. The wide practical relevance of modern processes motivates this work and points to great potential for future investigation. Statistical process control (SPC) has wide applications in quality engineering, manufacturing industries, social science, disease surveillance, and many other areas. In this dissertation, a distribution-free scheme for jointly and independently monitoring location and scale using individual observations is developed based on the Bernoulli cumulative sum (CUSUM) control chart and the Bahadur model. The approach takes autocorrelation into consideration and circumvents the model-misspecification problem. The need for the method is motivated, and simulation studies and real-world applications are used to evaluate the reliability and performance of the proposed scheme. Knowing when a process has deviated from the desired in-control status would simplify post-signal diagnostics for the control chart. In the dissertation, we developed maximum likelihood estimators (MLEs) of the change point time and introduced built-in change point estimators for CUSUM and binomial exponentially weighted moving average (EWMA) charts.
    Relative mean index plots are provided, and general conclusions are summarized to assist control chart users in selecting a combination of change point estimator and control chart design that guarantees robust change point estimation performance across a range of potential change magnitudes. Another aspect we studied is the simulation of autocorrelated processes. In this dissertation, we developed a simulation approach that permits users to simulate autocorrelated processes from both discrete and continuous distributions with a fully customizable order and structure of autocorrelation. Simulation studies and real-world applications are used to evaluate and illustrate the usefulness of the proposed simulation model.
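    The following minimal Python sketch makes the Bernoulli CUSUM idea concrete. It implements an upper Bernoulli CUSUM with the classical built-in change point estimator: the last time the statistic sat at zero before the chart signals. The sketch makes three assumptions:
    - the 0/1 indicators are independent;
    - the in-control probability p0 and the out-of-control probability p1 are known;
    - no Bahadur-model adjustment for autocorrelation is applied, unlike the scheme described above.
    The function name and parameters are illustrative and are not taken from the dissertation.

```python
import math


def bernoulli_cusum(indicators, p0, p1, h):
    """Upper Bernoulli CUSUM for an increase in P(indicator = 1) from p0 to p1.

    Returns (signal_time, change_point_estimate) with 1-based times. The
    change point estimate is the last time the statistic was reset to zero,
    i.e. the last observation judged in control. Returns (None, None) if the
    statistic never exceeds the decision limit h.
    """
    # Log-likelihood-ratio increments for observing a 1 or a 0
    w1 = math.log(p1 / p0)
    w0 = math.log((1 - p1) / (1 - p0))
    s = 0.0
    last_zero = 0  # time 0 is the start of monitoring
    for t, x in enumerate(indicators, start=1):
        s = max(0.0, s + (w1 if x else w0))
        if s == 0.0:
            last_zero = t
        if s > h:
            return t, last_zero
    return None, None
```

    In practice, the indicators could be formed from individual observations with a hypothetical threshold, e.g. `[int(x > threshold) for x in observations]`. The decision limit h would be tuned to a target in-control average run length.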
  • Item
    Essays on Mixed-Fleet Green Vehicle Routing
    (University of Alabama Libraries, 2020) Koyuncu, Isil; Yavuz, Mesut; University of Alabama Tuscaloosa
    This work addresses a family of green vehicle routing problems (GVRPs). Three key operational characteristics distinguish alternative-fuel vehicles (AFVs) from gasoline or diesel vehicles (GDVs): (i) limited driving range before refueling is needed, (ii) scarce refueling infrastructure, and (iii) lengthy refueling times. The operational challenges that fleet managers face in daily routing decisions, together with several key modeling aspects such as mixed fleets, refueling at customer and non-customer locations, and refueling policies, are incorporated into the GVRP models. The first study compares two competing GVRP formulations, namely node-duplicating and arc-duplicating. Both formulations are strengthened via (i) two label-setting algorithms that tighten the bounds and (ii) an improved lower bound on the number of routes. Through computational experiments based on two testbeds from the literature, the study concludes that the less common arc-duplicating formulation outperforms the more common node-duplicating formulation. The second study introduces an efficient solution framework that exploits the route optimization outcome of the GDVs. We investigate the benefits of utilizing GDV optimal routes by quantifying the differences between AFV and GDV optimal routes and their solution times. Based on the results, three route optimization frameworks are proposed and implemented in a column generation algorithm. Based on data analysis, a solution methodology that potentially shortens the expected solution time is proposed. Finally, the third study introduces a novel profit-maximizing fleet mix and sizing problem with customer selection in the GVRP. In addition to addressing the operational challenges presented in the previous chapters, this study considers environmentally conscious customers who prefer receiving service with AFVs to reduce their supply chain carbon footprint and may be willing to pay a green premium for it.
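    A simple feasibility check on a fixed route illustrates the limited driving range that separates AFVs from GDVs. This Python sketch is illustrative only and rests on these assumptions:
    - the vehicle leaves the depot with a full tank;
    - it refuels to full at the listed stops (a full-refueling policy);
    - refueling time and time windows are ignored.
    It is not one of the node- or arc-duplicating formulations, and the function name is hypothetical.

```python
def route_is_feasible(legs, refuel_stops, max_range):
    """Check whether an AFV can drive a fixed route without running out of fuel.

    legs[i] is the distance from stop i to stop i + 1 (stop 0 is the depot).
    refuel_stops holds the indices of stops where the vehicle refuels to full.
    """
    remaining = max_range  # the vehicle leaves the depot with a full tank
    for i, distance in enumerate(legs):
        if i in refuel_stops:
            remaining = max_range  # full-refueling policy (an assumption)
        remaining -= distance
        if remaining < 0:
            return False
    return True
```

    A GDV corresponds to an effectively unlimited range, e.g. `max_range=float("inf")`, under which every route passes this check.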
  • Item
    Model-Based Clustering of Sequential and Directional Data
    (University of Alabama Libraries, 2020) Zhang, Yingying; Melnykov, Volodymyr; University of Alabama Tuscaloosa
    The goal of cluster analysis is to separate objects into distinct groups so that observations in the same group are more similar to each other than to observations in other groups. A variety of clustering algorithms have been proposed for this task, among which model-based clustering stands out due to its flexibility and usefulness. Model-based clustering often employs finite mixture models to cluster heterogeneous observations. A finite mixture model is a weighted sum of several probability distributions; each distribution is considered one component, and the number of components is called the mixture order. The weight corresponding to each component is termed the mixing proportion; it denotes the prior probability that an observation originates from the associated cluster. Mixing proportions are subject to two constraints: each must lie between zero and one, and together they must sum to one. Clustering objects is challenging due to the increasing complexity of data structures. This thesis addresses the clustering of categorical sequences and directional observations. Clustering algorithms developed for categorical data remain very limited, although this type of data can be found in many areas, so efficient models are needed that capture the nature of categorical data. Directional data also arise in many areas, such as meteorology, astronomy, biology, and medical science. The commonly employed Gaussian mixture models cannot describe the directional nature of such data, while the von Mises-Fisher distribution plays a vital role in this area. Real-life directional data often contain noise, outliers, and heavy tails, but current models are very sensitive to their presence. A model that can deal with this problem is also explored in this thesis.
    For each model, the expectation-maximization (EM) algorithm is employed to estimate the parameters of the associated mixture model, and the performance of the proposed model is tested on various types of synthetic data and compared with existing models. The proposed models are then applied to the corresponding real-life data. The results indicate the superiority of the proposed models on both synthetic and real data sets. The thesis is organized as follows. The first chapter gives a brief introduction to the background of cluster analysis. The second chapter introduces a new model that incorporates the temporal character of categorical sequences. In the third chapter, semi-supervised clustering is developed to explore potential factors that can affect the classification of observations. Finally, in the fourth chapter, a new model is proposed to tackle noisy directional data.
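    The mixing-proportion constraints and the EM algorithm mentioned above can be sketched for the simplest case, a one-dimensional Gaussian mixture. The Gaussian components are an assumption for illustration only; the thesis works with categorical sequences and von Mises-Fisher components. The M-step sets each proportion to the average responsibility, which keeps every proportion in [0, 1] and makes the proportions sum to one.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(data, mus, sds, weights, n_iter=100):
    """EM for a one-dimensional Gaussian mixture; returns (means, sds, mixing proportions)."""
    k = len(mus)
    mus, sds, weights = list(mus), list(sds), list(weights)
    for _ in range(n_iter):
        # E-step: posterior probability that each observation came from each component
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, mus[j], sds[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: proportions are average responsibilities, so they lie in [0, 1] and sum to 1
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))  # floor guards against a collapsing component
    return mus, sds, weights
```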
  • Item
    Statistical Process Monitoring for Some Nonstandard Situations in Estimated Parameters Case
    (University of Alabama Libraries, 2020) Yao, Yuhui; Chakraborti, Subha; University of Alabama Tuscaloosa
    Statistical process control (SPC) and monitoring techniques are useful in a variety of practical applications. Recent advances in the literature have shown the need to distinguish between Phase I (retrospective) and Phase II (prospective) process monitoring and the importance of properly accounting for the effects of parameter estimation. This work considers retrospective and prospective process monitoring for the balanced random effects (variance components) model, with Phase I and Phase II Shewhart charts and a Phase II EWMA chart with estimated parameters. In Phase I, Shewhart-type charts are recommended because of their broader shift detection ability. The proposed methodology properly accounts for the effects of parameter estimation and uses the false alarm probability (FAP) metric to design the chart. The proposed Phase I chart is shown to be easily adaptable to more general models, with more variance components and nested factors, and can accommodate various estimators of variance. Thus, it enables a broader Phase I process monitoring strategy, under normality, which can be applied within the ANOVA framework used for many DOE models. In Phase II, several control charts are prevalent, including Shewhart-type charts and their generalization, the EWMA chart, which is well known for detecting smaller shifts. To avoid inflating the false alarm rate, the effect of parameter estimation is taken into account, and the proposed Phase II charts are evaluated by the average run length (ARL). Following the recent literature, two types of corrected limits are provided: one based on the unconditional perspective and the other on the conditional perspective and the exceedance probability criterion (EPC). The corrected (adjusted) charting constants are then calculated and tabulated; the tables can also be generated on demand from the accompanying R packages. Simulation studies of robustness and out-of-control performance are conducted.
    Illustrations are shown using real-world data. R packages are provided to support deployment of the new methodology in practice.
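    A basic Shewhart chart for individual observations shows how parameter estimation affects Phase II performance. The Python sketch below rests on these assumptions:
    - the data are normal;
    - the limits have the form mu_hat +/- L*sigma_hat, estimated from m Phase I observations;
    - sigma_hat is the sample standard deviation, with no bias correction.
    The sketch reports two quantities. The first is the unconditional ARL, the average of the conditional ARLs (CARLs). The second is the probability that the CARL falls below (1 - eps) times the known-parameter ARL0, which is the quantity behind the exceedance probability criterion. It does not reproduce the dissertation's corrected charting constants or its R packages.

```python
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_arl(L, mu_hat=0.0, sigma_hat=1.0, mu=0.0, sigma=1.0):
    """ARL of an individuals Shewhart chart with limits mu_hat +/- L*sigma_hat
    when the observations actually follow N(mu, sigma^2)."""
    upper = (mu_hat + L * sigma_hat - mu) / sigma
    lower = (mu_hat - L * sigma_hat - mu) / sigma
    p_signal = (1.0 - STD_NORMAL.cdf(upper)) + STD_NORMAL.cdf(lower)
    return math.inf if p_signal <= 0.0 else 1.0 / p_signal


def phase2_performance(L, m, reps=2000, eps=0.1, seed=1):
    """Monte Carlo summary of the CARL when mu and sigma are estimated from m Phase I observations."""
    rng = random.Random(seed)
    arl0 = conditional_arl(L)  # known-parameter benchmark
    carls = []
    for _ in range(reps):
        phase1 = [rng.gauss(0.0, 1.0) for _ in range(m)]
        carls.append(conditional_arl(L, statistics.fmean(phase1), statistics.stdev(phase1)))
    unconditional_arl = statistics.fmean(carls)  # E[CARL] over Phase I samples
    exceedance = sum(c < (1.0 - eps) * arl0 for c in carls) / reps
    return unconditional_arl, exceedance
```

    With known parameters and L = 3, `conditional_arl(3.0)` is about 370.4. For small m, the CARL varies from one Phase I sample to another, which motivates the corrected limits discussed above.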
  • Item
    Sharing is Caring: Essays on Online Self-Disclosure
    (University of Alabama Libraries, 2020) Nabity-Grover, Teagen M; Thatcher, Jason; Johnston, Allen; University of Alabama Tuscaloosa
    Online self-disclosure has been studied in a variety of disciplines for more than two decades. Self-disclosure is any information about the self communicated to another person; it is generally decomposed into five dimensions: amount, depth or intimacy, honesty and accuracy, polarity, and intent. In this dissertation, we offer a new contextualization of self-disclosure to online settings. While our review of the literature suggests that four dimensions are conceptually similar across contexts, the fifth, intent, is problematic. Intent refers to the willingness to share personal information. In the online context, intent items direct attention to whether one intends to post certain information or is unaware of posting it. In the offline context, unintentional or accidental disclosures occur mostly due to environmental (e.g., seeing a colleague in a locker room) or nonverbal (e.g., facial expressions) cues. However, online communication differs from offline communication in four ways: reduced nonverbal cues, asynchronicity, editability, and breadth of audience. The first three of these unique attributes imply that online intent is fundamentally different from offline intent. To account for these differences, self-disclosure needs to be contextualized to the online environment. We accomplish this contextualization through two essays. In essay one, we conduct a thorough review of the literature to evaluate the contextualization of the measures of online self-disclosure and identify areas for improving the construct’s measurement. Based on this analysis, we propose four context-specific dimensions to supplant intent in the decomposition of online self-disclosure: willingness to participate, reciprocity, audience control, and conscientious use.
    In essay two, we develop operational long- and short-form measures and subject them to rigorous validity testing; in doing so, we compare the new measure with two established instruments and examine its performance within a nomological model. We find support for two of the proposed dimensions and for a new structural definition of online self-disclosure involving two intermediate latent variables: message and behavior. This new structure could help improve the content validity of the short, simple instruments frequently seen in the literature.
  • Item
    Some Contributions to Tolerance Intervals and Statistical Process Control
    (University of Alabama Libraries, 2021) Alqurashi, Mosab; Chakraborti, Subhabrata; University of Alabama Tuscaloosa
    Tolerance intervals play an important role in statistical process control along with control charts. When constructing a tolerance interval or a control chart for the mean of a quality characteristic, the normality assumption can be justified at least in an approximate sense. However, in applications where individual observations are to be monitored or controlled, the normality assumption is not always satisfied. In addition, for high-dimensional data, normality is rarely, if ever, satisfied. The existing tolerance intervals for exponential random variables and sample variances are constructed under the condition that a parameter is known, leading to unbalanced tolerance intervals. Moreover, the existing multivariate distribution-free control charts in the literature cannot identify the out-of-control variables directly from the chart signal, and the scale of the original variables is often lost. In this dissertation, new tolerance intervals for exponential random variables and for sample variances, and a multivariate distribution-free control chart, are developed. The dissertation consists of three chapters, summarized below. In the first chapter, we introduce a tolerance interval for exponential random variables that gives the practitioner control over the ratio of the two tail probabilities without assuming that the parameter of the distribution, the mean, is known. The second chapter develops a tolerance interval and a guaranteed-performance control chart for the sample variance without assuming that the population variance is known. The third chapter introduces a multivariate distribution-free control chart based on order statistics that can identify out-of-control variables and preserves the original scale.
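    For context on tolerance intervals with an unknown exponential mean, here is a sketch of the standard one-sided upper (p, gamma) tolerance limit. It is built from the pivot 2*sum(X)/theta ~ chi-square(2n). This textbook construction is an assumption for illustration; it is not the two-sided interval with tail-ratio control developed in the dissertation. The chi-square quantile is found by bisection on the closed-form CDF for even degrees of freedom, so only the standard library is needed. The function names are hypothetical.

```python
import math


def chi2_cdf_even(x, df):
    """CDF of a chi-square variable with even degrees of freedom df = 2n."""
    n = df // 2
    half = x / 2.0
    term = total = 1.0
    for k in range(1, n):
        term *= half / k  # (x/2)^k / k!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(q, df, tol=1e-12):
    """Quantile of chi-square(df), df even, found by bisection."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < q:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exponential_upper_tolerance_limit(data, p, gamma):
    """Upper limit covering at least a proportion p of an exponential population with confidence gamma."""
    n = len(data)
    # Upper gamma-confidence bound for the mean, from 2*sum(X)/theta ~ chi-square(2n)
    theta_upper = 2.0 * sum(data) / chi2_quantile_even(1.0 - gamma, 2 * n)
    # The p-quantile of an exponential distribution with mean theta is -theta*ln(1 - p)
    return -math.log(1.0 - p) * theta_upper
```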
  • Item
    Semiparametric Approaches for Dimension Reduction Through Gradient Descent on Manifold
    (University of Alabama Libraries, 2021) Xiao, Qing; Wang, Qin; University of Alabama Tuscaloosa
    High-dimensional data are being generated at an unprecedented rate across various fields. Statistical models may fail on high-dimensional data due to the "curse of dimensionality". Sufficient dimension reduction (SDR) aims to extract the core information through a low-dimensional mapping so that efficient statistical models can be built while preserving the regression information in the high-dimensional data. We develop several SDR methods through manifold parameterization. First, we propose an SDR method, gemDR, based on local kernel regression without loss of information on the conditional mean E[Y|X]. The method gemDR focuses on identifying the central mean subspace (CMS). gemDR is then extended to CS-gemDR for the central subspace (CS) through the empirical cumulative distribution function. CS-OPG, a modified outer product of gradients (OPG) method for the CS, is developed as an initial estimator for CS-gemDR. The basis B of the CMS or CS is estimated by a gradient descent algorithm, and an update scheme on a Grassmann manifold preserves the orthogonality constraint on the parameters. To determine the dimension of the CMS and CS, two consistent cross-validation criteria are developed. Our methods show better performance for highly correlated features. We also develop ER-OPG and ER-MAVE to identify the basis of the CS on a manifold. The entire conditional distribution of a response given predictors is estimated in a heterogeneous regression setting through composite expectile regression. The computation algorithm is developed through an orthogonal updating scheme on a manifold. The proposed methods are adaptive to the structure of the random errors and do not require restrictive probabilistic assumptions, as inverse methods do. Our methods are first-order methods, which are computationally efficient compared with second-order methods. Their efficacy is demonstrated through numerical simulation and real data applications. The kernel bandwidth and basis are estimated simultaneously.
    The proposed methods show better performance in estimating the basis and its dimension.
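    The orthogonality-preserving update on a Grassmann manifold can be sketched with a toy objective. As an assumption for illustration, the code maximizes trace(B^T M B) for a symmetric matrix M, and the maximizer spans the top eigenvectors of M. M could be, for instance, an OPG-style average of outer products of estimated gradients. The dissertation's kernel-regression loss is not reproduced. Each step does two things:
    - it projects the Euclidean gradient onto the tangent space, (I - B B^T) G;
    - it re-orthonormalizes the columns with Gram-Schmidt, a QR-type retraction, so that B^T B = I is maintained.
    The sketch uses plain Python lists to stay dependency-free, and its function names are hypothetical.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonormalize(B):
    """Gram-Schmidt on the columns of B (a QR-type retraction onto orthonormal frames)."""
    cols = []
    for v in transpose(B):
        w = list(v)
        for q in cols:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = sum(a * a for a in w) ** 0.5
        cols.append([a / norm for a in w])
    return transpose(cols)


def grassmann_ascent(M, d, step=0.1, n_iter=500, seed=0):
    """Riemannian gradient ascent of trace(B^T M B) over p x d matrices with orthonormal columns."""
    p = len(M)
    rng = random.Random(seed)
    B = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(p)])
    for _ in range(n_iter):
        G = [[2.0 * g for g in row] for row in matmul(M, B)]  # Euclidean gradient 2MB
        BBtG = matmul(B, matmul(transpose(B), G))
        # Project onto the tangent space: (I - B B^T) G
        R = [[g - h for g, h in zip(rg, rh)] for rg, rh in zip(G, BBtG)]
        # Take a step, then retract back to orthonormal columns
        B = orthonormalize([[b + step * r for b, r in zip(rb, rr)] for rb, rr in zip(B, R)])
    return B
```

    Only the subspace spanned by B is identified, not B itself, which is why a Grassmann (rather than Stiefel) view is natural. Comparing the projection matrix B B^T is the appropriate check.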
  • Item
    Some Contributions to Modern Mixture Modeling and Model-Based Clustering
    (University of Alabama Libraries, 2021) Wang, Yang; Melnykov, Volodymyr; University of Alabama Tuscaloosa
    Clustering analysis is a technique for recognizing groups of similar objects. Based on finite mixture models, model-based clustering is one of the most popular methods due to its flexibility and interpretability in modeling heterogeneous data. In this setting, a one-to-one correspondence between mixture components and groups is assumed, and the clustering process can be viewed as model estimation using an optimization algorithm. The age of big data poses new challenges. Due to a potentially high number of parameters, finite mixture models are often at risk of being overparameterized, and overparameterization in model-based clustering often results in underestimation of the mixture order. As the field grows rapidly, developing simulation studies to validate mixture models has become another crucial topic. This thesis contributes to modern mixture modeling and model-based clustering and mainly focuses on developing approaches for solving overparameterization issues in this context. In addition, algorithms for simulating various types of clusters are created, which can be used to evaluate and improve clustering techniques. For each chapter, the expectation-maximization (EM) algorithm for the proposed mixture is developed, expressions for the model parameter estimates are provided, and corresponding parsimonious procedures are proposed. The utility of the methodologies is tested on both synthetic and well-known classification datasets. The thesis is organized as follows. In the first chapter, a variable selection procedure is developed and applied to matrix mixture modeling. The second chapter develops a novel mixture modeling approach called conditional mixture modeling, along with its parsimonious procedure. The third chapter provides an extension for simulating heterogeneous data to study the systematic performance of clustering algorithms.
    Finally, the fourth chapter describes the functionality of the R package cmbClust, developed for clustering multivariate data using the methodology proposed in chapter two.
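    Counting the free parameters of a Gaussian mixture makes the overparameterization issue concrete. The Gaussian mixture is assumed here for illustration; the thesis treats matrix and conditional mixtures. Each component with a full covariance matrix costs p + p(p+1)/2 parameters plus one mixing proportion. A penalized criterion such as BIC then charges ln(n) per parameter. The function names are hypothetical.

```python
import math


def gmm_num_params(k, p, covariance="full"):
    """Free parameters of a k-component, p-dimensional Gaussian mixture."""
    weights = k - 1  # mixing proportions must sum to one
    means = k * p
    if covariance == "full":
        cov = k * p * (p + 1) // 2
    elif covariance == "diagonal":
        cov = k * p
    elif covariance == "spherical":
        cov = k
    else:
        raise ValueError(f"unknown covariance structure: {covariance}")
    return weights + means + cov


def bic(log_likelihood, k, p, n, covariance="full"):
    """Bayesian information criterion in the 'smaller is better' convention."""
    return -2.0 * log_likelihood + gmm_num_params(k, p, covariance) * math.log(n)
```

    For example, with p = 10 variables, each extra full-covariance component adds 1 + 10 + 55 = 66 parameters and hence 66 ln(n) to the BIC penalty. A heavily parameterized model therefore has to gain a lot of likelihood to justify an additional component.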
  • Item
    On the use of transformations for modeling multidimensional heterogeneous data
    (University of Alabama Libraries, 2019) Sarkar, Shuchismita; Melnykov, Volodymyr; University of Alabama Tuscaloosa
    The objective of cluster analysis is to find distinct groups of similar observations. Many algorithms in the literature can perform this task, and among them model-based clustering is one of the most flexible tools. The assumption of Gaussian densities for mixture components is quite popular in this field due to its convenient form. However, this assumption is not always valid. This thesis explores the use of various transformations for finding clusters in heterogeneous data. In the process, the thesis also addresses several data structures, such as vector-, matrix-, tensor-, and network-valued data. In the first chapter, linear and non-linear transformations are used to model heterogeneous vector-valued observations when the data suffer from measurement inconsistency. The second chapter discusses an extensive set of parsimonious models for matrix-valued data. In the third chapter, a methodology for clustering skewed tensor-valued data is developed and applied to analyze the remuneration of professors in American universities. The fourth chapter focuses on network-valued data, and a novel finite mixture model addressing the dependence structure of network data is proposed. Finally, the fifth chapter describes the functionality of an R package, “netClust”, developed by the author for clustering unilayer and multilayer networks following the methodology proposed in Chapter four.
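    The Box-Cox power transformation is one example of a non-linear transformation toward symmetry. This choice is an assumption for illustration; the abstract does not say which transformations the thesis uses. A sample skewness function shows the effect on a right-skewed toy data set.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with power parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam


def skewness(xs):
    """Moment-based sample skewness g1."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5
```

    For the geometric sequence 1, 2, 4, ..., 32, the raw data are strongly right-skewed. The lambda = 0 (log) transform maps them to equally spaced values with zero skewness.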
  • Item
    Distribution system design for omnichannel retailing
    (University of Alabama Libraries, 2019) Guo, Jia; Keskin, Burcu B.; University of Alabama Tuscaloosa
    Omnichannel retailing (serving customers via a combination of physical and web-based stores) offers new opportunities and forces traditional retailers to rethink their supply chain design, operational efficiency, revenue/cost streams, and operations/marketing interface. While omnichannel supply chain management has received some attention recently, the role of cross-channel fulfillment, the layout of the omnichannel retail supply chain, and revenue management considering customer channel choice behavior have not been widely studied. This dissertation investigates these three streams in omnichannel supply chain design. In the cross-channel fulfillment stream, we study the optimal supply chain design for a dual-channel retailer that combines the operations of both channels in an omnichannel environment, considering demand segmentation, cost structure, and, more importantly, the execution ability of the firm. We formulate this problem as a two-stage stochastic programming model and use first-order optimality conditions to study the optimal inventory replenishment decisions and omnichannel strategy decisions under perfect and imperfect demand information. In the second chapter, we extend the dual-channel setting from a single store to N retail stores. We study the transshipment problem based on a two-store case by reformulating it as a large-scale mixed-integer linear programming model. The third chapter addresses the revenue management stream by focusing on the interface between the retailer's operations and customers' demand. Specifically, this chapter explores the right role for a physical store in an omnichannel environment for an online-first retailer. The main result relates to the trade-off between the increased profits from newly acquired demand (from the new channel) and the increased fulfillment and operations costs from cannibalized demand.
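    The single-channel newsvendor problem, assumed here as the simplest special case, illustrates first-order optimality conditions in a two-stage stochastic program. It is not the dual-channel model of the dissertation. The order quantity is chosen in the first stage. Overage and underage costs are incurred in the second stage, once demand is realized. With discrete demand scenarios, the optimality condition reduces to ordering the smallest quantity whose demand CDF reaches the critical fractile cu / (cu + co). The function names are hypothetical.

```python
def newsvendor_order(demands, probs, underage_cost, overage_cost):
    """Smallest order quantity whose demand CDF reaches cu / (cu + co)."""
    ratio = underage_cost / (underage_cost + overage_cost)
    cumulative = 0.0
    for demand, prob in sorted(zip(demands, probs)):
        cumulative += prob
        if cumulative >= ratio - 1e-12:  # small tolerance for floating-point sums
            return demand
    return max(demands)


def expected_cost(q, demands, probs, underage_cost, overage_cost):
    """Expected second-stage (recourse) cost of ordering q before demand is known."""
    return sum(pr * (overage_cost * max(q - d, 0) + underage_cost * max(d - q, 0))
               for d, pr in zip(demands, probs))
```

    A brute-force search over q with `expected_cost` reaches the same answer as the critical-fractile rule, which is a quick sanity check on the optimality condition.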
  • Item
    Discernable Periods in the Historical Development of Statistical Inference
    (1967) Gober, Richard Wayne; University of Alabama Tuscaloosa
    The purpose of this study is to trace the historical development of that part of modern statistical procedures known as statistical inference. Although the application of statistical methods is concerned more than ever with the study of great masses of data, percentages, and columns of figures, statistics has moved far beyond the descriptive stage. Using concepts from mathematics, logic, economics, and psychology, modern statistics has developed into a designed "way of thinking" about conclusions or decisions to help a person choose a reasonable course of action under uncertainty. The general theory and methodology is called statistical inference.
  • Item
    Stochastic decision models for last mile distribution using approximate dynamic programming
    (University of Alabama Libraries, 2018) Cook, Robert A.; Lodree, Emmett J.; University of Alabama Tuscaloosa
    After localized disasters, donations are sometimes collected at the same facility where they are distributed, and the damaged infrastructure is overwhelmed by the congestion. However, separating the donation facilities from the points of distribution requires vehicles to bring items between locations. We investigate dispatching policies for vehicles in such a scenario. We initially consider the case with one collection facility, called a Staging Area (SA), and one Point of Distribution (POD). Among other things, we prove that if we have two or more vehicles, it is optimal to dispatch the vehicles continuously under most circumstances. Furthermore, we define two common-sense practical decision policies, Continuous Dispatching (CD) and Full Truckload Dispatching (FTD), and demonstrate that CD performs well for one vehicle and at least as well as FTD across the board. This raises the question: can CD work on larger, more realistic networks? To answer it, we expand our network to two SAs and two vehicles to allow the best comparison with our prior work. First, we evaluate two Value Function Approximation methods and find that Rollout Algorithms can serve as a proxy for the optimal solution. Against this benchmark, CD performs well when the amount of items donated greatly exceeds demand, and also when demand exceeds supply, but struggles when the two are comparable. Next, we expand our network and consider general numbers of SAs and vehicles. Before we can begin, we must redefine CD for the expanded network. We describe several variations of CD for general networks that require different information to implement; by comparing them, we evaluate the value of the different pieces of information that a practitioner may have in the field. We find that visiting each SA equally on a rotating basis is a powerful strategy, although a better approach can be found by combining information about inventory levels, the locations of the vehicles, and the expected accumulation at each SA.
    Given the chaotic nature of humanitarian logistics, this information is unlikely to be obtained accurately, so we recommend the rotating strategy.
  • Item
    The statistical detection of clusters in networks
    (University of Alabama Libraries, 2018) Ballard, Marcus Alan; Perry, Marcus B.; University of Alabama Tuscaloosa
    A network consists of vertices and edges that connect the vertices. A network is clustered by assigning each of its N vertices to one of k groups, usually to optimize a given objective function. This dissertation proposes statistical likelihood as an objective function for clustering both undirected networks, in which edges have no direction, and directed networks, in which edges have direction. Clustering networks by optimizing an objective function is computationally expensive and quickly becomes prohibitive as the number of vertices grows large. To address this, theorems are developed to increase the efficiency of likelihood parameter estimation during the optimization, and a significant decrease in time-to-solution is demonstrated. When the clustering performance of likelihood is rigorously compared, using Monte Carlo simulation, with that of the competing objective function, modularity, likelihood is frequently found to be superior. A novel statistical significance test for clusters identified using likelihood as the objective function is also derived, and both clustering with the likelihood objective function and the subsequent significance testing are demonstrated on real-world networks, both undirected and directed.
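    Both objective functions compared above can be evaluated for a given partition of an undirected, unweighted network. The sketch below computes two quantities:
    - Newman-Girvan modularity;
    - the profile log-likelihood of a Bernoulli stochastic block model, with edge probabilities estimated by block-pair densities.
    The block-model likelihood is an assumed form of the likelihood objective. The dissertation's exact likelihood, its directed-network version, and its efficiency theorems are not reproduced here. The function names are hypothetical.

```python
import math
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)  # edges inside each group
    group_degree = defaultdict(int)  # total degree of each group
    for u, v in edges:
        group_degree[labels[u]] += 1
        group_degree[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (group_degree[c] / (2 * m)) ** 2 for c in group_degree)


def sbm_log_likelihood(edges, labels):
    """Profile log-likelihood of a Bernoulli stochastic block model (no self-loops)."""
    size = defaultdict(int)
    for node in labels:
        size[labels[node]] += 1
    edge_count = defaultdict(int)
    for u, v in edges:
        a, b = sorted((labels[u], labels[v]))
        edge_count[(a, b)] += 1
    groups = sorted(size)
    ll = 0.0
    for i, a in enumerate(groups):
        for b in groups[i:]:
            pairs = size[a] * (size[a] - 1) // 2 if a == b else size[a] * size[b]
            e = edge_count[(a, b)]
            if pairs > 0 and 0 < e < pairs:  # blocks with density 0 or 1 contribute 0
                p = e / pairs
                ll += e * math.log(p) + (pairs - e) * math.log(1.0 - p)
    return ll
```

    For two triangles joined by a single edge, splitting the graph into the two triangles gives a modularity of 5/14. Putting all vertices in one group gives a modularity of 0.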
  • Item
    Models for patient-centered appointment scheduling in physician clinics
    (University of Alabama Libraries, 2018) Dogru, Ali Kemal; Melouk, Sharif H.; University of Alabama Tuscaloosa
    Naive clinical capacity planning, myopic appointment scheduling techniques, and unavoidable appointment interruptions lead to excessive patient waiting time and physician idle time and overtime, which result in inefficient use of clinical resources, increased clinical costs, untimely access to care, decreased continuity of care, and dissatisfied patients. Previous research has found that efficient appointment scheduling methods may significantly improve both patient- and clinic-related outcomes. Furthermore, the patient-centered medical home (PCMH) has become one of the predominant models of health care delivery in the last two decades. PCMH is a proactive, team-based approach that places the patient at the center of care and uses data analytics to make informed clinical decisions. Motivated by PCMH principles, this dissertation investigates patient-centered appointment scheduling in physician clinics. More specifically, we address the following problems: 1) primary care capacity planning for open-access appointment systems; 2) adaptive appointment scheduling for patient-centered medical homes; and 3) managing interruptions in appointment schedules in physician clinics. To solve these problems, we use stochastic optimization and simulation optimization. Our patient-centered capacity planning, appointment scheduling, and appointment interruption management strategies provide significant value in terms of both operational and patient-oriented performance measures.
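    Simulation-based evaluation of an appointment schedule rests on a simple recursion: each patient starts at the later of the appointment time and the time the physician becomes free. The Python sketch below computes total patient waiting, physician idle time, and overtime for one simulated day. It assumes the following:
    - patients are punctual;
    - there are no no-shows or interruptions;
    - the session starts at time 0.
    The function name is hypothetical. In a simulation-optimization loop, service times would be sampled repeatedly and these measures averaged for each candidate schedule.

```python
def evaluate_schedule(appointment_times, service_times, session_end):
    """Return (total patient waiting, physician idle time, overtime) for one day."""
    waiting = idle = 0.0
    physician_free = 0.0  # time at which the physician finishes the previous patient
    for arrival, service in zip(appointment_times, service_times):
        start = max(arrival, physician_free)
        waiting += start - arrival
        idle += max(0.0, arrival - physician_free)
        physician_free = start + service
    overtime = max(0.0, physician_free - session_end)
    return waiting, idle, overtime
```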
  • Item
    Medical decision making in patients with chronic diseases
    (University of Alabama Libraries, 2018) Mirghorbani, Seyedeh Saeideh; Melouk, Sharif H.; Mittenthal, John; University of Alabama Tuscaloosa
    Inefficient resource allocation and planning, expensive treatment costs, and low patient adherence to medication plans lead to undesired health outcomes in patients with chronic diseases. Operations research and stochastic decision process models have provided significant opportunities to assist physicians and healthcare providers with these complexities. Using advanced stochastic decision-making processes, this dissertation contributes to the field of medical decision making in patients with chronic diseases. We use Markov decision process (MDP) and partially observable Markov decision process (POMDP) models to address our research questions. The first contribution investigates the impact of patient adherence on health outcomes and medication plans in patients with Type 2 diabetes. The second contribution, an extension of the first, investigates the financial effects of nonadherence to medication plans in patients with Type 2 diabetes. The experimental results of these two studies reveal the importance of higher adherence to medication in achieving desired health outcomes and controlling expenses. Finally, the third contribution focuses on patients at risk of Alzheimer’s disease and aims to provide observation-based screening plans that consider patient risk factors.
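    Markov decision process models of this kind are commonly solved by value iteration. The sketch below is a generic, fully observed version; a partially observable model would instead operate on belief states. The two-state "healthy"/"sick" example is hypothetical, with made-up rewards and transition probabilities. It only shows the interface and is not a model from the dissertation.

```python
def value_iteration(states, actions, transition, reward, gamma=0.95, tol=1e-9):
    """Solve a finite MDP by value iteration.

    actions(s) -> iterable of actions; transition(s, a) -> list of (prob, next_state);
    reward(s, a) -> immediate reward. Returns (values, greedy_policy).
    """
    def q_value(V, s, a):
        return reward(s, a) + gamma * sum(p * V[s2] for p, s2 in transition(s, a))

    V = {s: 0.0 for s in states}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in actions(s)) for s in states}
        diff = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if diff < tol:
            break
    policy = {s: max(actions(s), key=lambda a: q_value(V, s, a)) for s in states}
    return V, policy


# Hypothetical two-state example (illustrative numbers only)
STATES = ["healthy", "sick"]


def toy_actions(s):
    return ["wait"] if s == "healthy" else ["treat", "wait"]


def toy_transition(s, a):
    if s == "healthy":
        return [(0.9, "healthy"), (0.1, "sick")]
    return [(1.0, "healthy")] if a == "treat" else [(1.0, "sick")]


def toy_reward(s, a):
    if s == "healthy":
        return 0.0
    return -1.0 if a == "treat" else -5.0
```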
  • Item
    On robust estimation of multiple change points in multivariate and matrix processes
    (University of Alabama Libraries, 2017) Melnykov, Yana; Perry, Marcus B.; University of Alabama Tuscaloosa
    There are numerous areas of human activity where processes are observed over time. If the conditions of a process change, the change can be reflected in a shift in the observed response values. The detection and estimation of such shifts is commonly known as change point inference. Estimation helps us learn about the nature of the process, assess its parameters, and analyze identified change points, whereas detection focuses on finding shifts in the real-time process flow. A wide variety of methods have been proposed in the literature for change point detection and estimation in both settings. Unfortunately, the majority of these procedures impose very restrictive assumptions, including normality of the data, independence of observations, or independence of subjects in multisubject studies. In this dissertation, a new methodology relying on more realistic assumptions is developed. The dissertation includes three chapters, summarized below. In the first chapter, we develop methodology capable of estimating and detecting multiple change points in a multisubject univariate process observed over time. In the second chapter, we introduce methodology for the robust estimation of change points in multivariate processes observed over time. In the third chapter, we generalize the ideas of the first two chapters by developing methodology capable of identifying multiple change points in multisubject matrix processes observed over time.
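    Binary segmentation illustrates multiple change point estimation. This standard baseline is assumed here for illustration; it is not the robust multisubject or matrix methodology of the dissertation. It splits a univariate series where the sum of squared errors drops the most. It then recurses on both sides while the reduction exceeds a user-chosen threshold. Because it relies on squared errors, it is sensitive to outliers and heavy tails, the kind of restrictive behavior the abstract argues against. The function names and the threshold are hypothetical.

```python
def sse(xs):
    """Sum of squared deviations from the segment mean."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def best_split(xs, min_size=2):
    """Return (gain, index) of the split that most reduces the SSE; index is None if no gain."""
    total = sse(xs)
    best = (0.0, None)
    for k in range(min_size, len(xs) - min_size + 1):
        gain = total - sse(xs[:k]) - sse(xs[k:])
        if gain > best[0]:
            best = (gain, k)
    return best


def binary_segmentation(xs, threshold, min_size=2, offset=0):
    """Recursively split while the SSE reduction exceeds threshold; returns sorted change indices."""
    gain, k = best_split(xs, min_size)
    if k is None or gain < threshold:
        return []
    return (binary_segmentation(xs[:k], threshold, min_size, offset)
            + [offset + k]
            + binary_segmentation(xs[k:], threshold, min_size, offset + k))
```

    Each returned index is the position of the first observation after a change, counting from zero.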
  • Item
    A predictive model for highway accidents and two papers on clustering averaging
    (University of Alabama Libraries, 2017) Wang, Ketong; Porter, Michael D.; University of Alabama Tuscaloosa
    Predictive models and clustering algorithms are two of the most important statistical methodologies for solving quantitative problems. This dissertation proposes several innovative prediction and clustering techniques and demonstrates their successful application to real-world problems. Chapter 1 discusses how the choice of highway safety performance function (SPF), a predictive model of crash rate, affects the estimated importance of various highway intersection characteristics. In this chapter, a highway data inventory of 36 safety-relevant parameters along state routes in Alabama is used to study the importance of road characteristics and their interactions. Four SPFs are considered: Poisson regression, negative binomial regression, a regularized generalized linear model, and boosted regression trees (BRT). Overall, BRT outperforms the other models in predictive accuracy because it accounts for nonlinearities and multi-way interactions. Additionally, the boosted tree model identifies several important variables, such as pedestrian crossing control type and distance to the next public intersection, that the other SPFs ignore. Although linear models offer straightforward interpretations of the relationship between crash rate and road characteristics, BRT better identifies critical variables and achieves superior prediction accuracy. Chapter 2 improves Bayesian model-based clustering using similarity-based model aggregation and a clustering estimation approach, non-negative matrix factorization (NMF). In Bayesian model-based clustering, the MCMC algorithm provides sufficient output for statistical inference on the model-specific parameters. However, traditional posterior inference techniques such as maximum a posteriori (MAP) estimation are difficult to apply to the partitioning vector because the cluster labels are exchangeable.
Therefore, this chapter proposes a methodology for estimating the final partitioning vector based on the close relationship between NMF and the loss-function approaches in the literature. Our method not only provides more accurate clustering solutions but also enables a soft, or probabilistic, interpretation of the cluster assignments. Chapter 3 illustrates how clustering averaging can be used to refine model-based clustering with finite mixture models. Because the Expectation-Maximization (EM) algorithm for estimating finite mixture models is notably sensitive to initialization and to the specified number of clusters K, clustering model averaging can be employed to produce an aggregated partition that is better than any individual solution. However, various specifications are available for each step of the clustering aggregation and estimation process. This chapter proposes an aggregated multi-component clustering algorithm (AMCCA) that optimizes the options for each step of the clustering aggregation. Additionally, our algorithm adds a step of multi-component clustering initialized with a partition from NMF, which yields better clustering performance than existing approaches, including Gaussian mixture models.
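A common starting point for similarity-based clustering aggregation is the co-association matrix: the fraction of candidate partitions in which each pair of observations shares a cluster. Because it only asks whether two points are together, it is unaffected by the label exchangeability mentioned above. The sketch below is a generic illustration under that assumption, not the dissertation's AMCCA or its NMF-based estimator; the function name is ours.

```python
import numpy as np

def co_association(partitions):
    """Average co-clustering indicator over several partitions.

    partitions: shape (m, n); row r holds the labels that partition r assigns
    to the n observations. Labels need not agree across partitions, since
    only "same cluster or not" is recorded.
    """
    P = np.asarray(partitions)
    m, n = P.shape
    S = np.zeros((n, n))
    for labels in P:
        # 1.0 where observations i and j share a cluster in this partition
        S += (labels[:, None] == labels[None, :]).astype(float)
    return S / m

# The first two partitions are identical up to relabeling; the third differs.
S = co_association([[0, 0, 0, 1, 1, 1],
                    [1, 1, 1, 0, 0, 0],
                    [0, 0, 1, 1, 1, 1]])
print(S.round(2))
```

A matrix like `S` can then be decomposed (for example by NMF) or thresholded to obtain a single consensus partition.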
  • Item
    Some contributions to univariate nonparametric tests and control charts
    (University of Alabama Libraries, 2017) Zheng, Rong; Chakraborti, Subhabrata; University of Alabama Tuscaloosa
    In general, statistical methods fall into two categories: parametric and nonparametric. Parametric analysis is usually based on information about the probability distribution of the random variable. Nonparametric methods, also referred to as distribution-free procedures, do not require prior knowledge of the distribution of the random variable. In practice, practitioners rarely have full knowledge of a random variable or know its probability distribution with certainty, so they have two choices. One can still use parametric methods, assuming a parametric distribution on the basis of scientific evaluation or to simplify the situation. Alternatively, one can directly apply nonparametric methods without much knowledge of the distribution. The conclusions from parametric methods are valid as long as the assumptions are substantiated. Such assumptions help solve problems, but they are also risky, because a wrong assumption can invalidate the conclusions. Hence, nonparametric techniques can be a preferable alternative. A chief advantage of nonparametric methods is that they relax assumptions about the shape of the distribution, that is, the distribution-free property. From a research point of view, new methodology based on nonparametric techniques, or further investigation of existing nonparametric techniques, can therefore be interesting, informative, and valuable. The research in this dissertation contributes to univariate nonparametric tests and control charts.
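As a small illustration of the distribution-free property described in this abstract, the exact sign test for a median uses only the signs of the deviations: for a continuous distribution, each nonzero deviation is positive with probability 1/2 under the null hypothesis, whatever the distribution's shape. This is a textbook example chosen for illustration, not one of the dissertation's procedures; the function name and data are hypothetical.

```python
from math import comb

def sign_test(data, median0):
    """Two-sided exact sign test of H0: population median == median0."""
    diffs = [x - median0 for x in data if x != median0]  # drop ties with median0
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Under H0, k ~ Binomial(n, 1/2) for any continuous distribution.
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))

sample = [1.2, 0.5, 2.3, 0.8, 1.1, 0.3, 0.9, 1.7, 2.0, -0.4]
print(sign_test(sample, 0.0))
```

With 9 of 10 deviations positive, the two-sided p-value is 2 × 11/1024.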
  • Item
    Integrated supply chain models and analysis
    (University of Alabama Libraries, 2016) Zhi, Jianing; Keskin, Burcu Baris; University of Alabama Tuscaloosa
    This dissertation reports on three integrated supply chain problems that incorporate several key components of modern supply chains, including location, transportation, inventory, and customer selection. The first part of the dissertation investigates a multi-product, three-stage distribution system with transshipment and direct shipping from supply centers to customers. The objective is to determine the locations of a fixed number of capacitated warehouses that minimize the total transportation and fixed facility costs in the supply chain network. The second part of the dissertation focuses on the integrated location-inventory problem in a multi-retailer newsvendor setting with both decentralized and centralized decision making. The third part of the dissertation explores the coordination between operations management and marketing by integrating marketing strategies with inventory decisions to maximize the total expected profit of the company. The contribution of this dissertation is fourfold. First, we define two new problems in integrated supply chain decision making: one combines inventory and location decisions under two supply chain network designs, and the other studies the interface of operations management and marketing in a selective newsvendor problem with quantity-dependent lead time. For both problems, we offer mathematical models and effective solution approaches. Second, we develop two metaheuristic solution approaches for a multi-product production/distribution system design (PDSD) problem, which has been studied in the literature and solved with scatter search and tabu search. We solve the problem with simulated annealing (SA) and a greedy randomized adaptive search procedure (GRASP), both of which achieve better solution quality and time performance than scatter search.
Third, we establish a practical connection between operations and marketing in the selective newsvendor problem. This effort demonstrates that a joint decision-making process is more profitable and opens up more opportunities for cooperation between the two disciplines. Lastly, the proposed mathematical models, solution approaches, and managerial insights for both new and existing problems may shed light on research into problem variants and the development of new techniques beyond those considered in this dissertation.
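Several parts of this dissertation build on the newsvendor model. For reference, the sketch below computes the classical single-product newsvendor order quantity with normally distributed demand: the optimal quantity is the demand quantile at the critical ratio of underage cost to the sum of underage and overage costs. It does not model the selective, multi-retailer, or quantity-dependent lead-time variants studied here; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def newsvendor_quantity(price, cost, salvage, mu, sigma):
    """Classical newsvendor order quantity for Normal(mu, sigma) demand."""
    underage = price - cost    # profit lost per unit of unmet demand
    overage = cost - salvage   # loss per unsold unit
    critical_ratio = underage / (underage + overage)
    # Order up to the critical-ratio quantile of the demand distribution.
    return NormalDist(mu, sigma).inv_cdf(critical_ratio)

q = newsvendor_quantity(price=10.0, cost=4.0, salvage=1.0, mu=100.0, sigma=20.0)
print(round(q, 1))
```

Here the critical ratio is 6 / (6 + 3) = 2/3, so the order quantity sits above mean demand.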
  • Item
    The development of diagnostic tools for mixture modeling and model-based clustering
    (University of Alabama Libraries, 2016) Zhu, Xuwen; Melnykov, Volodymyr; University of Alabama Tuscaloosa
    Cluster analysis performs unsupervised partitioning of heterogeneous data and has applications in almost all fields of study. Model-based clustering, which is based on finite mixture models, is one of the most popular clustering methods today owing to its flexibility and interpretability. However, diagnostic and visualization tools for clustering procedures remain limited. This dissertation is devoted to assessing different properties of the clustering procedure. This report has four chapters, summarized below. In the first chapter, we provide practitioners with an approach to assess the certainty of a classification made in model-based clustering. The second chapter introduces a novel finite mixture model, the Manly mixture model, which can model skewness in data and provides diagnostics on the normality of variables. In the third chapter, we develop an extension of the traditional K-means procedure that can model skewness in data. The fourth chapter presents the ManlyMix R package, the software developed to accompany our paper in Chapter 2.
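The Manly mixture model takes its name from Manly's exponential transformation, y = (exp(λx) − 1)/λ for λ ≠ 0 and y = x for λ = 0, whose Jacobian is exp(λx). The sketch below, assuming that standard definition, implements the transformation and its log-Jacobian; as we understand the model, each component treats the transformed vector as Gaussian, so a component's log-density of x is the Gaussian log-density of y plus λᵀx, and λ = 0 recovers a Gaussian component. Function names are ours, not the ManlyMix API.

```python
import numpy as np

def manly(x, lam):
    """Manly exponential transformation, applied elementwise."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return x.copy()
    # expm1 computes exp(lam * x) - 1 accurately when lam * x is small.
    return np.expm1(lam * x) / lam

def manly_log_jacobian(x, lam):
    """log |dy/dx| = lam * x: the term added to the Gaussian log-density of y."""
    return lam * np.asarray(x, dtype=float)

x = np.linspace(-2.0, 2.0, 5)
print(manly(x, 0.5).round(3))
```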