Advances in mixture modeling and model based clustering
Cluster analysis is the part of unsupervised learning that deals with finding groups of similar observations in heterogeneous data. Several clustering approaches share the goal of minimizing the within-cluster variance while maximizing the variance between clusters. K-means and hierarchical clustering with different linkages can be thought of as distance-based approaches. Another approach is model-based clustering, which relies on the idea of finite mixture models. This dissertation proposes new advances in clustering, mostly related to model-based clustering and its connection to the K-means algorithm. The dissertation has five chapters. The first chapter is a literature review of recent advances in model-based clustering and finite mixture modeling. The main advances and challenges are described in the methodology section, and a diverse set of applications of model-based clustering is presented in the application section. The second chapter presents a simulation study conducted to analyze the factors that affect the complexity of model-based clustering. In the third chapter, we develop a methodology for model-based clustering of regression time series data and show its application to annual tree rings. In the fourth chapter, we utilize the relationship between model-based clustering and the K-means algorithm to develop a methodology for merging clusters formed by K-means in order to find meaningful groupings. The final chapter is dedicated to the problem of initialization in model-based clustering. It is a well-known fact that the performance of model-based clustering is highly dependent on the initialization of the EM algorithm, and so far no method works comprehensively in all situations. In this chapter, we use the idea of model averaging together with initialization by the emEM algorithm to address this problem.