On the use of transformations for modeling multidimensional heterogeneous data
Abstract
The objective of cluster analysis is to find distinct groups of similar observations. Many algorithms in the literature can perform this task, and among them model-based clustering is one of the most flexible tools. The assumption of a Gaussian density for the mixture components is popular in this field because of its convenient form. However, this assumption is not always valid. This thesis explores the use of various transformations for finding clusters in heterogeneous data. In the process, it addresses several data structures: vector-, matrix-, tensor-, and network-valued data. In the first chapter, linear and non-linear transformations are used to model heterogeneous vector-valued observations when the data suffer from measurement inconsistency. The second chapter discusses an extensive set of parsimonious models for matrix-valued data. In the third chapter, a methodology for clustering skewed tensor-valued data is developed and applied to analyze the remuneration of professors in American universities. The fourth chapter focuses on network-valued data and proposes a novel finite mixture model that addresses the dependence structure of network data. Finally, the fifth chapter describes the functionality of an R package, “netClust”, developed by the author for clustering unilayer and multilayer networks following the methodology proposed in Chapter four.
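The general idea of transforming data before fitting a Gaussian mixture can be illustrated with a minimal sketch. It is not the methodology of the thesis: it assumes univariate, right-skewed simulated data, a fixed log transformation chosen for illustration, and a two-component mixture fitted by a basic EM algorithm. All function and variable names (`normal_pdf`, `fit_two_component_gmm`, `raw`, and so on) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200):
    """Fit a two-component univariate Gaussian mixture by EM (illustrative only)."""
    ordered = sorted(values)
    n = len(ordered)
    # Crude initialisation: lower and upper quartiles as starting means.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update mixing weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
    # Assign each observation to the component with the larger posterior.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Simulated skewed data: two log-normal groups (assumed for illustration).
rng = random.Random(0)
raw = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(200)]
raw += [math.exp(rng.gauss(3.0, 0.5)) for _ in range(200)]
true_labels = [0] * 200 + [1] * 200

# Transform to a scale where a Gaussian mixture is a reasonable model.
transformed = [math.log(x) for x in raw]

weights, means, variances, labels = fit_two_component_gmm(transformed)
accuracy = sum(int(a == b) for a, b in zip(labels, true_labels)) / len(labels)
print("estimated means on the log scale:", [round(m, 2) for m in means])
print("agreement with simulated groups:", round(accuracy, 3))
```

On the original scale the two groups are strongly right-skewed, so Gaussian components would fit them poorly; on the log scale each group is Gaussian by construction. In practice the transformation is usually not known in advance and would have to be chosen or estimated, which this sketch does not attempt.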