Multivariate time series clustering using kernel variant multi-way principal component analysis

University of Alabama Libraries

Clustering multivariate time series data has been a challenging task for researchers, because such data have multiple dimensions to consider, including auto-correlations and cross-correlations, even though multivariate time series data have been prevalent in diverse areas for decades. For short time series, however, conventional time series models may not satisfy model validity requirements. Multi-way Principal Component Analysis (MPCA) can be used in this case, but its normality assumption limits its ability to handle nonlinear data, such as multivariate time series with high-order interactions. A kernel variant of MPCA (KMPCA) is proposed as an alternative solution. To test whether KMPCA can cluster trivariate time series data into two groups, two simulation studies were conducted. In the first study, the two groups share the same mean structure, with error structures formed by combining three levels of auto-correlation with three levels of cross-correlation. In the second study, two groups with different mean structures were generated under nine error structures. To check whether the proposed method works well on real-world data, it was applied to a study of the relationship between obesity and depression. The simulation studies showed that KMPCA clustered two groups with different mean structures with success rates above 90% when an appropriate kernel function with a proper parameter was applied. Clustering performance is hindered by similar error structures and by strong cross-correlation, weak auto-correlation, and a larger number of time points. Accounting for racial effects, obesity and obesity-related variables measured over 15 years, especially addictive substance use, identified depressed cohorts at year 20 at rates of up to 76% for the Caucasian group and 95% for the African-American group.
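The abstract does not spell out the KMPCA algorithm, so the following is only a minimal sketch of how kernel-based MPCA clustering of trivariate series might look, under several assumptions: the subjects × variables × time array is unfolded subject-wise into a subjects × (variables·time) matrix and autoscaled, a Gaussian (RBF) kernel with a median-heuristic bandwidth is used, and the two groups are recovered by plain two-means clustering on the leading kernel principal component scores. The VAR(1) error model, all parameter values, and every function name (`simulate_group`, `kmpca_scores`, `two_means`, `clustering_accuracy`) are hypothetical illustrations, not the dissertation's settings.

```python
import numpy as np


def simulate_group(rng, n, T, mean, phi, rho):
    """Simulate n trivariate series of length T: mean structure + VAR(1) errors.

    Hypothetical error model: e_t = phi * e_{t-1} + w_t, where w_t has unit
    variances (auto-correlation set by phi) and a common cross-correlation rho.
    """
    p = 3
    cov_w = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    X = np.empty((n, p, T))
    for i in range(n):
        # Start from the stationary distribution of the VAR(1) process.
        e = rng.multivariate_normal(np.zeros(p), cov_w / (1 - phi**2))
        for t in range(T):
            if t > 0:
                e = phi * e + rng.multivariate_normal(np.zeros(p), cov_w)
            X[i, :, t] = mean[:, t] + e
    return X


def kmpca_scores(X, n_components=2):
    """Unfold (subjects x variables x time), autoscale, then RBF kernel PCA."""
    n = X.shape[0]
    U = X.reshape(n, -1)                      # subjects x (variables * time)
    U = (U - U.mean(axis=0)) / U.std(axis=0)  # autoscale each column
    sq = np.sum(U**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * U @ U.T, 0.0)
    # Assumed bandwidth choice: inverse median of pairwise squared distances.
    gamma = 1.0 / np.median(D2[np.triu_indices(n, k=1)])
    K = np.exp(-gamma * D2)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    # Training-point scores: sqrt(eigenvalue) * unit eigenvector.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


def two_means(S, n_iter=50):
    """Plain Lloyd's k-means with k=2, initialised at the extremes of component 1."""
    centers = S[[np.argmin(S[:, 0]), np.argmax(S[:, 0])]].copy()
    for _ in range(n_iter):
        d = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = S[labels == k].mean(axis=0)
    return labels


def clustering_accuracy(labels, truth):
    """Success rate, allowing for the arbitrary swap of cluster labels 0/1."""
    acc = np.mean(labels == truth)
    return max(acc, 1 - acc)


# Usage: two groups with different (hypothetical) mean structures.
rng = np.random.default_rng(0)
T, n = 10, 50
mean_a = np.zeros((3, T))
mean_b = np.full((3, T), 2.0)
X = np.concatenate([simulate_group(rng, n, T, mean_a, phi=0.5, rho=0.3),
                    simulate_group(rng, n, T, mean_b, phi=0.5, rho=0.3)])
truth = np.repeat([0, 1], n)
scores = kmpca_scores(X)
labels = two_means(scores)
print(f"success rate: {clustering_accuracy(labels, truth):.2f}")
```

Changing the kernel or its bandwidth in `kmpca_scores` can change the success rate noticeably, which is consistent with the abstract's observation that good clustering depends on choosing an appropriate kernel function and parameter.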

Electronic Thesis or Dissertation
Statistics, Biology, Biostatistics, Behavioral psychology