The statistical detection of clusters in networks
Abstract
A network consists of vertices and the edges that connect them. A network is clustered by assigning each of its N vertices to one of k groups, usually so as to optimize a given objective function. This dissertation proposes statistical likelihood as an objective function for clustering both undirected networks, in which edges have no direction, and directed networks, in which they do. Clustering a network by optimizing an objective function is computationally expensive and quickly becomes prohibitive as the number of vertices grows. To address this, theorems are developed that make likelihood parameter estimation more efficient during the optimization, and a significant decrease in time-to-solution is demonstrated. When the clustering performance of likelihood is rigorously compared with that of a competing objective function, modularity, using Monte Carlo simulation, likelihood is frequently found to be superior. A novel statistical significance test is also derived for clusters identified using likelihood as the objective function. Both likelihood-based clustering and the subsequent significance testing are demonstrated on real-world undirected and directed networks.
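To make the idea of scoring a cluster assignment with an objective function concrete, the following minimal sketch evaluates two candidate assignments of a small undirected network under both a likelihood and modularity. The abstract does not state the dissertation's likelihood model, so the sketch assumes a simple two-parameter Bernoulli model (one edge probability within clusters, one between clusters) with each probability replaced by its maximum-likelihood estimate. The function names, the model, and the toy network are illustrative assumptions, not the author's method. Modularity is computed with the standard Newman-Girvan formula.

```python
import math
from itertools import combinations


def bernoulli_loglik(m, n):
    """Maximized Bernoulli log-likelihood for m edges among n vertex pairs."""
    if n == 0 or m == 0 or m == n:
        return 0.0  # the estimate p = m/n is 0 or 1, so the log-likelihood is 0
    p = m / n
    return m * math.log(p) + (n - m) * math.log(1 - p)


def profile_loglik(n_vertices, edges, labels):
    """Log-likelihood of an assignment under an ASSUMED two-parameter model:
    one edge probability within clusters and one between clusters."""
    edge_set = {frozenset(e) for e in edges}
    m_in = n_in = m_out = n_out = 0
    for u, v in combinations(range(n_vertices), 2):
        has_edge = frozenset((u, v)) in edge_set
        if labels[u] == labels[v]:
            n_in += 1
            m_in += has_edge
        else:
            n_out += 1
            m_out += has_edge
    return bernoulli_loglik(m_in, n_in) + bernoulli_loglik(m_out, n_out)


def modularity(n_vertices, edges, labels):
    """Newman-Girvan modularity of an assignment for an undirected network."""
    m = len(edges)
    degree = [0] * n_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for c in set(labels):
        # edges with both endpoints in cluster c
        e_c = sum(1 for u, v in edges if labels[u] == c and labels[v] == c)
        # total degree of the vertices in cluster c
        d_c = sum(degree[v] for v in range(n_vertices) if labels[v] == c)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q


# Toy network (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [0, 0, 0, 1, 1, 1]  # one cluster per triangle
bad = [0, 1, 0, 1, 0, 1]   # clusters that cut across the triangles

for name, labels in (("good", good), ("bad", bad)):
    print(name, profile_loglik(6, edges, labels), modularity(6, edges, labels))
```

On this toy network both objective functions score the one-cluster-per-triangle assignment higher than the assignment that cuts across the triangles. A clustering procedure would search over assignments for the highest score. That search is the expensive step, because the number of possible assignments grows rapidly with the number of vertices.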