On the identification of statistically significant network topology
Determining the structure of large and complex networks is a problem that has stirred great interest in many fields including mathematics, computer science, sociology, biomedical research, and epidemiology. Despite this high level of interest, though, there still exists no procedure for formal hypothesis testing to measure the significance of detected community structure in an observed network. First, this work proposes three, more general alternatives to modularity, the most common measure of community structure, which allow for the detection of more general structure in networks. An approach based upon the likelihood ratio test is shown not only to be as effective as modularity in detecting modular structure but also able to detect a wide variety of other network topologies. Second, this work proposes a general and novel test, the Likelihood Ratio Cluster (LRC) test, for assessing the statistical significance of the output of clustering algorithms. This technique is demonstrated by applying it to the sample partitions generated by both network and conventional clustering algorithms. Finally, a method for evaluating the capability of heuristic clustering techniques to detect the optimal sample partition is developed. This technique is used to evaluate several common community detection algorithms. Surprisingly, the most popular community detection algorithm is found to be largely ineffective at detecting the optimal partition of a random network. Also surprisingly, Clauset's fast algorithm (Clauset et al 2004), which is commonly thought to be fast but inaccurate, is found to be the most effective of the algorithms examined at detecting the optimal partition in random networks.