Abstract:
Metabolomics, a relatively late entrant in the 'omics' pyramid, aims to capture a complete snapshot of the metabolome of an organism at any given point in time. Recent advances in mass spectrometry have enabled the simultaneous detection of hundreds of metabolites in a single sample. However, metabolomics data suffer from high dimensionality, strong correlations among metabolites, and the presence of unknown metabolites. In this dissertation, I employ machine learning techniques and graphical models to analyze and deconstruct some of the complexities of metabolomics data from Drosophila melanogaster. In Chapter 1, I introduce the challenges of metabolomics data analysis and outline the dissertation. In Chapter 2, I used the Random Forest algorithm to identify the key metabolites that best discriminate between flies fed a high-fat diet and flies fed a normal diet. I found that flies on a high-fat diet had an upregulated omega fatty acid oxidation pathway. Furthermore, I used Gaussian Graphical Models to analyze the differences in network structure between high-fat-diet-fed and normal-diet-fed flies. The edge symmetric difference between the two networks was 0.786, indicating very different topologies. Chapter 3 shows how Bayesian networks can be used to predict metabolic networks from untargeted metabolomics data. The resulting networks were then compared with the known metabolic networks of various organisms in KEGG. I found that the generated Bayesian networks had a degree distribution, secondary motif composition, and short path length distribution similar to those of the known KEGG metabolic networks. Thus, I demonstrate that Bayesian network analysis can be applied to untargeted metabolomics data to generate data-driven network models whose underlying characteristics resemble those of known metabolic networks.
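The edge symmetric difference reported for the two diet networks measures how many metabolite pairs are connected in one network but not the other. A minimal sketch follows, assuming that GGM edges are read off the nonzero partial correlations of an estimated precision (inverse covariance) matrix and that the count is normalized by the size of the edge union; the abstract does not state the normalization actually used. The function names `ggm_edges` and `edge_symmetric_difference` and the toy precision matrices are hypothetical illustrations, not data or code from the study.

```python
import math


def ggm_edges(precision, names, tol=1e-8):
    """Return undirected GGM edges as frozensets of node names.

    In a Gaussian Graphical Model, metabolites i and j share an edge when
    their partial correlation is nonzero:
        rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj)
    where Omega is the precision (inverse covariance) matrix.
    """
    edges = set()
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            rho = -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            if abs(rho) > tol:
                edges.add(frozenset((names[i], names[j])))
    return edges


def edge_symmetric_difference(e1, e2):
    """Fraction of edges present in exactly one of the two networks.

    Assumption: normalized by the size of the edge union, so 0 means
    identical edge sets and 1 means no shared edges.
    """
    union = e1 | e2
    if not union:
        return 0.0
    return len(e1 ^ e2) / len(union)


# Hypothetical 3-metabolite precision matrices (illustrative, not study data).
names = ["m1", "m2", "m3"]
omega_hfd = [[2.0, -0.8, 0.0],
             [-0.8, 2.0, -0.6],
             [0.0, -0.6, 2.0]]
omega_nd = [[2.0, 0.0, -0.7],
            [0.0, 2.0, -0.5],
            [-0.7, -0.5, 2.0]]

hfd_edges = ggm_edges(omega_hfd, names)  # edges m1-m2 and m2-m3
nd_edges = ggm_edges(omega_nd, names)    # edges m1-m3 and m2-m3
print(round(edge_symmetric_difference(hfd_edges, nd_edges), 3))  # 0.667
```

In practice the precision matrix would be estimated from the metabolite abundance data, typically with a sparsity-inducing estimator; scikit-learn's `GraphicalLasso` (which exposes the estimate as `precision_`) is one common choice, though the abstract does not say which estimator was used.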
In Chapter 4, we present FlyNet, a multilayer network database conceptualized and constructed for storing and visualizing complex network data. FlyNet integrates the metabolome with the genome and the proteome to facilitate integrative studies in Drosophila melanogaster. As an example, I show how the betweenness centrality of gene and protein nodes changes in a multilayer setting compared with a single-layer analysis. Furthermore, I show how FlyNet can be used to query possible relationships between genes and metabolites across different biological layers.
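Why betweenness shifts when layers are added can be seen on a toy graph. A minimal sketch, assuming unweighted, undirected edges and treating the multilayer network as a single graph whose node names encode their layer (genes `g*`, proteins `p*`, metabolites `m*`). Betweenness is computed with Brandes' algorithm. The node names, the inter-layer edges, and the helper `build_graph` are hypothetical and do not reflect FlyNet's contents or its implementation.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        pred = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}


# Hypothetical toy layers (not FlyNet data).
gene_edges = [("g1", "g2"), ("g2", "g3")]
inter_layer_edges = [("g1", "p1"), ("g3", "p1"), ("p1", "m1")]

single = betweenness(build_graph(gene_edges))
multi = betweenness(build_graph(gene_edges + inter_layer_edges))
print(single["g2"], multi["g2"])  # 1.0 0.5
```

In the gene layer alone, `g2` lies on the only shortest path between `g1` and `g3`. Once the protein layer adds a second route of equal length through `p1`, `g2` carries only half of that pair's shortest paths, so its betweenness halves, while `p1` becomes a hub linking genes to the metabolite. NetworkX's `betweenness_centrality` offers the same measure with normalization options; the hand-written version keeps the example dependency-free.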