Model selection results for latent high-dimensional graphical models on binary and count data with applications to fMRI and Genomics
Sinclair, David Giles
This dissertation explores the undirected graphical model framework. We consider applications to highly dependent binary data and count data in order to determine the underlying structural dependencies. In the first chapter we provide an overview of the statistical techniques used in the dissertation, including undirected graphical models and the Gaussian graphical model. Fitting procedures for the Gaussian graphical model are discussed, as they are foundational to the cases considered in subsequent chapters. In the second chapter we propose the misclassified Ising model: a framework for analyzing dependent binary data where the observed binary state is susceptible to error. We extend the theoretical results for the model selection method of Ravikumar et al. (2010) to show that the method still correctly identifies the edges of the underlying graphical model under suitable misclassification settings. With knowledge of the misclassification process, we develop an expectation maximization algorithm that accounts for misclassification during model selection. We illustrate the improved performance of the proposed expectation maximization algorithm with simulated data and with functional magnetic resonance imaging data from the Human Connectome Project. Appendices contain proofs of the theoretical results and the steps of the weight calculations. In chapter 3 we report on a novel Interregional Function Connectivity (IRFC) method for functional connectivity in task-fMRI data. The approach can be applied directly to output from the standard general linear model (GLM). Given a parcellation of the cortex, activation states are estimated using GLM output. The connectivity fitting procedure builds a whole-brain conditional independence graph whose edges can be interpreted as direct communications between regions of the cortex.
Under the stated statistical model, the fitted network is an estimate of whole-brain functional connectivity. The model can be fit directly from GLM output using the IRFCnet function (https://github.com/lbc-spreng/IRFCnet). The model is demonstrated with working memory data from the Human Connectome Project, where the activation state of each cortical region is estimated using a univariate contrast of the 2-back greater than 0-back working memory conditions. The model identified strong positive connectivity between parietal and frontal brain regions, consistent with numerous previous reports of frontoparietal activation and connectivity during n-back performance. The IRFC output was then submitted to graph analysis. These findings provide the first evidence supporting an extended GLM approach for the simultaneous estimation of whole-brain activation and functional connectivity. In chapter 4 we introduce the Poisson Log-Normal Graphical Model and present a normality transformation for data arising from this distribution. The model allows network dependencies to be modeled for count data, and we provide an algorithm that uses a one-step EM-based result to obtain a provable improvement in recovering the network structure. Simulations over a range of network structures show that the model improves performance. The model is applied to high-throughput microRNA (miRNA) sequencing data from breast cancer patients in The Cancer Genome Atlas (TCGA). Among the most highly connected miRNA molecules in the fitted network, nearly all are known to be involved in the regulation of breast cancer.
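To illustrate how a known misclassification process can enter an expectation maximization weight, the sketch below computes the posterior probability that a single node's true state is +1 given its observed state. It is a minimal illustration under stated assumptions, not the dissertation's algorithm: spins are coded in {-1, +1}, the true state follows the Ising conditional P(x_s = +1 | x_rest) = 1 / (1 + exp(-2(θ_s + Σ_t θ_st x_t))), and the observed state is flipped with known probabilities false_pos = P(y_s = +1 | x_s = -1) and false_neg = P(y_s = -1 | x_s = +1). The function names are hypothetical, and conditioning on the neighbors' true states is a simplification of the full E-step.

```python
import math


def ising_conditional_prob(theta_s, theta_neighbors, x_neighbors):
    """P(x_s = +1 | neighbors) for a {-1, +1} Ising model with
    p(x) proportional to exp(sum_s theta_s x_s + sum_{s<t} theta_st x_s x_t).
    (Assumed parameterization; hypothetical helper.)"""
    eta = theta_s + sum(w * x for w, x in zip(theta_neighbors, x_neighbors))
    # The odds P(+1)/P(-1) equal exp(2 * eta) under this parameterization.
    return 1.0 / (1.0 + math.exp(-2.0 * eta))


def em_weight(y_s, prior_plus, false_pos, false_neg):
    """Posterior P(x_s = +1 | y_s) by Bayes' rule, where prior_plus is the
    Ising conditional probability and the flip rates are assumed known."""
    if y_s == 1:
        num = prior_plus * (1.0 - false_neg)           # true +1, observed +1
        den = num + (1.0 - prior_plus) * false_pos     # true -1, flipped to +1
    else:
        num = prior_plus * false_neg                   # true +1, flipped to -1
        den = num + (1.0 - prior_plus) * (1.0 - false_pos)  # true -1, observed -1
    return num / den


if __name__ == "__main__":
    prior = ising_conditional_prob(0.0, [], [])  # isolated node with no field
    print(prior)                                 # 0.5
    print(round(em_weight(1, prior, 0.1, 0.2), 3))   # 0.889
```

With no misclassification (both flip rates zero) the weight collapses to the observed indicator, so the observed data are used as-is; as the flip rates grow, the weight is pulled toward the Ising conditional probability, which is how neighboring nodes can help correct an error at a given node.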
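For the Poisson log-normal model, a common formulation (assumed here, since the abstract does not spell it out) draws a latent Gaussian vector Z ~ N(μ, Σ) and then Y_j | Z ~ Poisson(exp(Z_j)) independently, with the network given by the zero pattern of the precision matrix Σ^{-1}. The sketch below, with hypothetical function names and only the Python standard library, simulates such counts and computes their closed-form marginal moments. It does not implement the dissertation's normality transformation or its one-step EM algorithm.

```python
import math
import random


def pln_moments(mu, sigma):
    """Marginal mean and covariance of Y when Z ~ N(mu, sigma) and
    Y_j | Z ~ Poisson(exp(Z_j)) independently (assumed model)."""
    p = len(mu)
    mean = [math.exp(mu[j] + sigma[j][j] / 2.0) for j in range(p)]
    # Cov(exp(Z_j), exp(Z_k)) = m_j * m_k * (exp(sigma_jk) - 1)
    cov = [[mean[j] * mean[k] * (math.exp(sigma[j][k]) - 1.0) for k in range(p)]
           for j in range(p)]
    for j in range(p):
        cov[j][j] += mean[j]  # the Poisson layer adds the mean to each variance
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_pln(mu, sigma, n, seed=0):
    """Draw n count vectors from the assumed Poisson log-normal model."""
    rng = random.Random(seed)
    low = cholesky(sigma)
    p = len(mu)
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [mu[i] + sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(p)]
        rows.append([poisson_draw(math.exp(zi), rng) for zi in z])
    return rows


if __name__ == "__main__":
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    mean, cov = pln_moments([0.0, 0.0], sigma)
    print(mean, cov)
    print(simulate_pln([0.0, 0.0], sigma, 5, seed=1))
```

The moment formulas show why the counts cannot simply be treated as Gaussian: each variance is inflated by the mean, and the off-diagonal covariance depends on exp(Σ_jk) - 1 scaled by the means rather than on Σ_jk directly, which is one motivation for working with a transformation toward the latent Gaussian scale.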
Statistics; fMRI; Graphical Models; Expectation Maximization; Latent Network; miRNA; lasso
Hooker, Giles J.
Spreng, Robert Nathan; Booth, James
Ph. D., Statistics
Doctor of Philosophy
dissertation or thesis