Statistical Methods And Analysis For Genome-Wide Association Studies
Genome-wide association (GWA) studies use a large number of genetic variants, usually single nucleotide polymorphisms (SNPs), across the entire genome to identify the genetic basis of disease susceptibility or of phenotypic variation in a trait of interest. A commonly used analysis tool is single marker analysis (SMA), which tests one SNP at a time. Although SMA has been successful in identifying some causal loci, further improvement is possible with multi-locus methods that investigate a large number of SNPs simultaneously. One difficulty in doing so is the high dimensionality, i.e. the large number of SNPs, which makes this a challenging statistical problem. My first project addresses this problem in case-control GWA studies. Both the logistic and probit models are considered for binary traits, and three-component mixture priors are assumed to model the fact that only a few SNPs have non-negligible effects. To estimate posterior distributions, I propose three Markov chain Monte Carlo techniques. Specifically, an adaptive independence sampler is proposed for the logistic model, and data augmentation methods are developed for both the logistic and probit models. Simulations suggest that these methods nearly always outperform SMA. The second project deals with GWA studies of quantitative traits in the presence of confounding by population structure. A linear mixed model is used to account for cryptic relatedness between individuals in the sample. I propose an algorithm that is based on least angle regression and can efficiently select a small number of SNPs that are likely to be associated with the trait. Simulations show that the proposed algorithm tends to rank causal loci higher than least angle regression applied directly, and that both outperform SMA. My third project is part of the so-called CanMap project. More than 1,000 domestic dogs from different breeds, wild canids and village dogs were genotyped on a dense SNP array, and my responsibility was to carry out a GWA analysis of body weight and other morphological traits, including height and shape, in the domestic dog. The GWA results enrich our understanding of the impact of strong directional selection on the genetic architecture of complex traits known to be under selection.
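As a point of reference for the SMA baseline mentioned above, the following is a minimal sketch of a single marker scan for a quantitative trait. It is not code from the thesis: the function name `single_marker_scan`, the additive 0/1/2 genotype coding, the simulated data, and the choice of a simple per-SNP linear regression t-statistic are all assumptions made for illustration. It also ignores population structure, and a case-control study would use a test suited to binary traits instead.

```python
import numpy as np


def single_marker_scan(genotypes, trait):
    """Return one t-statistic per SNP, each from regressing the trait on that SNP alone.

    genotypes: (n_individuals, n_snps) array of allele counts (assumed coding 0/1/2).
    trait:     (n_individuals,) array of quantitative phenotypes.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(trait, dtype=float)
    n = G.shape[0]

    # Centre both sides so each per-SNP regression implicitly includes an intercept.
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()

    # Pearson correlation between each SNP and the trait, then the equivalent
    # simple-regression t-statistic with n - 2 degrees of freedom.
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (Gc.T @ yc) / denom
        t = r * np.sqrt((n - 2) / (1.0 - r ** 2))

    # SNPs with no variation (monomorphic) carry no information; report 0 for them.
    return np.nan_to_num(t)


# Hypothetical simulated data: 500 individuals, 200 SNPs, one causal SNP.
rng = np.random.default_rng(0)
n_individuals, n_snps, causal = 500, 200, 17
maf = rng.uniform(0.1, 0.5, size=n_snps)               # assumed minor allele frequencies
G = rng.binomial(2, maf, size=(n_individuals, n_snps))  # additive genotype coding
y = 1.0 * G[:, causal] + rng.normal(size=n_individuals)

t_stats = single_marker_scan(G, y)
top = int(np.argmax(np.abs(t_stats)))  # index of the most strongly associated SNP
```

Because each SNP is tested in isolation, the scan is cheap but cannot account for the joint effects of several loci, which is the limitation the multi-locus methods in the first two projects are meant to address.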
dissertation or thesis