COMPUTATIONAL EXPLORATION OF THE GENETIC FACTORS BEHIND TRANSCRIPTIONAL REGULATION
Understanding how DNA sequence affects transcription is an important first step toward unraveling the molecular mechanisms that cause genetic disease. Finding allele-specific differences in the distribution of RNA Polymerase II (Pol II) along the genome is a powerful strategy for understanding the link between DNA sequence and the various steps in the transcription cycle. Using the natural genetic variation between the two homologous copies of the genome in diploid organisms, I can exclude most external confounding factors and identify the effect of DNA sequence differences between the copies. However, few computational methods have been developed to discover allele-specific differences in functional genomic data. Existing methods either treat each single nucleotide polymorphism (SNP) independently, limiting statistical power, or combine SNPs across gene annotations, preventing the discovery of allele-specific differences in unexpected genomic regions. In the first part of my dissertation, I describe AlleleHMM, a new computational method I developed to address this problem. AlleleHMM uses the spatial relationship among neighboring SNPs to identify genomic blocks that share similar allele-specific differences in mark abundance. Using both simulated and real genomic data, I found that AlleleHMM substantially outperforms naive methods, particularly when the input data have realistic levels of overdispersion. AlleleHMM is a powerful tool for discovering allele-specific regions in functional genomic datasets.
In the second part of my dissertation, I describe how I used naturally occurring genetic variation in F1 hybrid mice to explore how DNA sequence differences affect the steps in the transcription cycle. To maximize allelic differences, we generated ChRO-seq data from F1 hybrids of two genetically distinct strains of mice: C57BL/6 (B6) and Castaneus (CAST). My analysis revealed a strong genetic basis for the precise coordinates of transcription initiation and promoter-proximal pause. For initiation, the data suggest that Pol II scans bidirectionally to search for an energetically favorable transcription start site within a transcription start cluster. For promoter-proximal pause, the data support a model in which paused Pol II is positioned in part through a physical interaction with the pre-initiation complex. The data also show substantial allelic differences in the position of transcription termination, which frequently do not affect the composition of the mature mRNA. Finally, I identified frequent, organ-specific changes in transcription that affect mRNA and ncRNA expression across broad genomic domains. Collectively, my work reveals how DNA sequences shape core transcriptional processes at single-nucleotide resolution in mammals.
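As a rough illustration of the block-finding idea attributed to AlleleHMM in the abstract, the sketch below runs a small hidden Markov model over per-SNP allelic read counts and decodes the most likely state path with the Viterbi algorithm. It is not the AlleleHMM implementation. The three state names, the binomial emission model, the allele fractions in `P_MATERNAL`, and the `STAY` transition probability are all assumptions chosen for illustration.

```python
import math

# Hypothetical states and parameters, chosen for illustration only;
# none of these values come from the dissertation.
STATES = ("maternal", "balanced", "paternal")
P_MATERNAL = {"maternal": 0.9, "balanced": 0.5, "paternal": 0.1}
STAY = 0.9  # assumed probability of staying in the same state at the next SNP


def log_binomial(k, n, p):
    """Log probability of k maternal reads out of n under Binomial(n, p)."""
    return (math.log(math.comb(n, k))
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_transition(src, dst):
    """Stay with probability STAY; otherwise split the remainder evenly."""
    if src == dst:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(counts):
    """Most likely state path for a list of (maternal, paternal) SNP counts."""
    if not counts:
        return []
    mat, pat = counts[0]
    # Uniform prior over states at the first SNP (an assumption).
    score = {s: math.log(1 / len(STATES))
             + log_binomial(mat, mat + pat, P_MATERNAL[s]) for s in STATES}
    backpointers = []
    for mat, pat in counts[1:]:
        new_score, pointers = {}, {}
        for dst in STATES:
            best = max(STATES,
                       key=lambda src: score[src] + log_transition(src, dst))
            pointers[dst] = best
            new_score[dst] = (score[best] + log_transition(best, dst)
                              + log_binomial(mat, mat + pat, P_MATERNAL[dst]))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up counts for nine consecutive SNPs: (maternal reads, paternal reads).
snp_counts = [(9, 1), (8, 2), (10, 0), (5, 5), (4, 6), (6, 4),
              (1, 9), (0, 10), (2, 8)]
print(viterbi(snp_counts))
```

Because neighboring SNPs share a state unless the counts argue strongly for a switch, consecutive SNPs with the same decoded state form a block. Pooling evidence across SNPs in this way is what gives more power than testing each SNP on its own. A plain binomial emission does not model overdispersion, so this sketch is only a starting point for realistic data.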
Danko, Charles G.
Yu, Haiyuan; Clark, Andrew
Genetics, Genomics and Development
Ph. D., Genetics, Genomics and Development
Doctor of Philosophy
dissertation or thesis