dc.contributor.author: Chou, Shao-Pei
dc.date.accessioned: 2022-01-24T18:07:48Z
dc.date.available: 2022-01-24T18:07:48Z
dc.date.issued: 2021-12
dc.identifier.other: Chou_cornellgrad_0058F_12821
dc.identifier.other: http://dissertations.umi.com/cornellgrad:12821
dc.identifier.uri: https://hdl.handle.net/1813/110824
dc.description: 200 pages
dc.description.abstract: Understanding how DNA sequence affects transcription is an important first step toward unraveling the molecular mechanisms that cause genetic disease. Finding allele specific differences in the distribution of RNA Polymerase II (Pol II) along the genome is a powerful strategy for understanding the link between DNA sequence and the various steps in the transcription cycle. Using the natural genetic variation between the two homologous copies of the genome in diploid organisms, I can exclude most external confounding factors and identify the effect of DNA sequence differences between the copies. However, few computational methods have been developed to discover allele specific differences in functional genomic data. Existing methods either treat each SNP independently, limiting statistical power, or combine SNPs across gene annotations, preventing the discovery of allele specific differences in unexpected genomic regions. In the first part of my dissertation, I describe AlleleHMM, a new computational method I developed to address this problem. AlleleHMM uses the spatial relationship among neighboring single nucleotide polymorphisms (SNPs) to identify genomic blocks that share similar allele specific differences in mark abundance. Using both simulated and real genomic data, I found that AlleleHMM substantially outperforms naive methods, particularly when input data has realistic levels of overdispersion. AlleleHMM is a powerful tool for discovering allele specific regions in functional genomic datasets. In the second part of my dissertation, I describe how I used naturally occurring genetic variation in F1 hybrid mice to explore how DNA sequence differences affect the steps in the transcription cycle. To maximize allelic differences, we generated ChRO-seq data from F1 hybrids of two genetically distinct breeds of mice: C57BL/6 (B6) and Castaneus (CAST).
My analysis revealed a strong genetic basis for the precise coordinates of transcription initiation and promoter proximal pause. For initiation, the data suggest that Pol II scans bidirectionally to search for an energetically favorable transcription start site within a transcription start cluster. For promoter proximal pause, the data support a model in which paused Pol II is positioned in part through a physical interaction with the pre-initiation complex. The data also show substantial allelic differences in the position of transcription termination, which frequently do not affect the composition of the mature mRNA. Finally, I identified frequent, organ-specific changes in transcription that affect mRNA and ncRNA expression across broad genomic domains. Collectively, my work reveals how DNA sequences shape core transcriptional processes at single nucleotide resolution in mammals.
dc.language.iso: en
dc.title: COMPUTATIONAL EXPLORATION OF THE GENETIC FACTORS BEHIND TRANSCRIPTIONAL REGULATION
dc.type: dissertation or thesis
thesis.degree.discipline: Genetics, Genomics and Development
thesis.degree.grantor: Cornell University
thesis.degree.level: Doctor of Philosophy
thesis.degree.name: Ph. D., Genetics, Genomics and Development
dc.contributor.chair: Danko, Charles G.
dc.contributor.committeeMember: Yu, Haiyuan
dc.contributor.committeeMember: Clark, Andrew
dcterms.license: https://hdl.handle.net/1813/59810.2
dc.identifier.doi: https://doi.org/10.7298/hp5y-gr08
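
The abstract describes grouping neighboring SNPs into blocks that share the same allelic bias instead of testing each SNP on its own. The sketch below illustrates that general idea with a small hidden Markov model decoded by the Viterbi algorithm. It is not the AlleleHMM implementation: the three states, the per-state maternal read fractions in `P_MAT`, the `STAY` probability, and the function names are all assumptions chosen for illustration, and a plain binomial emission is used even though the abstract notes that real data are overdispersed.

```python
import math

# Hypothetical states: maternal-biased, symmetric, paternal-biased.
STATES = ("M", "S", "P")
# Assumed probability that a read maps to the maternal allele in each state.
P_MAT = {"M": 0.9, "S": 0.5, "P": 0.1}
# Assumed probability of keeping the same state from one SNP to the next.
STAY = 0.9


def log_binom(k, n, p):
    """Log binomial probability of k maternal reads among n total reads."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_trans(a, b):
    """Log transition probability; switches are split evenly over other states."""
    if a == b:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(counts):
    """Most likely state per SNP.

    counts: list of (maternal_reads, paternal_reads), ordered by position.
    """
    if not counts:
        return []
    m, p = counts[0]
    score = {s: -math.log(len(STATES)) + log_binom(m, m + p, P_MAT[s])
             for s in STATES}
    back = []  # back[i][s] = best previous state for state s at SNP i + 1
    for m, p in counts[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log_trans(r, s))
            ptr[s] = best
            score[s] = (prev[best] + log_trans(best, s)
                        + log_binom(m, m + p, P_MAT[s]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def blocks(path):
    """Collapse a state path into (state, first_snp_index, last_snp_index) runs."""
    runs = []
    for i, s in enumerate(path):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1], i)
        else:
            runs.append((s, i, i))
    return runs
```

Calling `blocks(viterbi(counts))` on a list of per-SNP read counts returns contiguous runs of SNPs assigned to the same state. Because the transition penalty discourages frequent switching, several adjacent SNPs with modest individual bias can jointly support a biased block, which is the intuition behind pooling evidence across neighbors.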