Leveraging High Throughput Sequencing To Characterize Alternative Polyadenylation Across Species

Recent advances in high-throughput sequencing (HTS) technologies create enormous opportunities for discovery, along with new challenges for data analysis and interpretation. A major application of HTS is the study of how the transcriptome is modulated at the levels of gene expression and RNA processing, and how these events relate to cellular identity, environment, and/or disease status. To understand the impact of alternative polyadenylation (APA) events on post-transcriptional gene regulation, I analyzed deep mammalian RNA-seq data using conservative criteria and identified thousands of genes that utilize substantially extended novel 3'UTRs in mouse and human. Global tissue comparisons revealed that the APA events generating these extensions were most prevalent in the brain. Collectively, these extensions contain thousands of conserved miRNA binding sites and are strongly enriched for sites of many well-studied neural miRNAs. Altogether, these revised 3'UTR annotations greatly expand the scope of post-transcriptional regulatory networks in mammals. This work further highlights opportunities to improve methods that leverage RNA-seq for 3'UTR annotation and for identification of differential APA events. Existing assembly strategies often fragment long 3'UTRs and, importantly, none of the existing algorithms can infer tandem 3'UTR isoforms directly from RNA-seq data. Consequently, it is often not possible to identify patterns of APA using existing assembly and differential expression testing workflows. To remedy these limitations, I developed a new method for transcript assembly, the Isoform Structural Change Model (IsoSCM), which incorporates change-point analysis to improve the 3'UTR annotation process. Through evaluation on simulated and experimental data sets, I demonstrate that IsoSCM annotates 3' termini with higher sensitivity and specificity than can be achieved with existing methods.
I highlight the utility of IsoSCM by demonstrating its ability to recover known patterns of tissue-regulated APA. The methodology encapsulated by IsoSCM will facilitate future efforts for 3'UTR annotation and genome-wide studies of the breadth, regulation, and roles of APA leveraging RNA-seq data. Finally, I describe CrossBrowse, a multi-species genome browser, and use several examples to illustrate how the visualizations generated by CrossBrowse inform comparative data analysis.

3'UTR; APA; polyadenylation; post-transcriptional regulation; RNA-seq; transcriptome


Degree Discipline

Physiology, Biophysics & Systems Biology

Degree Level

Doctor of Philosophy


Attribution-NonCommercial-NoDerivatives 4.0 International


dissertation or thesis
