Recent advances in high-throughput sequencing (HTS) technologies create enormous opportunities for discovery, along with new challenges for data analysis and interpretation. A major application of HTS is the study of how the transcriptome is modulated at the levels of gene expression and RNA processing, and how these events relate to cellular identity, environment, and/or disease status. To understand the impact of alternative polyadenylation (APA) events on post-transcriptional gene regulation, I analyzed deep mammalian RNA-seq data using conservative criteria and identified thousands of genes that utilize substantially extended novel 3'UTRs in mouse and human. Global tissue comparisons revealed that APA events generating these extensions were most prevalent in the brain. Collectively, these extensions contain thousands of conserved miRNA binding sites and are strongly enriched for binding sites of many well-studied neural miRNAs. Altogether, these revised 3'UTR annotations greatly expand the scope of post-transcriptional regulatory networks in mammals. This work further highlights opportunities to improve methods that leverage RNA-seq for 3'UTR annotation and identification of differential APA events. Existing assembly strategies often fragment long 3'UTRs and, importantly, none of these algorithms can infer tandem 3'UTR isoforms directly from RNA-seq data. Consequently, it is often not possible to identify patterns of APA using existing assembly and differential expression testing workflows. To remedy these limitations, I developed a new method for transcript assembly, the Isoform Structural Change Model (IsoSCM), which incorporates change-point analysis to improve the 3'UTR annotation process. Through evaluation on simulated and experimental data sets, I demonstrate that IsoSCM annotates 3' termini with higher sensitivity and specificity than can be achieved with existing methods.
I highlight the utility of IsoSCM by demonstrating its ability to recover known patterns of tissue-regulated APA. The methodology encapsulated by IsoSCM will facilitate future efforts to annotate 3'UTRs and to study the breadth, regulation, and roles of APA genome-wide using RNA-seq data. Finally, I describe CrossBrowse, a multi-species genome browser, and use several examples to illustrate how the visualizations generated by CrossBrowse inform comparative data analysis.
3'UTR; APA; polyadenylation; post-transcriptional regulation; RNA-seq; transcriptome
Physiology, Biophysics & Systems Biology
Attribution-NonCommercial-NoDerivatives 4.0 International