Sang (Tony) Chun, Ph.D.
The transmission of genetic information, from the transcription of DNA into RNA to the subsequent translation of RNA into protein, is often abstracted as a linear process. However, as methods and technologies for measuring the genomic, transcriptomic, and proteomic content of cells have advanced, so too has our understanding that this transmission is not always lossless. For instance, changes observed in messenger RNA (mRNA) abundance are not always retained at the proteomic level. Indeed, a diverse array of mechanisms that exert regulatory control over this transmission of information has been identified. Next-generation short-read sequencing has driven many of these insights and provided an increasingly nuanced understanding of these regulatory mechanisms. However, continued development and application of sequencing methodologies and analytics are required to properly contextualize many of these insights on a more global scale. Ribosome profiling is one such recent advancement: it enriches for ribosome-protected fragments (RPFs) of mRNA, and sequencing and analysis of these fragments enables profiling of the translational content of a sample. The aim of this dissertation is to address the need for statistical and analytical algorithms that profile the regulatory factors contributing to translational dynamics in cells.
In the first chapter, I survey the development and application of next-generation sequencing methods for the profiling and computational analysis of translation and translational dynamics. In the second chapter, I present SPECtre, a software package that identifies regions of active translation by measuring the translational engagement of ribosomes over a transcript. SPECtre achieves high sensitivity and specificity in classifying regions undergoing translation by leveraging the codon-dependent elongation of peptides; the resulting tri-nucleotide periodicity is evident in the alignment of ribosome profiling sequence reads to a reference transcriptome. SPECtre classifies a transcript as actively translated according to the coherence between the read coverage over a region and an idealized tri-nucleotide signal.
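The coherence idea described above can be illustrated with a minimal sketch. This is not SPECtre's implementation: the function names, the 30-nucleotide window, the non-overlapping rectangular windows, and the frame-0 idealized signal are all assumptions chosen for illustration. The sketch computes the magnitude-squared coherence, at a frequency of one cycle per three nucleotides, between a per-nucleotide coverage vector and an idealized signal that peaks at the first position of every codon.

```python
import cmath

def codon_frequency_dft(segment):
    """DFT coefficient at 1 cycle per 3 nt, after removing the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) * cmath.exp(-2j * cmath.pi * i / 3)
               for i, v in enumerate(segment))

def codon_coherence(coverage, window=30):
    """Magnitude-squared coherence at the codon frequency between `coverage`
    and an idealized tri-nucleotide signal, averaged over non-overlapping
    windows. `window` (hypothetical default: 30 nt) must be a multiple of 3."""
    if window % 3 != 0:
        raise ValueError("window must be a multiple of 3")
    # Idealized signal (assumption): all reads fall on codon position 0.
    ideal = [1.0 if i % 3 == 0 else 0.0 for i in range(window)]
    y = codon_frequency_dft(ideal)
    cross, power_x, n_windows = 0j, 0.0, 0
    for start in range(0, len(coverage) - window + 1, window):
        x = codon_frequency_dft(coverage[start:start + window])
        cross += x * y.conjugate()   # accumulate cross-spectrum
        power_x += abs(x) ** 2       # accumulate coverage auto-spectrum
        n_windows += 1
    if n_windows == 0 or power_x == 0.0:
        return 0.0                   # too short, or no periodic signal at all
    power_y = n_windows * abs(y) ** 2
    return abs(cross) ** 2 / (power_x * power_y)

# Toy coverage vectors (invented for illustration, not real data).
periodic = [10, 2, 1] * 30   # strong three-nucleotide periodicity
flat = [5] * 90              # uniform coverage, no periodicity
print(codon_coherence(periodic), codon_coherence(flat))
```

Because the score is a coherence magnitude, it lies between 0 and 1 and does not depend on which of the three frames carries the signal; in this sketch, a score near 1 means the phase of the periodic component is consistent from window to window. A real classifier would also need a threshold calibrated against known translated and untranslated regions, which is outside the scope of this example.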
In the third chapter, I describe the application of SPECtre to identify translated upstream open reading frames (uORFs) that may regulate differentiation in a neuron-like cell model. uORFs are open reading frames whose translation initiates from AUG codons and, under certain biological constraints, from non-AUG codons located in the 5’ untranslated regions of annotated protein-coding genes. Subsets of these uORFs have been implicated in the regulation of their downstream protein-coding genes in yeast, mice, and humans. In this chapter, I provide further evidence for this regulation, as well as the spatial context for the functional consequences of uORF translation on downstream protein-coding genes in a neuron-like cell line model of differentiation.
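To make the definition of a uORF concrete, the following minimal sketch enumerates candidate uORFs on a single transcript. It is an illustration only, not the annotation procedure used in this work: the function name, the example sequences, and the choice of CUG as the non-AUG start codon are assumptions. A candidate is any start codon lying entirely within the 5’ untranslated region that is followed in frame by a stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_uorfs(transcript, cds_start, starts=("AUG",)):
    """Return (start, end, start_codon) for each candidate uORF.

    `transcript` is an RNA string, `cds_start` is the 0-based index of the
    annotated start codon, and `end` is exclusive and includes the stop codon.
    Candidates with no in-frame stop codon are skipped. ORFs that extend past
    `cds_start` are still reported; filtering them is left to the caller.
    """
    uorfs = []
    # The start codon itself must lie entirely within the 5' UTR.
    for i in range(0, cds_start - 2):
        codon = transcript[i:i + 3]
        if codon not in starts:
            continue
        # Walk codon by codon until the first in-frame stop.
        for j in range(i + 3, len(transcript) - 2, 3):
            if transcript[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3, codon))
                break
    return uorfs

# Toy transcripts (invented for illustration): 5' UTR followed by a short CDS.
aug_example = "GGAUGAAAUAGCC" + "AUGGCCUAA"
print(find_uorfs(aug_example, cds_start=13))

cug_example = "CCUGAAAUAACC" + "AUGUAA"
print(find_uorfs(cug_example, cds_start=12, starts=("AUG", "CUG")))
```

Passing only AUG in `starts` gives the canonical set of candidates; adding near-cognate codons widens the search at the cost of many more candidates, which is one reason translational evidence from ribosome profiling is useful for deciding which candidates are actually translated.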
Finally, in the fourth chapter, I outline a strategy that uses our coherence-based translational scoring algorithm to profile ribosomal engagement over chimeric gene fusion breakpoints in prostate cancer. Here, known breakpoints from current annotation databases are integrated with novel junctions nominated by existing whole-genome and transcriptomic gene fusion detection algorithms, and SPECtre is used to measure the translational profile over these chimeric junctions. This provides an additional layer of translational evidence for known and novel gene fusion breakpoints in prostate cancer. Ongoing development of a database and visualization platform based on these results will enable integrative insights into the transcriptional and translational topology of these breakpoints.