Friday, June 26, 2020

Bioinformatics Ph.D. Thesis Defense

1:00 PM

Advisor: Stephen Parker

by Ph.D. Candidate Peter Orchard

"Epigenomic and transcriptomic profiling for the study of monogenic and polygenic traits and disease"

Abstract

Many disease- and trait-associated genomic loci are in non-coding regions of the genome. Determining which genetic variants in these regions are causally related to a trait of interest and elucidating their mechanisms of action and downstream effects can be difficult. Layering transcriptomic and epigenomic data on top of genetic variation data can help nominate causal phenotype-associated genetic variants and generate hypotheses about their effects in different cellular and biological contexts. 

In this dissertation, I first apply RNA sequencing (RNA-seq) and the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to investigate changes in gene expression and chromatin accessibility in the Danforth mouse, a mouse model of human birth defects. These analyses suggest that the previously discovered overexpression of Ptf1a leads to disruption of the sonic hedgehog signaling pathway, and raise the possibility that Ptf1a overexpression may be partially mediated by a nearby lncRNA promoter orthologous to a human PTF1A enhancer.

Next, I apply a new software package for the quality control of ATAC-seq data to public datasets to measure their heterogeneity, and analyze new GM12878 ATAC-seq datasets to quantify the impact of Tn5 transposase concentration and sequencing lane cluster density. This analysis demonstrates some of the difficulties in reliably quantifying chromatin accessibility and utilizing public ATAC-seq datasets.

In my third project, I apply single-nucleus ATAC-seq and single-nucleus RNA-seq to human and rat skeletal muscle samples to generate cell-type-specific transcriptomic and chromatin accessibility maps for skeletal muscle cell types. I integrate these maps with genome-wide association study (GWAS) data from the UK Biobank and several consortia to explore enrichment of GWAS signal in cell-type-specific ATAC-seq peaks and to nominate causal genetic variants, most notably a type 2 diabetes (T2D)-associated variant at the ARL15 locus.

Lastly, to gain insight into the genetic regulation of chromatin architecture and its association with aerobic exercise capacity (a phenotype negatively correlated with mortality and risk for many complex diseases), I analyze skeletal muscle ATAC-seq, RNA-seq, and genotype data from a rat model for untrained running capacity. The aims are to search for regulatory elements and genes associated with running capacity and to identify genetic variants associated with chromatin accessibility.

Together, these projects demonstrate the value of epigenomic and transcriptomic data in the investigation of monogenic and polygenic traits, as well as the challenges and limitations of applying such data in this context.