Monday, July 30, 2018

Bioinformatics Ph.D. Thesis Defense

1:00 PM

Great Lakes North, 4th Fl., Palmer Commons Bldg. 

by Ph.D. Candidate Ari Allyn-Feuer 

Advisor: Brian Athey

"The Pharmacoepigenomics Informatics Pipeline and H-GREEN Hi-C Compiler: Discovering Pharmacogenomic Variants and Pathways with the Epigenome and Spatial Genome"

Abstract

Over the last decade, biomedical science has been transformed by the epigenome and spatial genome. We have discovered that the multifarious chemical and spatial arrangements of the genome in the nucleus have a bidirectional causal relationship with gene expression, and therefore with cell fate and, in turn, with organismal function, dysfunction, and disease, and that this relationship is both extremely complicated and extremely powerful. We have begun to chart atlases of the epigenome and spatial genome that have progressively increased in depth, breadth, conceptual comprehension, and quality, and such atlases have begun, like the reference genome, to be a foundational underpinning for research in many fields.

In particular, the epigenome and spatial genome are increasingly used to discover causative regulatory variants in the significance regions of genome-wide association studies, both to uncover the biological mechanisms underlying the associated phenotypes and to design genetic tests that predict them. It has become widely understood within the epigenome field that regulatory variants acting by way of the epigenome exert a significantly more potent influence than coding variants on the genetic variance of many important biomedical phenotypes.

However, in the area of pharmacogenomics, the study of the genetic underpinnings of pharmacological phenotypes like drug response and adverse events, and the design of genetic tests for these phenotypes, such advances have been radically underapplied. The majority of pharmacogenomics tests are designed manually on the basis of mechanistic work with candidate genes, and where genome-wide approaches are used, they are typically not interpreted with the epigenome. Coding variants predominate.

This work describes a series of analyses of pharmacogenomics association studies with the tools and datasets of the epigenome and spatial genome, undertaken with the intent of discovering causative regulatory variants to enable new genetic tests. It describes the potent regulatory variants discovered thereby to have a putative causative and predictive role in a number of medically important phenotypes, and in particular the tendency for such variants to cluster into spatially interacting, conceptually unified pathways which offer mechanistic insight into these phenotypes.

It describes the Pharmacoepigenomics Informatics Pipeline (PIP), an integrative multiple omics variant discovery pipeline designed to make this kind of analysis easier and cheaper to perform, more reproducible, and amenable to the addition of advanced features. It describes the H-GREEN Hi-C compiler, designed to analyze spatial genome data and discover the distant target genes of such regulatory variants, as a module in a future PIP-style pipeline.

It describes a potential feature set of a future pipeline, using the latest epigenome research and the lessons of the previous pipeline. It describes my thinking about how to use the output of a multiple omics variant pipeline to design genetic tests that also incorporate clinical data. And it concludes by describing a long-term vision for a comprehensive pharmacophenomic atlas, to be constructed by applying a variant pipeline and machine learning test design system, such as is described, to thousands of phenotypes in parallel.

Scientists struggled to assay genotypes for the better part of a century, and in the last twenty years, succeeded. The struggle to predict phenotypes on the basis of the genotypes we assay remains ongoing. The use of multiple omics variant pipelines and machine learning models with omics atlases, genetic association, and medical records data will be an increasingly significant part of that struggle for the foreseeable future.