The human genome sequence folds in three dimensions (3D) into a rich variety of locus-specific contact patterns. Despite growing appreciation for the importance of 3D genome folding in evolution and disease, we lack models for relating mutations in genome sequences to changes in genome structure and function. Towards that goal, we discovered that the organization of gene regulatory domains within chromosomes and the specific sequences that sit at boundaries between domains are under strong negative selection in the human population and over primate evolution. Motivated by this signature of functional importance, we developed a deep convolutional neural network, called Akita, that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of the structural protein CTCF but also reveal a complex grammar beyond CTCF binding sites that underlies genome folding. Akita enabled rapid in silico predictions of the effects of sequence mutagenesis on the 3D genome, including differences in genome folding across species and in disease cohorts, which we are validating with CRISPR-edited genomes. This prediction-first strategy exemplifies my vision for a more proactive, rather than reactive, role for data science in biomedical research.
Dr. Katherine S. Pollard is Director of the Gladstone Institute of Data Science & Biotechnology, Investigator at the Chan Zuckerberg Biohub, and Professor in the Department of Epidemiology & Biostatistics and Bioinformatics Graduate Program at UCSF. Her lab develops statistical models and open source bioinformatics software for the analysis of massive genomic datasets. Previously, Dr. Pollard was an assistant professor in the University of California, Davis Genome Center and Department of Statistics. She earned her PhD in Biostatistics from the University of California, Berkeley and was a comparative genomics postdoctoral fellow at the University of California, Santa Cruz. She was awarded the Thomas J. Watson Fellowship, the Sloan Research Fellowship, and the Alumna of the Year award from UC Berkeley. She is a Fellow of the International Society for Computational Biology and of the California Academy of Sciences.