Wednesday, December 8, 2021

CCMB Seminar: "Deep learning and Template-based Protein and RNA Structure Prediction"

4:00 PM

Forum Hall, 4th Floor, Palmer Commons Building

CCMB Seminar Series – sponsored by DCMB
by Chengxin Zhang (Postdoctoral Fellow, Yale University)

Abstract

The biological functions of proteins and non-coding RNAs typically depend on their specific tertiary and quaternary structures. Since experimental structure determination is laborious and costly, accurate computational structure prediction has been a major focus of bioinformatics for several decades. I developed D-QUARK, a protein structure prediction protocol powered by deep multiple sequence alignment generation, deep learning-based inter-residue distance and orientation prediction, and extensive replica-exchange Monte Carlo-based conformation sampling. D-QUARK participated in CASP14 as the “QUARK” group and was the top-ranked automated server for template-free protein structure prediction (https://predictioncenter.org/casp14/zscores_final.cgi?gr_type=server_onl...).

While the field has made significant advances in protein structure prediction, techniques for predicting the structures of RNA monomers and RNA-protein complexes are still underdeveloped. For example, during recent RNA-Puzzles challenges, even the best teams in the RNA structure prediction community struggled to consistently produce RNA structure models with RMSDs below 7 Å relative to the native structure, even for RNAs with fewer than 100 nucleotides. To address this issue, I have developed several algorithms for RNA structure modeling. First, I developed rMSA, a hierarchical approach for generating deep, high-quality RNA multiple sequence alignments. Alignments from rMSA consistently improve RNA secondary structure and contact prediction by coevolution and deep learning.

Next, I developed US-align, the first method to universally align monomer and complex structures of nucleic acids and proteins. US-align is built on a uniform, size-independent objective function (TM-score) coupled with a heuristic alignment search algorithm. US-align is 10-100 times faster than the state-of-the-art methods developed for specific types of molecular structure alignment, yet generates more accurate alignments. When applied to template-based RNA-protein docking, US-align generates complex structures with lower RMSD at 6-28 times the speed of existing docking programs. Most recently, I have been developing CoMMiT, a template-based RNA structure prediction pipeline that combines multiple in-house and third-party RNA threading programs. CoMMiT consistently improves RNA structure prediction accuracy over all of its component methods as well as state-of-the-art template-free RNA structure prediction programs. These algorithms are part of my initial steps towards a composite pipeline for deep learning-based structure and function prediction of RNAs and RNA-protein complexes.
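
For readers unfamiliar with the TM-score mentioned above, the following is a minimal sketch of how the score can be computed for one fixed superposition. It is not the US-align implementation. It assumes the commonly cited protein length scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, and the 0.5 Å floor for short chains is an assumption made here. The names `protein_d0` and `tm_score` are hypothetical, and nucleic acids would need a different d0, which can be passed in explicitly.

```python
from typing import Optional, Sequence


def protein_d0(length: int) -> float:
    """Length-dependent distance scale (in angstroms) for proteins.

    Uses d0 = 1.24 * (L - 15)^(1/3) - 1.8. The 0.5 floor for short
    chains is an assumption of this sketch.
    """
    if length <= 21:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], length: int,
             d0: Optional[float] = None) -> float:
    """TM-score of one fixed superposition.

    distances: distance (in angstroms) between each aligned residue pair.
    length:    length of the reference structure used for normalization.
    d0:        optional distance scale; defaults to the protein formula.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    scale = protein_d0(length) if d0 is None else d0
    # Each aligned pair contributes a value in (0, 1]; unaligned residues
    # contribute nothing, so partial alignments are penalized via 1/length.
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / length


if __name__ == "__main__":
    # A perfect match over all 100 residues scores 1.0.
    print(tm_score([0.0] * 100, 100))
    # A perfect match over only half of the residues scores 0.5.
    print(tm_score([0.0] * 50, 100))
```

Because d0 grows with length, the score does not depend systematically on molecule size, which is the sense in which the abstract calls it size-independent. A full alignment program would additionally search over superpositions and residue pairings to maximize this value, and that search is not shown here.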