Yu Chen, Ph.D.
06

Ph.D. Program

Software Engineer

Google, Inc.

Chair

Gordon Crippen

Dissertation Title

Protein Structural Alignment and Application for Fold Recognition

Research Interest

Structural genomics aims to experimentally determine nearly all protein folds by 2010, which makes fold recognition a feasible and promising tool for revealing sequence-structure relationships. In the widely used 3D-1D method, each three-dimensional structure is converted into a profile based on structural environments, and fold recognition is achieved via sequence-profile alignments. Multiple alignments provide structural conservation information, but it is not easy to use multiple alignments of structures directly in fold recognition, because (1) most structural comparison methods do not consider environment-environment compatibilities in the alignment; and (2) multiple alignment algorithms cannot always remove conflicts between pairwise alignments, especially when the noise level is high (which is common for structural alignments of proteins with low sequence identity). In the first part of the thesis, I introduced a novel approach, Structural Alignment Using a Clique-finding algorithm and Environmental information (SAUCE), in which the alignment is built not only on structural coordinate information but also on realistic environmental information. This method is able to match environmentally compatible residues and to find flexible alignments between multi-domain structures. In the second part, I described an iterative refinement routine, IRIS, that generates conflict-free multiple structural alignments. An optional feature of this routine is that environmental compatibility can also be maintained, which makes it possible to assemble SAUCE alignments. Results showed that the algorithm consistently improves multiple alignment performance and outperforms other state-of-the-art methods. In the third part, I formulated a tree-based fold search method. I applied this method to a group of structures with sequence identity below 35% and ran a series of leave-one-out tests. These tests are approximately comparable to real fold recognition tests at the superfamily level. Results show that fold recognition via a fold tree can be faster and better at detecting distant homologues than classic fold recognition methods.

Current Placement