"qTMclust: Quick TM-score-based Clustering of Protein Structures"
Searching the PDB structure database with a query protein structure to identify its structural homologs is a common task in structure-based protein function annotation, structure model refinement, and protein design. However, aligning the query structure against every other protein in the PDB is slow, which makes such searches time-consuming. One plausible way to accelerate structure searches is to cluster redundant PDB entries by pairwise structural similarity. A query structure then only needs to be compared against a reduced set of cluster representatives instead of the whole PDB. Currently, there is no program capable of clustering similar protein structures across the PDB. In this project, we are developing a new algorithm, qTMclust, which reorganizes the proteins in the PDB into clusters based on structural similarity. Assuming that a small number of pairwise structure alignments suffices to establish the clusters, qTMclust accelerates clustering by avoiding unnecessary structure comparisons through two heuristics: a greedy-incremental clustering algorithm and an approximate structure similarity calculation.
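The greedy-incremental idea can be illustrated with a short sketch. This is not qTMclust's implementation. It is a generic example under stated assumptions: `tm_score(a, b)` is a hypothetical callback (standing in for a structure aligner such as TM-align) that returns a TM-score normalized by the longer chain, and the 0.5 cutoff is an assumed default. Structures are visited longest first. Each one joins the first representative it matches, otherwise it founds a new cluster. The length-ratio check is a swapped-in example of skipping unnecessary comparisons, not the approximate similarity calculation described above. It is exact under the stated normalization, because each aligned residue pair contributes at most 1 and at most `L_short` pairs can align, so the score cannot exceed `L_short / L_long`.

```python
from typing import Callable, Dict, List, Sequence


def greedy_incremental_cluster(
    names: Sequence[str],
    lengths: Dict[str, int],
    tm_score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from qTMclust
) -> Dict[str, List[str]]:
    """Greedy-incremental clustering sketch.

    `tm_score(a, b)` is a hypothetical callback assumed to return a TM-score
    normalized by the length of the longer chain.
    Returns a mapping of representative -> cluster members.
    """
    # Visit the longest structures first so that representatives are never
    # shorter than the structures compared against them.
    order = sorted(names, key=lambda n: lengths[n], reverse=True)
    clusters: Dict[str, List[str]] = {}
    for name in order:
        placed = False
        for rep in clusters:
            # Exact prefilter: a score normalized by the longer chain cannot
            # exceed the shorter/longer length ratio, so skip hopeless pairs.
            if lengths[name] / lengths[rep] < threshold:
                continue
            if tm_score(name, rep) >= threshold:
                clusters[rep].append(name)
                placed = True
                break
        if not placed:
            # No representative is similar enough, so start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Toy data: made-up lengths and a lookup table standing in for an aligner.
    lengths = {"A": 300, "B": 290, "C": 120, "D": 118}
    pair_scores = {frozenset(("A", "B")): 0.8, frozenset(("C", "D")): 0.7}

    def toy_tm(a: str, b: str) -> float:
        return pair_scores.get(frozenset((a, b)), 0.1)

    print(greedy_incremental_cluster(list(lengths), lengths, toy_tm))
    # prints {'A': ['A', 'B'], 'C': ['C', 'D']}
```

In this toy run, only two alignments are requested (B against A, and D against C). The length filter rules out C and D against A before any alignment is attempted, which is the kind of saving greedy-incremental clustering relies on when most structures match an early representative.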