Nickolay Khazanov, Ph.D.
12

Ph.D. Program

Bioinformatics Scientist III

Thermo Fisher Scientific, Inc.

Chair

Heather Carlson

Dissertation Title

Large-Scale Analysis of Protein-Ligand Binding Sites using the Binding MOAD Database

Research Interest

Current structure-based drug design (SBDD) methods require an understanding of general trends of protein-ligand interactions. Informative descriptors of ligand-binding sites provide powerful heuristics to improve SBDD methods designed to infer function from protein structure. These descriptors must have a solid statistical foundation for assessing general trends in large sets of protein-ligand complexes. This dissertation focuses on mining the Binding MOAD database of highly curated protein-ligand complexes to determine frequently observed patterns of binding-site composition. An extension to Binding MOAD’s framework is developed to store structural details of binding sites and facilitate large-scale analysis. This thesis uses the framework to address three topics. It first describes a strategy for determining over-representation of amino acids within ligand-binding sites, comparing the trends of residue propensity for binding sites of biologically relevant ligands to those of spurious molecules with no known function. To determine the significance of these trends and to provide guidelines for residue-propensity studies, the effect of the data set size on the variation in propensity values is evaluated. Next, binding-site residue propensities are applied to improve the performance of a geometry-based, binding-site prediction algorithm. Propensity-based scores are found to perform comparably to the native score in successfully ranking correct predictions. For large proteins, propensity-based and consensus scores improve the scoring success. Finally, current protein-ligand scoring functions are evaluated using a new criterion: the ability to discern biologically relevant ligands from “opportunistic binders,” molecules present in crystal structures due to their high concentrations in the crystallization medium. Four different scoring functions are evaluated against a diverse benchmark set. All are found to perform well for ranking biologically relevant sites over spurious ones, and all perform best when penalties for torsional strain of ligands are included. The final chapter describes a structural alignment method, termed HwRMSD, which can align proteins of very low sequence homology based on their structural similarity using a weighted structure superposition. The overall aims of the dissertation are to collect high-quality binding-site composition data within the largest available set of protein-ligand complexes and to evaluate the appropriate applications of these data to emerging methods for computational proteomics.
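As an illustration of the residue-propensity idea mentioned above, the following is a minimal sketch that assumes one commonly used definition: the frequency of a residue type within binding sites divided by its frequency in the whole data set. The dissertation's exact formula is not given here, so the function name `residue_propensity` and the toy residue lists are hypothetical.

```python
from collections import Counter


def residue_propensity(site_residues, all_residues):
    """Return a dict mapping residue type to its binding-site propensity.

    Assumed definition (not taken from the dissertation):
        propensity = (fraction of binding-site residues of this type)
                     / (fraction of all residues of this type)
    Values above 1 suggest over-representation in binding sites.
    """
    site_counts = Counter(site_residues)
    all_counts = Counter(all_residues)
    n_site = sum(site_counts.values())
    n_all = sum(all_counts.values())
    if n_site == 0 or n_all == 0:
        raise ValueError("both residue lists must be non-empty")
    return {
        res: (site_counts.get(res, 0) / n_site) / (count / n_all)
        for res, count in all_counts.items()
    }


# Toy data: 10 residues overall, 4 of which line a binding site.
all_residues = ["TRP"] * 2 + ["ALA"] * 8
site_residues = ["TRP"] * 2 + ["ALA"] * 2
propensities = residue_propensity(site_residues, all_residues)
```

In this toy case tryptophan makes up half of the binding-site residues but only a fifth of all residues, so its propensity is above 1, while alanine's is below 1. With small data sets such ratios can vary widely, which is the sensitivity to data set size that the abstract describes evaluating.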

Current Placement