Abhik Shah

Abhik Shah, Ph.D.
11

Ph.D. Program
VP of Engineering
Lexent Bio, Inc.

Chair

Dissertation Title

Mechanistic Bayesian Networks for Integrating Knowledge and Data to Unravel Biological Complexity

Research Interest

Determining how protein interactions affect gene regulation is an important problem in systems biology. By identifying quantitative relationships between the interactome and transcriptome in complex pathologies, we can better characterize dysfunctional pathways, generate further hypotheses and identify potential targets for therapeutic interventions. This thesis develops methods and software for elucidating biological networks consistent with known mechanisms using Bayesian networks (BN) and high-throughput datasets in a novel methodology termed Mechanistic Bayesian networks (MBN). It contributes new algorithms for data pre-processing and for evaluating hidden variable models, implemented in PEBL, an open-source library for MBN modeling with features unmatched by other software. Because of its ease of use and extensibility, PEBL allows one to run large, distributed analyses using cloud-computing platforms. MBN are used to identify the targets of the Sonic hedgehog signaling pathway, which is implicated in development and cancer progression. Together, the use of hidden variable models and the ability of BN to capture nonlinear, combinatorial and stochastic relationships identify known and novel targets that are more biologically meaningful and outperform other BN and non-BN methods. The approach developed is useful for identifying pathway targets, upstream regulators or, more generally, additional components of partially characterized topologies. MBN are next applied to identify subnetworks of the global interactome that govern gene expression during the epithelial-mesenchymal transition (EMT), a developmental process implicated in cancer metastasis. Modeling the effects of a protein interaction on downstream genes yields a scoring metric that quantifies the relevance of interactions to EMT.
Applying the method to a lung cancer cell-line dataset identifies a core subnetwork that recapitulates EMT biology and makes predictions about protein interactions and their targets. Because the method does not rely on differential expression or co-regulation assumptions, it is equally useful for microRNA-target, protein-DNA or mixed-interaction networks. The methods and software in this thesis are generally applicable to problems in elucidating interactions among variables using partially characterized knowledge and noisy high-dimensional datasets, and they further state-of-the-art BN methods by identifying results consistent with both known mechanisms and statistical relationships in data.

Current Placement

Lexent Bio, Inc.