Dongxiao Zhu, Ph.D.
06
Chair
Dissertation Title
Reconstructing Signaling Pathways from High Throughput Data
Research Interest
Many bioinformatics problems can be tackled from a fresh angle offered by the network perspective. Taking into account the network constraints on gene interaction, we propose a series of logically-coherent approaches to reconstruct signaling pathways from high throughput expression profiling data. These approaches proceed in three consecutive steps: co-expression network construction with controlled biological and statistical significance, network constrained clustering, and reconstruction of the order of pathway components. The first step relies on detecting pairwise co-expression of genes. We attack the problem from both frequentist statistics and Bayesian statistics perspectives. We designed and implemented a frequentist two-stage co-expression detection algorithm that controls both statistical significance (False Discovery Rate, FDR) and biological significance (Minimum Acceptable Strength, MAS) of the discovered co-expressions. In order to regularize variances of the correlation estimation in small sample scenario, we also designed and implemented a Bayesian hierarchical model, in which correlation parameters are assumed to be exchangeable and sampled from a parental Gaussian distribution. Using simulated data and the galactose metabolism data, we demonstrated advantages of our approaches and compared the differences among them. The second problem considered is distance-based clustering that accounts for network constraints extracted from the Giant Connected Component (GCC) of the network discovered from the data. The clustering is performed using a hybrid distance matrix composed of direct distance between adjacent genes and shortest-path distance between non-adjacent genes in the network. The third problem considered is the reconstruction of the order of pathway components. We applied a first-order Markov model, originally developed and applied to a network tomography problem in telecommunication networks, to reconstruct three well-known signaling pathways from unordered pathway components. We suggest that the methods proposed here can also be applied to other high throughput data analysis problems.