"Computational Methods to Dissect Tissue-Specific Landscapes of Transcription Factor and DNA Interactions"
The intricately ordered structure of the human genome is a product of dynamic interactions between DNA and proteins such as nucleosomes and transcription factors (TFs), which allow cells to respond to environmental changes while maintaining robustness of genetic programs. Changes in the non-coding genome can affect gene regulation and lead to increased disease predisposition, but the underlying mechanisms are not fully understood. Therefore, understanding how the genome is organized and regulated is a central question in biomedical research. My thesis aims to develop and apply novel computational methods to understand general biological mechanisms of genome regulation, with a focus on TF-DNA interactions.
In the initial part of this thesis, I develop computational methods to quantify TF-DNA interaction patterns by applying information theory to high-throughput chromatin accessibility profiles (from the assay for transposase-accessible chromatin followed by high-throughput sequencing, ATAC-seq) to measure a property that I name chromatin information. To circumvent the requirement for high-throughput profiles of TF binding (chromatin immunoprecipitation followed by sequencing, ChIP-seq) when measuring chromatin information, I develop BMO, a novel algorithm that predicts TF binding from chromatin accessibility data and outperforms current state-of-the-art methods. Using BMO in combination with the information-theoretic approach developed here, I quantify the chromatin information patterns of hundreds of TF motifs across different human tissues and cell lines. Only a subset of TFs (10-20%) have high chromatin information and are therefore associated with organized chromatin. By integrating multiple layers of molecular profiles, I find that high chromatin information TFs have longer TF-DNA residence times, associate with nucleosome phasing, and are enriched in regions associated with the genetic control of gene expression. I then use genetic data to find evidence that high chromatin information TFs associate with increased chromatin accessibility, and may therefore act as pioneer TFs. In the last part of this thesis, I apply TF binding prediction algorithms to characterize the regulatory landscape associated with thymocyte development. The results of these analyses support the view that thymocyte development is a highly dynamic process and help prioritize novel candidate TFs and regulatory elements for future experimental validation.
This work represents a novel fusion of two research domains, information theory and genomics, which made it possible to capture properties of TF-chromatin interactions, with important implications for gene regulation, cell state dynamics, and understanding the pathological mechanisms associated with non-coding disease-associated genetic variants.