"Combined DNase footprinting and motif matching model for identification of transcription factor binding sites"
Transcription is tightly regulated by cis-regulatory DNA elements to which transcription factors (TFs) bind. This regulation plays an important role in development and cell fate. Identifying transcription factor binding sites (TFBSs) is therefore key to understanding gene expression and the regulatory networks within a cell. Standard approaches for identifying TFBSs, such as scanning with position weight matrices (PWMs) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), are widely used but have drawbacks: PWM scans have high false positive rates, and ChIP-seq depends on specific antibodies that are difficult to produce.
Traditional DNase I footprinting is another classical method for investigating the in vivo protection of DNA by individually bound proteins. It has been adapted for high-throughput sequencing (DNase-seq), which can be used for genome-wide footprinting. DNase-seq allows nucleotide-level identification of TFBSs by searching for footprint-like regions: stretches with few DNase I cuts flanked by regions with many cuts.
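The footprint pattern described above (few cuts in a protected region, many cuts in the flanks) can be illustrated with a simple depletion score. The following is only a minimal sketch under stated assumptions: the function name `footprint_score`, the fixed flank width, and the toy cut counts are all hypothetical and are not taken from this work.

```python
def _mean(values):
    """Arithmetic mean; returns 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


def footprint_score(cuts, start, end, flank=3):
    """Score a candidate footprint spanning cuts[start:end].

    Hypothetical illustration: the score is the mean cut count in the two
    flanks minus the mean cut count inside the candidate region, so a
    protected site with accessible flanks gets a large positive value.
    """
    center = cuts[start:end]
    left = cuts[max(0, start - flank):start]
    right = cuts[end:end + flank]
    flank_mean = (_mean(left) + _mean(right)) / 2
    return flank_mean - _mean(center)


# Toy per-nucleotide DNase I cut counts (made-up numbers).
toy_cuts = [9, 8, 10, 1, 0, 1, 0, 9, 10, 8]
score = footprint_score(toy_cuts, start=3, end=7)
```

A uniform cut profile scores zero under this definition, while a depleted stretch between well-cut flanks scores positively. Real footprinting methods additionally have to deal with sequencing depth and cleavage bias, which this sketch ignores.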
Several computational footprinting algorithms have been developed to detect TFBSs from chromatin accessibility patterns, but most of them need pre-identified motif sites and were not designed to distinguish between different TFs. Here, I developed a footprinting method based on a Hidden Markov Model (HMM) that integrates DNase I hypersensitivity data (DNase-seq) with genome sequence information (PWMs) for TFBS prediction. The new method annotates binding sites for all desired TFs automatically and does not need the pre-generated candidate binding sites that most other footprinting methods require.
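To make the idea of combining the two evidence sources in an HMM concrete, here is a minimal sketch of Viterbi decoding over two hidden states. It is not the model developed in this work: the two-state layout, the discretised observations (a "low cut count" flag and a "motif match" flag per position), the independence of the two signals given the state, and every probability below are assumptions chosen purely for illustration.

```python
import math

STATES = ("background", "bound")

# Assumed (made-up) parameters for the illustration.
START = {"background": 0.9, "bound": 0.1}
TRANS = {
    "background": {"background": 0.9, "bound": 0.1},
    "bound": {"background": 0.2, "bound": 0.8},
}
P_LOW_CUT = {"background": 0.2, "bound": 0.9}    # P(few DNase I cuts | state)
P_MOTIF_HIT = {"background": 0.1, "bound": 0.8}  # P(PWM match | state)


def log_emit(state, obs):
    """Log-probability of one observation (low_cut, motif_hit).

    The two signals are treated as independent given the state, so their
    log-probabilities are simply added.
    """
    low_cut, motif_hit = obs
    p_cut = P_LOW_CUT[state] if low_cut else 1 - P_LOW_CUT[state]
    p_motif = P_MOTIF_HIT[state] if motif_hit else 1 - P_MOTIF_HIT[state]
    return math.log(p_cut) + math.log(p_motif)


def viterbi(observations):
    """Return the most probable state path for a non-empty observation list."""
    scores = [{s: math.log(START[s]) + log_emit(s, observations[0])
               for s in STATES}]
    backpointers = []
    for obs in observations[1:]:
        row, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for ending in state s at this position.
            prev = max(STATES,
                       key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = scores[-1][prev] + math.log(TRANS[prev][s]) + log_emit(s, obs)
            pointers[s] = prev
        scores.append(row)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy input: three central positions with few cuts and a motif match.
toy_obs = [(False, False)] * 2 + [(True, True)] * 3 + [(False, False)] * 2
toy_path = viterbi(toy_obs)
```

With these assumed parameters, the decoder labels the three central positions as bound and the rest as background. Distinguishing between different TFs would presumably require TF-specific states or emissions, which this two-state sketch does not attempt.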