"Exploring bias in ATAC-seq experiments with ataqv, an interactive quality control tool for ATAC-seq data"
ATAC-seq has become the preferred method for mapping chromatin accessibility because of its speed and low input requirements. However, data quality can be difficult to evaluate when processing many samples. Here we uniformly processed more than 2,000 published ATAC-seq libraries from 27 studies and identified considerable heterogeneity in the data, even within single studies. To facilitate ATAC-seq quality control (QC), we created ataqv, a software package for quantifying and dynamically visualizing ATAC-seq bias. Ataqv enables comparison of diverse QC metrics across samples, producing both machine-readable metrics (JSON format) and an interactive human-readable report (HTML format).
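As a rough illustration of how machine-readable metrics can support cross-sample comparison, the sketch below flags samples whose value for one metric lies far from the cohort median. It is not ataqv's own code: the field names (`tss_enrichment`, `percent_mitochondrial`), the flat JSON layout, the helper names, and the toy values are all assumptions made for the example, and the median absolute deviation rule is a generic outlier heuristic rather than a method described here.

```python
import json
import statistics

# Toy stand-ins for per-sample metrics files. The flat layout and the field
# names are hypothetical; real ataqv JSON output may be organized differently.
TOY_REPORTS = {
    "sample_A": '{"tss_enrichment": 8.1, "percent_mitochondrial": 12.0}',
    "sample_B": '{"tss_enrichment": 7.6, "percent_mitochondrial": 15.0}',
    "sample_C": '{"tss_enrichment": 8.4, "percent_mitochondrial": 10.0}',
    "sample_D": '{"tss_enrichment": 2.0, "percent_mitochondrial": 48.0}',
    "sample_E": '{"tss_enrichment": 7.9, "percent_mitochondrial": 13.0}',
}


def load_metrics(reports):
    """Parse one JSON document per sample into a dict of metric dicts."""
    return {name: json.loads(text) for name, text in reports.items()}


def flag_outliers(metrics, key, n_mads=3.0):
    """Return sorted sample names whose `key` value lies more than
    `n_mads` median absolute deviations from the cohort median."""
    values = [m[key] for m in metrics.values()]
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # All samples (or at least half of them) share the median value,
        # so the deviation scale is undefined; flag nothing.
        return []
    return sorted(
        name for name, m in metrics.items() if abs(m[key] - med) / mad > n_mads
    )


if __name__ == "__main__":
    metrics = load_metrics(TOY_REPORTS)
    for key in ("tss_enrichment", "percent_mitochondrial"):
        print(key, flag_outliers(metrics, key))
```

A median-based rule is used here because a single poor library would otherwise inflate a mean and standard deviation and mask itself; the threshold of three deviations is arbitrary and would need tuning for real cohorts.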
Chromatin structure differs across classes of regulatory elements, and different chromatin structures yield different local ATAC-seq fragment length distributions (FLDs). We therefore hypothesized that technical noise that influences the FLD, such as over- or under-transposition, would obscure true biological signals. To test this, we performed ATAC-seq on GM12878 cells using seven Tn5 concentrations with six replicates per concentration (42 libraries in total), while holding the number of nuclei constant at 50,000. In addition to predictably shifting the FLD toward shorter fragments, Tn5 concentration correlated positively with the enrichment of reads around transcription start sites (TSSs) and in peaks, and negatively with the percentage of mitochondrial reads. Using a negative binomial generalized linear model (NB-GLM), we identified 59,531 of 92,015 peaks whose signal increased with increasing Tn5 concentration (5% FDR). The percentage of reads falling in enhancer and active TSS chromatin states increased with Tn5 concentration, accompanied by a decrease in the proportion of reads falling in the low-signal chromatin state. This suggests that ATAC-seq experiments performed specifically to identify enhancers and promoters will achieve better signal-to-noise at higher Tn5 concentrations. In a second experiment, we sequenced one set of ATAC-seq libraries on two sequencing runs to determine the effect of sequencing lane cluster density on ATAC-seq metrics. Increasing cluster density shifted the FLD toward shorter fragments and robustly altered the apparent TSS enrichment of the libraries. We conclude that ataqv will help control for technical bias and produce more reliable ATAC-seq analysis results.
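To make the notion of an FLD shift concrete, the following minimal sketch computes a normalized fragment length distribution and a ratio of short to mononucleosomal fragments from a list of fragment lengths. The function names, the 50-100 bp and 150-200 bp windows, and the toy fragment lengths are assumptions chosen for illustration; they are not values or code taken from this study.

```python
from collections import Counter


def fragment_length_distribution(lengths):
    """Return {fragment length: fraction of fragments}, sorted by length."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


def short_to_mono_ratio(lengths, short=(50, 100), mono=(150, 200)):
    """Ratio of short fragments to mononucleosomal fragments.

    The default windows (inclusive, in bp) are assumed for this example;
    a higher ratio indicates an FLD shifted toward shorter fragments.
    """
    n_short = sum(short[0] <= x <= short[1] for x in lengths)
    n_mono = sum(mono[0] <= x <= mono[1] for x in lengths)
    if n_mono == 0:
        raise ValueError("no mononucleosomal fragments; ratio is undefined")
    return n_short / n_mono


if __name__ == "__main__":
    # Made-up fragment lengths for two hypothetical libraries.
    low_tn5 = [60, 80, 160, 170, 180, 190, 350]
    high_tn5 = [55, 60, 70, 80, 90, 160, 180]
    print(fragment_length_distribution(low_tn5))
    print(short_to_mono_ratio(low_tn5), short_to_mono_ratio(high_tn5))
```

In practice the fragment lengths would come from the template lengths of properly paired alignments; a single summary ratio like this one is convenient for comparing many libraries, although it discards most of the shape of the distribution.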