August 2, 2019

“Statistical methods for flexible differential analysis of cross-sample single-cell RNA-seq datasets”

2:00 PM to 3:00 PM

Forum Hall, Palmer Commons

MIDAS Seminar: Mark Robinson, PhD, University of Zurich

Abstract

Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to characterize the transcriptomes of individual cells. A primary task in the analysis of scRNA-seq data is differential expression analysis (DE). Most early analyses of DE in scRNA-seq data have aimed at identifying differences between cell types, and thus are focused on finding markers for cell sub-populations (experimental units are cells).

Multi-sample, multi-condition scRNA-seq datasets are now emerging, where the goal is to make sample-level inferences (experimental units are samples), with hundreds to thousands of cells measured per replicate. To tackle such complex experimental designs, so-called differential state (DS) analysis follows cell types across a set of samples (e.g., individuals) and experimental conditions (e.g., treatments) in order to identify cell-type-specific responses, i.e., changes in cell state. DS analysis: i) should be able to detect expression changes that affect only a single cell type, a subset of cell types, or even a subset of cells within a cell type; and, ii) is orthogonal to clustering or cell type assignment (i.e., genes typically associated with cell types are not of direct interest for DS). Furthermore, cell-type-level DE analysis is arguably more interpretable and biologically meaningful.

We compared three conceptually different approaches that act on the cell, sample, and group level: i) fitting mixed models to cell-level measurements (replicates are cells); ii) aggregating single cells into “pseudo-bulk” data at the sub-population level and leveraging existing robust bulk RNA-seq frameworks (replicates are samples); and, iii) as a reference, applying existing scRNA-seq DE methods that disregard sample labels and treat each group as a different cell type (no replicates).
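The pseudo-bulk idea in approach ii) can be illustrated with a minimal sketch. It assumes that aggregation means summing raw counts over all cells that share a sample and a sub-population (cluster), which is one common choice; the function name `pseudobulk`, the input layout, and the toy values are hypothetical and do not reflect muscat's API.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids, cluster_ids):
    """Sum per-cell gene counts within each (cluster, sample) pair.

    counts:      list of per-cell count vectors (one integer per gene)
    sample_ids:  per-cell sample label, same length as counts
    cluster_ids: per-cell cluster label, same length as counts
    Returns a dict mapping (cluster, sample) to a summed count vector.
    """
    if not (len(counts) == len(sample_ids) == len(cluster_ids)):
        raise ValueError("counts and labels must have one entry per cell")
    n_genes = len(counts[0]) if counts else 0
    totals = defaultdict(lambda: [0] * n_genes)
    for cell, sample, cluster in zip(counts, sample_ids, cluster_ids):
        acc = totals[(cluster, sample)]
        for gene_index, value in enumerate(cell):
            acc[gene_index] += value
    return dict(totals)


# Toy data (made up): 5 cells, 3 genes, 2 samples, 2 clusters.
counts = [[1, 0, 2], [0, 3, 1], [4, 0, 0], [2, 2, 2], [0, 1, 5]]
sample_ids = ["s1", "s1", "s2", "s2", "s2"]
cluster_ids = ["T", "T", "T", "B", "B"]

pb = pseudobulk(counts, sample_ids, cluster_ids)
for key in sorted(pb):
    print(key, pb[key])
```

Each resulting vector plays the role of one bulk RNA-seq library for a given cluster, so a bulk DE framework can then be run per cluster with samples as replicates.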

To compare method performance, we implemented a flexible simulation framework that accommodates multiple clusters and samples across experimental conditions, supports varying sample, cluster, and group sizes, and can introduce a broad range of differential expression patterns. Notably, our framework reproduces the many structures characteristic of scRNA-seq data (e.g., the dispersion-mean relationship, dropout percentages) as well as the cell-, sample-, and pseudobulk-level variability.

We have implemented this framework along with various DS analysis methods in muscat (https://github.com/HelenaLC/muscat), an R package that provides differential testing and visualization tools for multi-sample multi-group scRNA-seq data.