Wednesday, October 28, 2020

CCMB Seminar: "Cluster Ensemble and Batch Correction Methods for Single Cell RNA-Sequencing Data"

4:00 PM to 5:00 PM

This seminar will be available by web stream only

CCMB Seminar Series – sponsored by DCMB
by Dr. Yun Li

Abstract

Single-cell RNA sequencing (scRNA-seq) allows researchers to examine the transcriptome at single-cell resolution and has been increasingly employed as technologies continue to advance. For technical and biological reasons unique to scRNA-seq data, clustering and batch effect correction are almost indispensable to ensure valid and powerful data analysis. Multiple methods have been proposed for these two important tasks. For clustering, we have found that different methods, including state-of-the-art methods such as Seurat, SC3, CIDR, SIMLR, and t-SNE + k-means, yield varying results in terms of both the number of clusters and the actual cluster assignments. We have developed two ensemble methods, SAFE-clustering and SAME-clustering, that leverage hypergraph partitioning algorithms and a mixture model-based approach, respectively, to produce a more robust and accurate ensemble solution on top of the clustering results from individual methods. For batch effect correction, we have developed methods based on supervised mutual nearest neighbor detection to harness the power of known cell type labels for certain single cells. We benchmarked all methods on various scRNA-seq datasets to demonstrate their utility.
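
As a rough illustration of the mutual nearest neighbor idea mentioned in the abstract, the sketch below finds pairs of cells from two batches that are each other's nearest neighbors, and then restricts that search to cells sharing a known label. This is not the speaker's method or code. The function names, the toy coordinates, the Euclidean distance, and the "search within each shared label" rule are all assumptions made for illustration.

```python
import numpy as np


def mutual_nearest_neighbors(a, b, k=1):
    """Return sorted (i, j) pairs where a[i] and b[j] are among each
    other's k nearest neighbors (Euclidean distance)."""
    # Pairwise squared distances: rows index cells in a, columns cells in b.
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # For each cell in a, its k nearest cells in b, and the reverse.
    nn_ab = np.argsort(d, axis=1)[:, :k]
    nn_ba = np.argsort(d, axis=0)[:k, :].T
    pairs = [
        (i, int(j))
        for i in range(a.shape[0])
        for j in nn_ab[i]
        if i in nn_ba[j]
    ]
    return sorted(pairs)


def supervised_mnn(a, b, labels_a, labels_b, k=1):
    """Hypothetical simplification: search for mutual nearest neighbors
    only among cells that carry the same known label in both batches."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    pairs = []
    for lab in np.intersect1d(labels_a, labels_b):
        ia = np.flatnonzero(labels_a == lab)
        ib = np.flatnonzero(labels_b == lab)
        # Map indices within the label subset back to the full batches.
        for i, j in mutual_nearest_neighbors(a[ia], b[ib], k):
            pairs.append((int(ia[i]), int(ib[j])))
    return sorted(pairs)


# Toy data: two "cells" in batch A, three in batch B (made-up coordinates).
batch_a = np.array([[0.0, 0.0], [10.0, 10.0]])
batch_b = np.array([[0.1, 0.0], [10.0, 10.1], [5.0, 5.0]])

unsupervised_pairs = mutual_nearest_neighbors(batch_a, batch_b)
supervised_pairs = supervised_mnn(
    batch_a, batch_b, ["T", "B"], ["B", "B", "T"]
)
```

In this toy setup the unsupervised search pairs cells purely by proximity, while the label-restricted search can pair a cell with a more distant partner of the same type. That is one simple way known labels could guide which cells anchor a batch correction.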
