Team 5: unbiased computational strategy to identify distinct populations

Recent developments in single-cell RNA sequencing provide unprecedented resolution for studying heterogeneous cell types and developmental states. High-throughput microfluidics-based technologies (10X Genomics, Drop-seq) allow parallel sequencing of thousands of cells pooled prior to reverse transcription. Cell barcode (CB) and unique molecular identifier (UMI) sequences are attached to cDNA molecules during the cDNA synthesis step. After the pooled library is sequenced, the cell barcode is used to identify reads that came from the same cell. The UMI is used to count the individual transcript molecules corresponding to each gene, which removes PCR duplication bias.
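To make the role of the two tags concrete, here is a minimal sketch of UMI counting on toy data. The function name `count_umis`, the `(cell_barcode, umi, gene)` tuple layout, and the short sequences are assumptions made for illustration; they are not the output format of any particular pipeline.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    aligned read. This layout is a hypothetical simplification.
    """
    seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        # Reads sharing CB, gene, and UMI are treated as PCR copies of
        # one original molecule, so the set keeps only one of them.
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("AAAC", "TTGA", "GENE1"),
    ("AAAC", "TTGA", "GENE1"),
    ("AAAC", "CCGT", "GENE1"),
    ("GGTC", "TTGA", "GENE2"),
]
counts = count_umis(reads)
```

In this toy input, three reads map to GENE1 in cell AAAC, but only two distinct UMIs are present, so the molecule count for that pair is two rather than three.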

Despite these advances, two main challenges remain in distinguishing cell populations from ambient RNA, low-quality cells, and debris: 1) Current strategies use the total UMI count per cell as a measure of total RNA content and filter out cells with relatively low UMI counts. The cut-off is sometimes arbitrary, or is user-defined based on the expected number of cells. However, heterogeneous samples such as human brain contain cells that differ in total RNA content and complexity, so a single “cut-off” value might miss cell populations with relatively low UMI counts. 2) Sequencing errors in the CB and UMI need to be taken into account. Here, we provide single-nucleus sequencing library data from adult human brain tissue prepared using 10X Genomics technology. The goal is to develop an unbiased computational strategy to identify distinct populations that presumably represent neurons, glia, debris, and ambient RNA.
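The two challenges can be illustrated with a small sketch on toy data. It is not a proposed solution: it shows a fixed UMI cut-off discarding a lower-RNA population, plus one simple way to absorb a single-base barcode error. The whitelist, the Hamming-distance-1 rule, the function names, and the four-base barcodes are all assumptions chosen for illustration.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(cb, whitelist):
    """Return cb if whitelisted, else the unique whitelist barcode one
    mismatch away, else None (ambiguous or too many errors)."""
    if cb in whitelist:
        return cb
    near = [w for w in whitelist if len(w) == len(cb) and hamming(w, cb) == 1]
    return near[0] if len(near) == 1 else None


def total_umis_per_cell(reads, whitelist):
    """Distinct (UMI, gene) pairs per corrected cell barcode."""
    seen = defaultdict(set)
    for cb, umi, gene in reads:
        fixed = correct_barcode(cb, whitelist)
        if fixed is not None:
            seen[fixed].add((umi, gene))
    return {cb: len(pairs) for cb, pairs in seen.items()}


def apply_cutoff(totals, min_umis):
    """Keep barcodes whose total UMI count reaches a single fixed cut-off."""
    return {cb for cb, n in totals.items() if n >= min_umis}


# Hypothetical whitelist and reads: AAAA is RNA-rich, CCCC is RNA-poor,
# GGGG looks like ambient RNA, CCCA is CCCC with one sequencing error,
# and TTTT matches nothing on the whitelist.
whitelist = {"AAAA", "CCCC", "GGGG"}
reads = [("AAAA", f"U{i}", "G1") for i in range(6)]
reads += [("CCCC", "U0", "G2"), ("CCCC", "U1", "G2"), ("CCCA", "U2", "G2")]
reads += [("GGGG", "U0", "G3"), ("TTTT", "U0", "G3")]

totals = total_umis_per_cell(reads, whitelist)
strict = apply_cutoff(totals, min_umis=5)
lenient = apply_cutoff(totals, min_umis=2)
```

With the stricter cut-off only AAAA survives, so the lower-RNA barcode CCCC is lost even though it is well separated from the ambient-like GGGG. The lenient cut-off recovers it, but choosing that value by hand is exactly the arbitrariness described above.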

Team Lead: Fatma Ayhan, Neuroscience,