Nathan Schaefer
@nkschaefer.bsky.social
UCSF postdoc, human, mammal
nkschaefer.bsky.social
Thanks for reading, and good luck checking IDs and keeping the riffraff out of your single cell data sets.

www.biorxiv.org/content/10.1...

github.com/nkschaefer/c...
nkschaefer.bsky.social
In total, our study demonstrates the need for this set of tools, which provide new functionality, speed, and/or accuracy over existing tools. It also demonstrates the power of pooled single cell studies, including those involving composite cell lines, to discover new and interesting biology.
nkschaefer.bsky.social
Back-mutations to the ancestral state at this site are uncommon, at a frequency typically seen in mitochondrial protein-coding or disease-implicated mutations. This suggests that this mutation may be one of the changes affecting gene regulation at this locus.
nkschaefer.bsky.social
Mitochondrial genes are expressed as polycistronic transcripts, then cleaved and selectively degraded. We looked at species differences in this process, from two causes: nuclear and mitochondrial mutations. Interestingly, the biggest differences we found were compensatory, with little net effect.
nkschaefer.bsky.social
By finding one fusion line that tended to retain both species’ mitochondria, we were able to home in on the gene network involved in this process: we can see what was turned up in the unhealthy cells, and what was turned down in those that survived.
nkschaefer.bsky.social
Cells with two species’ mitochondria have significantly altered gene expression related to cell cycle arrest and apoptosis relative to other cells, suggesting they’re in trouble. They also express fewer mitochondrial transcripts overall and have abnormal post-expression transcriptional regulation.
nkschaefer.bsky.social
After demultiplexing with CellBouncer, we found that composite cells mostly inherit only one species’ mitochondria: human, for human/chimpanzee cells, and bonobo, for chimpanzee/bonobo cells. Not always, though: some cells retained both mitochondria, or those from the less common species.
nkschaefer.bsky.social
We take CellBouncer for a spin on a cool data set: inter-species composite iPSCs we created by cell fusion (www.nature.com/articles/s41...) for studying species differences in gene regulation. Here, we asked if there were biases in which species’ mitochondria were inherited by the composite cells.
nkschaefer.bsky.social
doublet_dragon takes assignments from the other programs and infers a global doublet rate that encompasses both homotypic doublets (invisible to individual programs) and heterotypic ones. This can help with QC (given expectation based on cell loading density) and serve as a prior for other tools.
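To give a feel for why homotypic doublets matter, here is a rough back-of-the-envelope sketch. It is not doublet_dragon's actual model. It assumes the two cells in a doublet are drawn independently according to the pool proportions, so the chance a doublet is heterotypic is 1 minus the sum of squared proportions. The function name and inputs are hypothetical.

```python
def global_doublet_rate(n_heterotypic, n_cells, proportions):
    """Scale the visible (heterotypic) doublet fraction up to a global rate.

    Assumption: doublet partners are sampled independently from the pool,
    so P(heterotypic | doublet) = 1 - sum(p_i^2).
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    het_given_doublet = 1.0 - sum(p * p for p in proportions)
    if het_given_doublet <= 0:
        raise ValueError("need at least two identities with nonzero proportion")
    rate = (n_heterotypic / n_cells) / het_given_doublet
    return min(rate, 1.0)  # a rate cannot exceed 1


# Two identities at 50/50: only half of all doublets are visible as heterotypic.
rate = global_doublet_rate(n_heterotypic=75, n_cells=1000, proportions=[0.5, 0.5])
print(f"estimated global doublet rate: {rate:.3f}")
```

The fewer identities in the pool (or the more skewed the proportions), the larger the share of doublets that stays hidden from any one program.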
nkschaefer.bsky.social
demux_tags assigns custom labels (e.g. MULTIseq/HTO data), or sgRNAs (CRISPR guide capture data) to cells. Our method considers the distribution of all tag counts together, rather than considering each tag independently, and handles noisy/low-count data better than some alternatives.
nkschaefer.bsky.social
bulkprops takes genotypes and bulk data (or single cell data, ignoring cell barcodes) and infers the proportion of each individual in the pool. This can cross-check the other programs, and we provide a method to bootstrap proportions and get p-values when comparing two sets of proportions.
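The idea of reading pool proportions off bulk allele counts can be sketched with a small EM loop. This is an illustration under simple assumptions, not bulkprops' implementation: each read comes from individual i with probability p_i and shows the alt allele with probability dosage/2 (clamped by a hypothetical error rate `err`). All names here are made up for the example.

```python
def estimate_proportions(dosages, counts, n_iter=200, err=0.001):
    """Estimate mixture proportions of individuals from bulk allele counts.

    dosages: {individual: [alt-allele dosage (0, 1, or 2) per SNP]}
    counts:  [(ref_reads, alt_reads) per SNP], same SNP order as dosages
    """
    names = list(dosages)
    total = sum(ref + alt for ref, alt in counts)
    if total == 0:
        raise ValueError("no reads to estimate from")
    p = {n: 1.0 / len(names) for n in names}  # start from equal proportions
    for _ in range(n_iter):
        new = dict.fromkeys(names, 0.0)
        for j, (ref, alt) in enumerate(counts):
            # Probability of an alt read from each individual at this SNP.
            q = {n: min(max(dosages[n][j] / 2, err), 1 - err) for n in names}
            z_alt = sum(p[n] * q[n] for n in names)
            z_ref = sum(p[n] * (1 - q[n]) for n in names)
            for n in names:
                # E-step: expected number of reads contributed by individual n.
                new[n] += alt * p[n] * q[n] / z_alt
                new[n] += ref * p[n] * (1 - q[n]) / z_ref
        # M-step: proportions are the expected share of reads.
        p = {n: new[n] / total for n in names}
    return p


props = estimate_proportions(
    dosages={"A": [2, 0], "B": [0, 2]},
    counts=[(30, 70), (70, 30)],
)
print(props)
```

Resampling SNPs with replacement and re-running the estimate would be one simple way to bootstrap, though the program's own procedure may differ.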
nkschaefer.bsky.social
Additionally, quant_contam models the genotypic origins of ambient RNA, meaning it can highlight when specific donors or cell lines contribute disproportionately to ambient RNA. If expression data are provided, quant_contam can adjust counts to account for contamination.
nkschaefer.bsky.social
quant_contam quantifies ambient RNA by measuring how often cells mismatch their expected genotypes. This introduces an external ground truth (genotype data), avoids the need to consider empty droplets, and can find ambient RNA in data lacking cell type diversity.
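As a toy version of the genotype-mismatch idea (not quant_contam's actual model): at sites where a cell's assigned individual is homozygous reference, any alt read must come from somewhere else. If we assume all such reads are ambient, ignore sequencing error, and know the alt-allele frequency of the ambient pool at those sites, the contamination fraction is a simple ratio. The function and parameter names are hypothetical.

```python
def estimate_contamination(mismatch_reads, total_reads, ambient_alt_freq):
    """Estimate the fraction of a cell's reads that are ambient RNA.

    mismatch_reads:   alt reads at sites where the cell should be hom-ref
    total_reads:      all reads at those sites
    ambient_alt_freq: alt-allele frequency of the ambient pool at those sites
    Assumption: mismatches arise only from ambient RNA (no sequencing error).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 < ambient_alt_freq <= 1:
        raise ValueError("ambient_alt_freq must be in (0, 1]")
    # Observed mismatch rate = contamination * ambient alt frequency.
    c = (mismatch_reads / total_reads) / ambient_alt_freq
    return min(c, 1.0)


c = estimate_contamination(mismatch_reads=6, total_reads=200, ambient_alt_freq=0.3)
print(f"estimated contamination: {c:.2f}")
```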
nkschaefer.bsky.social
After running demux_mt, we suggest a pipeline that can produce a VCF file of nuclear variants and demultiplex more cells using demux_vcf. While not suited to every data set, we demonstrate this method on whole-cell RNA-seq and single nucleus ATAC data, outperforming competing methods.
nkschaefer.bsky.social
demux_mt answers this problem by simultaneously clustering mitochondrial haplotypes and inferring the number of individuals in the pool. It takes only a BAM file. There is also a way to plot the haplotypes to see how well clustering worked.
nkschaefer.bsky.social
If you don’t have preexisting genotype data, there are tools to assign cells to individuals of origin by clustering genotypes (Vireo, souporcell, scSplit, freemuxlet), but there’s not a clear way to check results, and they can make mistakes.
nkschaefer.bsky.social
demux_vcf assigns cells to individuals using genotypes and is fast, accurate, and robust to deep population structure. It groups SNPs by allelic state in each pair of individuals and compares the likelihood of each pair of IDs for each cell, improving speed over methods that filter or refine SNPs.
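For readers new to genotype-based demultiplexing, the core likelihood comparison can be sketched as follows. This is a deliberately simplified per-individual version, not demux_vcf's pairwise, SNP-grouping algorithm, and it ignores doublets. The error rate `err` and all names are assumptions for illustration.

```python
import math


def site_loglik(ref, alt, dosage, err=0.01):
    """Binomial log-likelihood (up to a constant) of read counts given dosage."""
    # Expected alt fraction is dosage/2, clamped so errors have nonzero probability.
    p_alt = min(max(dosage / 2, err), 1 - err)
    return alt * math.log(p_alt) + ref * math.log(1 - p_alt)


def assign_cell(cell_counts, genotypes, err=0.01):
    """Pick the individual whose genotypes best explain a cell's allele counts.

    cell_counts: {snp_id: (ref_reads, alt_reads)}
    genotypes:   {individual: {snp_id: alt-allele dosage 0/1/2}}
    Returns (best_individual, {individual: log-likelihood}).
    """
    scores = {}
    for ind, gts in genotypes.items():
        scores[ind] = sum(
            site_loglik(ref, alt, gts[snp], err)
            for snp, (ref, alt) in cell_counts.items()
            if snp in gts  # skip SNPs with no genotype for this individual
        )
    best = max(scores, key=scores.get)
    return best, scores


genotypes = {"donor1": {"s1": 0, "s2": 2}, "donor2": {"s1": 2, "s2": 0}}
best, scores = assign_cell({"s1": (5, 0), "s2": (0, 4)}, genotypes)
print(best, scores)
```

The difference between the top two log-likelihoods gives a rough confidence score for the call.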
nkschaefer.bsky.social
demux_species uses an alignment-free k-mer counting strategy to save time and memory and assigns cells to species using a statistical model instead of a cutoff. Users can plot the clustered k-mer counts to see if it worked. demux_species also separates reads by species for downstream processing.
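The alignment-free idea can be illustrated with a minimal sketch: count, per cell, how many k-mers in its reads are unique to each species. This is not demux_species' code. The precomputed `species_kmers` sets are a hypothetical input, and the final call here is a naive "most hits wins" rule rather than the statistical model described above.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer in a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_species_kmers(reads, species_kmers, k):
    """Count species-specific k-mer hits in one cell's reads.

    species_kmers: {species: set of k-mers found only in that species}
    """
    counts = Counter({sp: 0 for sp in species_kmers})
    for read in reads:
        for km in kmers(read, k):
            for sp, kset in species_kmers.items():
                if km in kset:
                    counts[sp] += 1
    return counts


def assign_species(counts):
    """Naive call: species with the most hits, or None if there are no hits."""
    if sum(counts.values()) == 0:
        return None
    return max(counts, key=counts.get)


species_kmers = {"human": {"ACGT"}, "chimp": {"TTTT"}}
counts = count_species_kmers(["ACGTACGT", "GGTTTTGG"], species_kmers, k=4)
print(counts, assign_species(counts))
```

Plotting the per-species counts for all cells against each other is the kind of check that shows whether the species separate cleanly.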
nkschaefer.bsky.social
CellBouncer provides fast, compiled, self-contained, interacting programs with methods to validate results where possible (e.g. you can visually compare two sets of IDs for the same cells, and you can visualize inferred mitochondrial haplotypes to determine how well the clustering worked).
nkschaefer.bsky.social
…and tools that can identify specific types of cell doublets, but cannot calculate a global doublet rate (which includes droplets containing two cells of the same type).
nkschaefer.bsky.social
…genotype-free demultiplexing tools that lack a validation method, ambient RNA removal tools that require cell type heterogeneity, custom tag (e.g. MULTIseq, HTO) or sgRNA (e.g. CRISPR guide capture) assignment strategies that fail when data are sparse or noisy, …
nkschaefer.bsky.social
Useful bioinformatic tools for demultiplexing and QCing pooled data exist. We identified several unmet needs, though, including: no dedicated method for species demultiplexing, slow genotype demultiplexing with large SNP panels, sensitivity to deep population structure in SNP reference panels…
nkschaefer.bsky.social
Pooling cells from multiple donors, cell lines, or species makes it easy to scale up experiments, incorporate genetic variation, and mitigate technical artifacts, while doing cool things like disentangling the effects of cell-extrinsic from cell-intrinsic variation (www.nature.com/articles/s41...).
Human neuronal maturation comes of age: cellular mechanisms and species differences - Nature Reviews Neuroscience
Human cortical neurons undergo a protracted period of postmitotic maturation compared with those of other species. Wallace and Pollen review the cell-intrinsic and cell-extrinsic mechanisms that gover...
www.nature.com