Rohit Singh
@rohitsingh8080.bsky.social
960 followers 370 following 72 posts
Computational biologist. Faculty @DukeU. Co-founder http://martini.ai. Prev @MIT_CSAIL. Did quant investing for a while, before returning to research. https://singhlab.net
Pinned
rohitsingh8080.bsky.social
Bio foundation models are great design and engg tools. But can they help decode the fundamental principles of life?

We harnessed a single-cell FM for decoding the long-debated relationship between genome arch. and gene coregulation. 1/

Preprint here: www.biorxiv.org/content/10.1...
Tracing the Shared Foundations of Gene Expression and Chromatin Structure
The three-dimensional organization of chromatin into topologically associating domains (TADs) may impact gene regulation by bringing distant genes into contact. However, many questions about TADs' fun...
www.biorxiv.org
rohitsingh8080.bsky.social
Excited about this! I'll focus on analyzing immune repertoires w PLMs.

Biological systems are inherently multi-scale. With the representational power and speed of PLMs, we can now bridge the molecular and systems scales, to study what makes each of us distinctive, immunity-wise.
Reposted by Rohit Singh
scottsoderling.bsky.social
Yes, we have lots of exciting collaborative projects at the interface of computation and biology. Deep expertise in many domains between our labs, so a wonderful and committed training environment!
rohitsingh8080.bsky.social
Our fantastic trainees and collaborators made this possible. Kanchan Jha, Aditya Parekh and Pooja Parameswaran led the dry-lab work, while Daichi Shonai and Aki Uezu led the wet-lab work.
rohitsingh8080.bsky.social
This was a wonderful collab with @scottsoderling.bsky.social, whose lab is situated next to ours.

If you want to do cool collaborative work like this, join us! We're building a great ecosystem of AIxBio at Duke.
rohitsingh8080.bsky.social
I loved working on this project! Neither the kinase specificity prediction nor the proximity proteomics is enough on its own; you need both.

I think this project shows how a close collaboration between biologists and computer scientists can introduce entirely new capabilities.
rohitsingh8080.bsky.social
With KolossuS, we studied sleepiness in mice, and especially the signaling impact of Sik3, a kinase whose mutation leads to sleepy mice.

We think our KolossuS + proteomics approach has a ton of potential in deconvolving kinases involved in specific processes. 12/
rohitsingh8080.bsky.social
And of course, the interpretability of the kinase embedding space led to some fun explorations.

For example, we asked whether the phylogeny of kinase families actually corresponds to substrate preferences. Broadly yes, but with a few key exceptions. 11/
rohitsingh8080.bsky.social
KolossuS’ architecture applies broadly across all kinases (and generalizes to other species) and it is well-calibrated.

This, combined with proximity proteomics, which lets us assay in a tissue and sub-cellular locale of interest, gives us the end-to-end solution we need. 10/
rohitsingh8080.bsky.social
A poorly calibrated model might always score one kinase highly even if, on a per-kinase basis, it is accurate on substrate specificities. See this note from the preprint: 9/
rohitsingh8080.bsky.social
Breadth and interpretability are self-explanatory, but why emphasize calibration? And what does it even mean?

The key insight is that given a phosphorylated peptide, we’ll computationally screen it against every human kinase. Calibration is critical for that. 8/
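
A toy illustration of the calibration point in 8/ and 9/, using made-up scores (these numbers and names are hypothetical, not KolossuS output or code). Each kinase ranks its own substrates correctly, yet screening a peptide across kinases still picks the wrong one because one kinase's scores are uniformly inflated:

```python
# Hypothetical scores[kinase][peptide]; higher means "more likely a substrate".
# The values are invented purely to illustrate the calibration issue.
scores = {
    "kinase_A": {"p1": 0.95, "p2": 0.90, "p3": 0.80, "p4": 0.75},
    "kinase_B": {"p1": 0.10, "p2": 0.15, "p3": 0.30, "p4": 0.35},
}
# Assumed ground truth for the toy example.
true_kinase = {"p1": "kinase_A", "p2": "kinase_A",
               "p3": "kinase_B", "p4": "kinase_B"}


def per_kinase_ranking_ok(kinase):
    """True if every real substrate of `kinase` outscores every non-substrate."""
    subs = [s for p, s in scores[kinase].items() if true_kinase[p] == kinase]
    non = [s for p, s in scores[kinase].items() if true_kinase[p] != kinase]
    return min(subs) > max(non)


def screen(peptide):
    """Screen one peptide against all kinases and return the top-scoring one."""
    return max(scores, key=lambda k: scores[k][peptide])


per_kinase = {k: per_kinase_ranking_ok(k) for k in scores}
calls = {p: screen(p) for p in true_kinase}
n_correct = sum(calls[p] == true_kinase[p] for p in true_kinase)

print(per_kinase)  # each kinase is accurate on its own substrates
print(calls)       # but the cross-kinase screen always picks kinase_A
print(n_correct, "of", len(true_kinase), "peptides assigned correctly")
```

In this toy setup the scores of the two kinases live on different scales, so comparing them directly is meaningless; calibration is what makes the cross-kinase comparison valid.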
rohitsingh8080.bsky.social
Using the ESM-2 15B model (PLM scaling worked, for once!) we predict kinase-substrate specificity by learning a co-embedding of the two.

As models go, KolossuS is relatively simple. In its design, we emphasized three aspects: breadth, calibration and interpretability. 7/
rohitsingh8080.bsky.social
The relevant assay here is phosphoproteomics: you can identify the phosphorylated peptides in a sample. Proximity proteomics will let you further target a specific tissue and sub-cellular neighborhood. But that doesn’t tell you which kinases are active.

Enter KolossuS.
rohitsingh8080.bsky.social
Identifying the precise kinase involved in your pathway and tissue of interest therefore requires a mix of computation and experimentation. 5/
rohitsingh8080.bsky.social
As writers, kinases are only moderately specific; multiple kinases can often phosphorylate a substrate. This moderate specificity makes signal integration easier, but it of course also creates disease risk. 4/
rohitsingh8080.bsky.social
Kinases are the proto-example of something nature does often: take a simple biophysical phenomenon (here phosphorylation, but see also ubiquitination, acetylation etc.) and supercharge it as a signaling vehicle by evolving a spectrum of signal writers (e.g., kinases) and readers. 3/
rohitsingh8080.bsky.social
Surprisingly little is known about kinases. Despite their therapeutic and biological importance, 80% of the human kinome is “dark”, i.e., we don’t have a good sense of the substrates a kinase targets, or in which cell types or sub-cellular compartments. 2/
rohitsingh8080.bsky.social
Introducing KolossuS to address a 50-year-old problem: which kinases are active in your pathway of interest?

As computational biologists, our work mostly involves post-hoc analysis algorithms. KolossuS is the rare case where an ML model enables entirely new capabilities. 1/
Reposted by Rohit Singh
scottsoderling.bsky.social
The BEST part: This would not have been possible without the close (and SO FUN!) collaboration of my lab with @rohitsingh8080.bsky.social and the lab of Masashi Yanagisawa.
Reposted by Rohit Singh
scottsoderling.bsky.social
Over 50 yrs since the discovery of protein kinases, 80% of human kinases still have ≤20 known substrates, and many are “dark.” I'm EXCITED to announce our new work towards solving this- combining (1) deep learning with (2) proximity proteomics in vivo! ➡️
www.biorxiv.org/content/10.1...
Reposted by Rohit Singh
recombseq.bsky.social
We had a great morning session, including a keynote on single cells and long reads @aliciao.bsky.social, and talks on spatial transcriptomics @rohitsingh8080.bsky.social, DNA storage, and TAD inference
rohitsingh8080.bsky.social
For instance, there's a temptation to interpret the embeddings as implying a gene regulatory network. I don't think they do, at least not as per the typical GRN interpretation of causality within an individual cell.
Reposted by Rohit Singh
anshulkundaje.bsky.social
Enjoyed reading this one. Very insightful with some clever and intricate analyses. Also very well written.
rohitsingh8080.bsky.social
Agree completely. Specifically, re single-cell foundation models: I think they are best thought of as representation learners that summarize massive scRNA-seq datasets usefully. But those representations have to be carefully interpreted.
Reposted by Rohit Singh