Lightnews — Scholar-powered news

Rohit Singh @rohitsingh8080.bsky.social · Jun 12

Excited about this! I'll focus on analyzing immune repertoires w PLMs.

Biological systems are inherently multi-scale. With the representational power and speed of PLMs, we can now bridge the molecular and systems scales, to study what makes each of us distinctive, immunity-wise.

AIRR-Community @airr-community.bsky.social · Jun 12

The next AIRR-C seminar is scheduled for June 26 at 16:00 CET, 10:00 ET, 7:00 PT with Rohit Singh @rohitsingh8080.bsky.social (antibody PLMs) and @joshuamcgrath.bsky.social (immunoglobulin codon mutability). Register here: antibodysociety.org/the-airr-com...

AIRR Community Seminar Series - The Antibody Society

Join the AIRR Community for a 90 minute “lab-style” virtual seminar. Each month an established and an early career scientist will discuss their AIRR-seq related research. Seminar time will be 16:00 – ...

www.antibodysociety.org

Reposted by Rohit Singh

Scott Soderling @scottsoderling.bsky.social · May 9

Yes, we have lots of exciting collaborative projects at the interface of computation and biology. Deep expertise in many domains between our labs, so a wonderful and committed training environment!

Rohit Singh @rohitsingh8080.bsky.social · May 9

Our fantastic trainees and collaborators made this possible. Kanchan Jha, Aditya Parekh and Pooja Parameswaran led the dry-lab work, while Daichi Shonai and Aki Uezu led the wet-lab work.

Rohit Singh @rohitsingh8080.bsky.social · May 9

This was a wonderful collab with @scottsoderling.bsky.social , whose lab is situated next to ours.

If you want to do cool collaborative work like this, join us! We're building a great ecosystem of AIxBio at Duke.

Rohit Singh @rohitsingh8080.bsky.social · May 9

I loved working on this project! Neither the kinase specificity prediction nor the proximity proteomics is enough on its own; you need both.

I think this project shows how a close collaboration between biologists and computer scientists can introduce entirely new capabilities.

Rohit Singh @rohitsingh8080.bsky.social · May 9

With KolossuS, we studied sleepiness in mice, and especially the signaling impact of Sik3, a kinase whose mutation leads to sleepy mice.

We think our KolossuS + proteomics approach has a ton of potential in deconvolving kinases involved in specific processes. 12/

Rohit Singh @rohitsingh8080.bsky.social · May 9

And of course, the interpretability of the kinase embedding space led to some fun explorations.

For example, we asked whether the phylogeny of kinase families actually corresponds to substrate preferences. Broadly yes, but with a few key exceptions. 11/

Rohit Singh @rohitsingh8080.bsky.social · May 9

KolossuS’ architecture applies broadly across all kinases (and generalizes to other species) and it is well-calibrated.

This, combined with proximity proteomics, which lets us assay in a tissue and sub-cellular locale of interest, gives us the end-to-end solution we need. 10/

Rohit Singh @rohitsingh8080.bsky.social · May 9

A poorly calibrated model might always score one kinase highly even if, on a per-kinase basis, it is accurate on substrate specificities. See this note from the preprint: 9/
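
The calibration point can be made concrete with a toy example. This is a minimal sketch, not the method from the preprint: the kinase names, the scores, and the per-kinase z-scoring used as a stand-in for calibration are all assumptions chosen for illustration. It shows a model that ranks peptides correctly within each kinase yet nominates the same kinase for every peptide when scores are compared across kinases.

```python
from statistics import mean, pstdev

# Hypothetical raw scores (illustrative numbers, not from the preprint).
# Each row is one kinase; columns are peptides p0, p1, p2.
raw_scores = {
    "kinaseA": [0.95, 0.80, 0.85],  # scores run high overall; best peptide is p0
    "kinaseB": [0.10, 0.40, 0.15],  # best peptide is p1
    "kinaseC": [0.20, 0.25, 0.60],  # best peptide is p2
}

def top_peptide_per_kinase(scores):
    """Per-kinase view: index of the peptide each kinase scores highest."""
    return {k: max(range(len(v)), key=v.__getitem__) for k, v in scores.items()}

def top_kinase_per_peptide(scores):
    """Screening view: for each peptide, the kinase with the highest score."""
    n_peptides = len(next(iter(scores.values())))
    return [max(scores, key=lambda k: scores[k][j]) for j in range(n_peptides)]

def zscore_rows(scores):
    """Assumed stand-in for calibration: standardize each kinase's scores."""
    out = {}
    for kinase, row in scores.items():
        mu, sd = mean(row), pstdev(row)
        out[kinase] = [(x - mu) / sd for x in row]
    return out

# Within each kinase the ranking is right...
print(top_peptide_per_kinase(raw_scores))
# ...but a screen across kinases picks kinaseA for every peptide.
print(top_kinase_per_peptide(raw_scores))
# Putting scores on a comparable scale changes the screening result.
print(top_kinase_per_peptide(zscore_rows(raw_scores)))
```

In this toy setup the raw screen returns kinaseA for all three peptides, while the rescaled screen returns a different kinase for each peptide. Row-wise standardization is only one simple way to make scores comparable and is used here purely to illustrate the failure mode.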

Rohit Singh @rohitsingh8080.bsky.social · May 9

Breadth and interpretability are self-explanatory, but why emphasize calibration? And what does it even mean?

The key insight is that given a phosphorylated peptide, we’ll computationally screen it against every human kinase. Calibration is critical for that. 8/

Rohit Singh @rohitsingh8080.bsky.social · May 9

Using the ESM-2 15B model (PLM scaling worked, for once!) we predict kinase-substrate specificity by learning a co-embedding of the two.

As models go, KolossuS is relatively simple. In its design, we emphasized three aspects: breadth, calibration and interpretability. 7/
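
For readers unfamiliar with co-embeddings, the general idea can be sketched as follows. This is a generic illustration under stated assumptions, not the KolossuS architecture: the projection matrices `W_KINASE` and `W_SUBSTRATE`, the tiny vector sizes, and the use of cosine similarity as the score are all hypothetical. In a real model the inputs would be PLM embeddings and the projections would be learned from known kinase-substrate pairs.

```python
import math

def project(matrix, vec):
    """Apply a linear map (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical fixed projections into a shared 2-d space.
# In practice these weights would be trained, not hand-written.
W_KINASE = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0]]   # maps a 3-d kinase embedding to 2-d
W_SUBSTRATE = [[0.0, 1.0],
               [1.0, 0.0]]     # maps a 2-d substrate embedding to 2-d

def score(kinase_emb, substrate_emb):
    """Score a kinase-substrate pair by similarity in the shared space."""
    return cosine(project(W_KINASE, kinase_emb),
                  project(W_SUBSTRATE, substrate_emb))

kinase = [1.0, 0.0, 5.0]
print(score(kinase, [0.0, 1.0]))  # lands on the same point in the shared space
print(score(kinase, [1.0, 0.0]))  # lands on an orthogonal direction
```

Because both molecules end up in the same space, distances and directions in that space can be inspected directly, which is one reason co-embedding designs tend to be easier to interpret than a black-box pair classifier.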

Rohit Singh @rohitsingh8080.bsky.social · May 9

The relevant assay here is phosphoproteomics: you can identify the phosphorylated peptides in a sample. Proximity proteomics will let you further target a specific tissue and sub-cellular neighborhood. But that doesn’t tell you which kinases are active.

Enter KolossuS.

Rohit Singh @rohitsingh8080.bsky.social · May 9

Identifying the precise kinase involved in your pathway and tissue of interest therefore requires a mix of computation and experimentation. 5/

Rohit Singh @rohitsingh8080.bsky.social · May 9

As writers, kinases have only moderately high specificity: multiple kinases can often phosphorylate the same substrate. This moderate specificity makes signal integration easier, but of course it also leads to disease risk. 4/

Rohit Singh @rohitsingh8080.bsky.social · May 9

Kinases are the proto-example of something nature does often: take a simple biophysical phenomenon (here phosphorylation, but see also ubiquitination, acetylation etc.) and supercharge it as a signaling vehicle by evolving a spectrum of signal writers (e.g., kinases) and readers. 3/

Rohit Singh @rohitsingh8080.bsky.social · May 9

Surprisingly little is known about kinases. Despite their therapeutic and biological importance, 80% of the human kinome is “dark”, i.e., we don’t have a good sense of the substrates a kinase targets, or in which cell types or sub-cellular compartments. 2/

Rohit Singh @rohitsingh8080.bsky.social · May 9

Read on for the skeetorial. But if you're in a rush, here's the preprint:

www.biorxiv.org/content/10.1...

Deep Learning-coupled Proximity Proteomics to Deconvolve Kinase Signaling In Vivo

Deconvolving the substrates of hundreds of kinases linked to phosphorylation networks driving cellular behavior is a fundamental, unresolved biological challenge, largely due to the poorly understood ...

www.biorxiv.org

Rohit Singh @rohitsingh8080.bsky.social · May 9

Introducing KolossuS to address a 50-year-old problem: which kinases are active in your pathway of interest?

As computational biologists, our work mostly involves post-hoc analysis algorithms. KolossuS is the rare case where an ML model enables entirely new capabilities. 1/

Reposted by Rohit Singh

Scott Soderling @scottsoderling.bsky.social · Apr 28

The BEST part: This would not have been possible without the close (and SO FUN!) collaboration of my lab with @rohitsingh8080.bsky.social and the lab of Masashi Yanagisawa.

Reposted by Rohit Singh

Scott Soderling @scottsoderling.bsky.social · Apr 28

Over 50 yrs since the discovery of protein kinases, 80% of human kinases still have ≤20 known substrates, and many are “dark.” I'm EXCITED to announce our new work towards solving this: combining (1) deep learning with (2) proximity proteomics in vivo! ➡️
www.biorxiv.org/content/10.1...

Reposted by Rohit Singh

recombseq.bsky.social @recombseq.bsky.social · Apr 25

We had a great morning session, including a keynote on single cells and long reads (@aliciao.bsky.social), and talks on spatial transcriptomics (@rohitsingh8080.bsky.social), DNA storage, and TAD inference

Rohit Singh @rohitsingh8080.bsky.social · Apr 6

For instance, there's a temptation to interpret the embeddings as implying a gene regulatory network. I don't think they do, at least not as per the typical GRN interpretation of causality within an individual cell.

Reposted by Rohit Singh

Anshul Kundaje @anshulkundaje.bsky.social · Apr 6

Enjoyed reading this one. Very insightful with some clever and intricate analyses. Also very well written.

Rohit Singh @rohitsingh8080.bsky.social · Apr 5

Bio foundation models are great design and engg tools. But can they help decode the fundamental principles of life?

We harnessed a single-cell FM for decoding the long-debated relationship between genome arch. and gene coregulation. 1/

Preprint here: www.biorxiv.org/content/10.1...

Tracing the Shared Foundations of Gene Expression and Chromatin Structure

The three-dimensional organization of chromatin into topologically associating domains (TADs) may impact gene regulation by bringing distant genes into contact. However, many questions about TADs' fun...

www.biorxiv.org

Rohit Singh @rohitsingh8080.bsky.social · Apr 6

Agree completely. Specifically, re single-cell foundation models: I think they are best thought of as representation learners that summarize massive scRNA-seq datasets usefully. But those representations have to be carefully interpreted.

Reposted by Rohit Singh

Anshul Kundaje @anshulkundaje.bsky.social · Apr 6

100% agree with this.
