Pooja Kathail
@poojakathail.bsky.social
87 followers 200 following 4 posts
Computational Biology PhD student @ucberkeley
Pinned
poojakathail.bsky.social
Super excited to share our review on genomic deep learning models for non-coding variant effect prediction, with Ayesha Bajwa and Nilah Ioannidis. We’d like this review to be a useful resource, and welcome any feedback, comments, or questions! 1/4

arxiv.org/abs/2411.11158
Leveraging genomic deep learning models for non-coding variant effect prediction
The majority of genetic variants identified in genome-wide association studies of complex traits are non-coding, and characterizing their function remains an important challenge in human genetics. Gen...
arxiv.org
Reposted by Pooja Kathail
lianafaye.bsky.social
This preprint from Helen Sakharova is one of the coolest things to come out of my lab: “Protein language models reveal evolutionary constraints on synonymous codon choice.” Codon choice is a big puzzle in how information is encoded in genomes, and we have a new angle. www.biorxiv.org/content/10.1...
Protein language models reveal evolutionary constraints on synonymous codon choice
Evolution has shaped the genetic code, with subtle pressures leading to preferences for some synonymous codons over others. Codons are translated at different speeds by the ribosome, imposing constrai...
www.biorxiv.org
Reposted by Pooja Kathail
anshulkundaje.bsky.social
Congratulations to incoming postdoc @rrastogi.bsky.social for being awarded the Warren Alpert Postdoctoral Scholarship! Look forward to having him join us soon!
Reposted by Pooja Kathail
davidaknowles.bsky.social
We had a bunch of requests so we're extending the #MLCB2025 deadline to June 3rd (anywhere on earth)! cmt3.research.microsoft.com/MLCB2025 to submit.
Reposted by Pooja Kathail
jeremymberg.bsky.social
I have confirmation from several sources now that all T32s, many F30s and F31s, and most or all Center awards (P30, P50) have been terminated at Columbia.

This is quite damaging to research and to individuals.

This is pure terrorism and cannot be legal. But litigation will take time...
Reposted by Pooja Kathail
davidaknowles.bsky.social
Wow. "NIH" canceled my co-mentored (with Dave Sulzer) PhD student's F31 funding. His work is on understanding the genetics and neuroscience of language learning disorders. F31 provides no indirect $ to Columbia, just pays his salary. Not that it should matter, but he's an American citizen. W.T.F.
Reposted by Pooja Kathail
fernandoperez.org
It's today, T-3h! If you're in the East Bay and care about science or education (i.e. if you care about living on this planet in any form 😃), join us, 11:45 at Upper Sproul!

And if you're elsewhere, look up a local event in your area, there's a LOT happening today!

www.standup4scienceberkeley.com
Map of Northern hemisphere with many blue place markers.
Reposted by Pooja Kathail
scienceyael.bsky.social
NEXT FRIDAY! San Francisco. I'll be there.

@standupforscience.bsky.social #StandUpforScience #SciComm #Science
San Francisco 
Stand Up for Science 2025
March 7, 2025
Civic Center Plaza
1-3pm
science is for everyone
find your local rally site and other ways to get involved
standupforscience2025.org
Reposted by Pooja Kathail
saramostafavi.bsky.social
Our new paper describing a scalable approach for training sequence-to-function models on personal genomes ("personal genome training"), includes our observations on when this works and its limitations. www.biorxiv.org/content/10.1...
Congrats: Anna, @xinmingtu.bsky.social , @lxsasse.bsky.social
A scalable approach to investigating sequence-to-expression prediction from personal genomes
A key promise of sequence-to-function (S2F) models is their ability to evaluate arbitrary sequence inputs, providing a robust framework for understanding genotype-phenotype relationships. However, despite strong performance across genomic loci, S2F models struggle with inter-individual variation. Training a model to make genotype-dependent predictions at a single locus, an approach we call personal genome training, offers a potential solution. We introduce SAGE-net, a scalable framework and software package for training and evaluating S2F models using personal genomes. Leveraging its scalability, we conduct extensive experiments on model and training hyperparameters, demonstrating that training on personal genomes improves predictions for held-out individuals. However, the model achieves this by identifying predictive variants rather than learning a cis-regulatory grammar that generalizes across loci. This failure to generalize persists across a range of hyperparameter settings. These findings highlight the need for further exploration to unlock the full potential of S2F models in decoding the regulatory grammar of personal genomes. Scalable software and infrastructure development will be critical to this progress.
www.biorxiv.org
Reposted by Pooja Kathail
saorisakaue.bsky.social
📣Excited to share my last postdoc paper with
@soumya-boston.bsky.social on eQTL mechanisms depending on where the RNA is in the cell! @broadinstitute.org @harvardmed.bsky.social
TL;DR: Early RNA eQTL variants in the nucleus and late RNA eQTL variants in the cytosol have distinct molecular mechanisms🧵
Reposted by Pooja Kathail
pkoo562.bsky.social
[SAVE THE DATE] MLCB 2025 is happening Sept 10-11 at the NY Genome Center in NYC!

Attend the premier conference at the intersection of ML & Bio, share your research and make lasting connections!

Submission deadline: June 1
More details: mlcb.github.io

Help spread the word—please RT! #MLCB2025
Reposted by Pooja Kathail
davidaknowles.bsky.social
#MLCB2025 will be Sept 10-11 at @nygenome.org in NYC! Paper deadline June 1st & in-person registration will open in May. Please sign up for our mailing list groups.google.com/g/mlcb/ for future announcements. More details at mlcb.github.io. Please RP!
Reposted by Pooja Kathail
amyxlu.bsky.social
1/🧬 Excited to share PLAID, our new approach for co-generating sequence and all-atom protein structures by sampling from the latent space of ESMFold. This requires only sequences during training, which unlocks more data and annotations:

bit.ly/plaid-proteins
🧵
overview of results for PLAID!
poojakathail.bsky.social
Finally, we discuss downstream applications of models to understand disease-relevant non-coding variants, such as functionally informed fine-mapping and de novo variant prioritization. 4/4
poojakathail.bsky.social
We also review variant effect prediction evaluations that have been performed to date on genomic deep learning models, highlighting strengths and limitations of current models and the need for more comprehensive evaluation. 3/4
Overview of variant effect prediction evaluations that have been performed to date using current genomic deep learning models.
poojakathail.bsky.social
We cover two popular genomic deep learning modeling paradigms — supervised sequence-to-activity models and self-supervised genomic language models — and describe practical considerations for using both types of models to make variant effect predictions. 2/4
Schematic overview of two popular genomic deep learning modeling paradigms. Constructing variant effect predictions using genomic deep learning models.
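The usual recipe for scoring a variant with a sequence-to-activity model is to predict activity for the reference and alternate alleles and take the difference. A minimal sketch of that pattern, where `one_hot`, `toy_model`, and `variant_effect` are hypothetical stand-ins (not code from the review, and `toy_model` is a fixed random linear readout in place of a real trained network):

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string into a (len, 4) array."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

def toy_model(x):
    """Placeholder 'activity' predictor: a fixed linear readout of the
    one-hot sequence. A real model would be a trained neural network
    predicting e.g. chromatin accessibility or expression."""
    rng = np.random.default_rng(0)  # fixed seed -> same weights every call
    w = rng.normal(size=x.shape)
    return float(np.sum(w * x))

def variant_effect(seq, pos, alt):
    """Score a single-nucleotide variant as predicted(alt) - predicted(ref)."""
    ref_score = toy_model(one_hot(seq))
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    alt_score = toy_model(one_hot(alt_seq))
    return alt_score - ref_score

effect = variant_effect("ACGTACGTAC", pos=4, alt="G")
```

In practice the comparison is often done on log-scale predictions, averaged over forward and reverse strands, and for models with many output tracks the difference is computed per track.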