Sarah Gurev
@sarahgurev.bsky.social
Postdoc @ Debbie Marks Lab, Harvard | Prev. PhD @ MIT EECS || ML for Proteins + Viruses 🦠
Reposted by Sarah Gurev
slavov-n.bsky.social
Large AI models are reported to achieve high accuracy (AUROC) predicting pathogenic variants across the genome.

A preprint reports that the predictions are based on splice variants. Using only this info (no sequences, no AI) achieves AUROC=0.944 across noncoding variants.

1/2
Reposted by Sarah Gurev
erinedoherty.bsky.social
Excited to share our new preprint co-led by @jnoms.bsky.social!

Here we reveal an exceptional diversity of viral 2H phosphodiesterases (PDEs) that enable immune evasion by selectively degrading oligonucleotide-based messengers. This 2H PDE fold has evolved striking substrate breadth & specificity.
Divergent viral phosphodiesterases for immune signaling evasion
Cyclic dinucleotides (CDNs) and other short oligonucleotides play fundamental roles in immune system activation in organisms ranging from bacteria to humans. In response, viruses use phosphodiesterase...
www.biorxiv.org
sarahgurev.bsky.social
🙏Amazing collaboration co-led with Noor Youssef and Navami Jain, @deboramarks.bsky.social, and our funders @cepi.net!
11/12
sarahgurev.bsky.social
This matters for:
⚠️ Future-proof vaccine and therapeutics design
⚠️ Monitoring of high-pandemic-risk viruses
⚠️ Dual-use biosecurity risk assessment

Without reliable models, we risk underestimating viral evolution—and overestimating our ability to counter it.
10/12
sarahgurev.bsky.social
EVEREST highlights:
✅ Where models fail—and why
✅ Which viruses are least/most predictable
✅ How to estimate per-protein, model-specific reliability
✅ Concrete steps to improve ML for viral mutation prediction
9/12
sarahgurev.bsky.social
🌍Current models fail to reliably predict mutations in more than half of the high-priority viruses identified by the WHO.
8/12
sarahgurev.bsky.social
💪Is bigger always better? Maybe not for other taxa, but for viruses, yes! For viruses, models continue to improve with increased numbers of parameters.
7/12
sarahgurev.bsky.social
🤏Why? Viruses are severely underrepresented in training datasets (<1%) and are further downsampled after common clustering approaches.
6/12
sarahgurev.bsky.social
📉Despite the hype, protein language models trained across the “protein universe” are outperformed by even the simplest, site-independent alignment-based model.
5/12
sarahgurev.bsky.social
💭Imagine: It’s Day 0 of an outbreak and there’s little experimental data. Computational mutational effect predictions could provide valuable information… if we could trust them. Can we?

EVEREST doesn’t just assess performance. It also quantifies reliability for new viruses.
4/12
sarahgurev.bsky.social
🚀To find out, we built EVEREST: Evolutionary Variant Effect prediction with Reliability ESTimation.

We benchmark models across 45 viral deep mutational scanning datasets spanning >340,000 mutations.
3/12
sarahgurev.bsky.social
🦠 Protein language models (PLMs) have shown impressive performance in predicting mutation effects. But... viruses are a different beast.

They evolve fast, cross species, and are under pressure from host immunity. Do PLMs still work here?
2/12
sarahgurev.bsky.social
🚨New paper 🚨

Can protein language models help us fight viral outbreaks? Not yet. Here’s why 🧵👇
1/12
Reposted by Sarah Gurev
arambaut.bsky.social
Some great new features and updates from the awesome Pathoplexus project. This is a new open pathogen genome database that can provide access to your sequences under a use-restricted license but also feed directly into INSDC (EBI, GenBank, etc.) when you are ready. pathoplexus.org/news/2025-07...
Pathoplexus | Pathoplexus July Update
Pathoplexus is a new, open-source database dedicated to the efficient sharing of human viral pathogen genomic data, fostering global collaboration and public health response.
pathoplexus.org
Reposted by Sarah Gurev
pascalnotin.bsky.social
🚨 New paper 🚨 RNA modeling just got its own Gym! 🏋️ Introducing RNAGym, large-scale benchmarks for RNA fitness and structure prediction.
🧵 1/9
Reposted by Sarah Gurev
jnoms.bsky.social
Hello everyone! I am pleased to share information on the first-ever Computational Structural Virology Symposium, held August 4th on Zoom and highlighting work in this emerging field. You can register for this event here: forms.gle/CNiqskMwQEuV.... Please re-post!