Pascal Notin
@pascalnotin.bsky.social
Research in AI for Protein Design @Harvard | Prev. CS PhD @UniofOxford, Maths & Physics @Polytechnique
pascalnotin.bsky.social
Congratulations to the entire RNAGym team @rohitarorayyc.bsky.social @murfalo.bsky.social @christianchoe.bsky.social @cshearer.bsky.social Aaron Kollasch, Fiona Qu, Ruben Weitzman, Artem Gazizov, @sarahgurev.bsky.social Erik Xie @deboramarks.bsky.social
8/9
pascalnotin.bsky.social
The moderate performance across all tasks reveals exciting opportunities! Key directions: RNA-specific training data, integrating structure-function relationships, and improving non-canonical base pair prediction. RNAGym provides the standardized foundation for progress.
7/9
pascalnotin.bsky.social
🌀 Tertiary structure: 215 diverse 3D structures from the PDB. NuFold leads on monomers (0.393 TM-score), while AlphaFold3 dominates complexes (0.381 TM-score). Non-Watson-Crick interactions remain a major challenge for all methods.
6/9
pascalnotin.bsky.social
🔗 Secondary structure: 901k chemical mapping profiles using DMS & 2A3 reactivity. EternaFold achieves top performance (0.656 F1-score), closely followed by CONTRAfold & Vienna. Traditional thermodynamic methods remain competitive with newer deep learning approaches.
5/9
pascalnotin.bsky.social
🔬 Fitness prediction: 70 assays across tRNA, ribozymes, aptamers & mRNAs (1M+ mutations total). Evo 2 performs best overall (0.276), but performance varies dramatically by RNA type: RNA-FM excels at tRNA/aptamers while Evo 2 leads mRNA tasks. Lots of room for improvement across the board!
4/9
pascalnotin.bsky.social
RNAGym tackles three essential RNA prediction tasks:
🔬 Fitness prediction: how mutations affect RNA function
🔗 Secondary structure: base-pairing patterns
🌀 Tertiary structure: 3D molecular architecture
All evaluated zero-shot to test true generalization!
3/9
pascalnotin.bsky.social
Why do we need this? RNA modeling faces major challenges: limited experimental data (<1% of PDB entries), structures that are inherently less stable than those of proteins, and evaluations scattered across different studies with inconsistent protocols.
2/9
pascalnotin.bsky.social
🚨 New paper 🚨 RNA modeling just got its own Gym! 🏋️ Introducing RNAGym, large-scale benchmarks for RNA fitness and structure prediction.
🧵 1/9
Reposted by Pascal Notin
isabellease.bsky.social
Pascal Notin at #VariantEffect25
pascalnotin.bsky.social
But more broadly, I wanted to convey in the blog that both modalities (structure + MSA) are critical for proper functional protein design & effect prediction.
pascalnotin.bsky.social
Thank you @delalamo.xyz! I understand where you are coming from re: design. For some design setups, structure is critical; here my point was more about a directed evolution setup, where you have to select the top mutants that go into the next round.
pascalnotin.bsky.social
Even simple methods leveraging these 2 modalities significantly outperform billion-parameter sequence-only models. So, what's next? Better retrieval, advanced multimodal approaches, & alignment. Read more: pascalnotin.substack.com/p/have-we-hi... #BioTech #AI #pLMs
Have We Hit the Scaling Wall for Protein Language Models?
Beyond Scaling: What Truly Works in Protein Fitness Prediction
pascalnotin.substack.com
pascalnotin.bsky.social
Have we hit a "scaling wall" for protein language models? 🤔 Our latest ProteinGym v1.3 release suggests that for zero-shot fitness prediction, simply making pLMs bigger isn't better beyond 1-4B parameters. The winning strategy? Combining MSAs & structure in multimodal models!
Reposted by Pascal Notin
biorxiv-biophys.bsky.social
Large-scale discovery, analysis, and design of protein energy landscapes https://www.biorxiv.org/content/10.1101/2025.03.20.644235v1