Lightnews — Scholar-powered news

Reposted by Amy Lu

Martin Steinegger 🇺🇦 @martinsteinegger.bsky.social · Jan 26

Just coincidentally found GenBank Release 84.0 from 1994 in the neighboring lab. Anyone out there with an even older version?

Amy Lu @amyxlu.bsky.social · Dec 28

In case you missed our ML for proteins seminar on CHEAP compression for protein embeddings back in October, here it is -- thanks @megthescientist.bsky.social for doing so much for the MLxProteins community 🫶

Meg T (she/her/hers) @megthescientist.bsky.social · Nov 29

&& superstar @amyxlu.bsky.social (and prev. co-organizer of @ml4proteins.bsky.social): CHEAP talk online now!

youtu.be/7XnROkjo5Vg?...

Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure (CHEAP)

YouTube video by ML for protein engineering seminar series

Reposted by Amy Lu

Meg T (she/her/hers) @megthescientist.bsky.social · Dec 16

• introduced “zero-shot prediction” as the task of predicting a bioassay’s outcome from pLM likelihoods
• commented on biases in the evolutionary signal from the tree of life used to train pLMs (a favorite paper I read in 2024: shorturl.at/fbC7g)

Amy Lu @amyxlu.bsky.social · Dec 15

Thanks @workshopmlsb.bsky.social for letting us share our work!

🔗📄 bit.ly/plaid-proteins

Amy Lu @amyxlu.bsky.social · Dec 10

Another straightforward application is generation, either by next-token sampling or MaskGIT-style denoising. We built the tokenized version of CHEAP for generation but ended up going with diffusion on continuous embeddings instead, though I think either would’ve worked.

Reposted by Amy Lu

Kevin K. Yang 楊凱筌 @kevinkaichuang.bsky.social · Dec 9

We trained a model to co-generate protein sequence and structure by working in the ESMFold latent space, which encodes both. PLAID only requires sequences for training but generates all-atom structures!

Really proud of @amyxlu.bsky.social's effort leading this project end-to-end!