Junhao (Bear) Xiong
@junhaobearxiong.bsky.social
Machine learning for computational biology. PhD student at Berkeley EECS.
Pinned
junhaobearxiong.bsky.social
Guide your favorite protein generative model with experimental data? Meet ProteinGuide - a method to condition pre-trained models on properties without retraining. We validated it both in silico by guiding ProteinMPNN and ESM3 on 3 tasks and in vitro by engineering base editors.
junhaobearxiong.bsky.social
On a personal note, it is at once surreal, gratifying and humbling to be part of a wet-dry collab. I’m so grateful to my collaborators (also great friends) for making it real + keeping it fun! Also thankful our buildings (BAIR and @innovativegenomics.bsky.social) are right next to each other :)
junhaobearxiong.bsky.social
This work was only made possible through an incredible interdisciplinary collaboration between the Listgarten lab and @savagecatsonly.bsky.social. All kudos go to the amazing team that I’m super grateful to be part of: @hnisonoff.bsky.social @marialukarska.bsky.social (and Ishan and Luke)
junhaobearxiong.bsky.social
The guided library in round 2 showed significantly higher activity than the initial unguided library in the experimental base editing assay.
junhaobearxiong.bsky.social
We didn't just validate in silico - we also synthesized & tested proteins in the lab. We used ProteinGuide to engineer an adenine base editor for high activity: generated 2,000 variants → tested in bacteria → used results to guide 2,000 new designs.
junhaobearxiong.bsky.social
In our third task, we demonstrate the generality of ProteinGuide beyond amino acid sequences, to structure tokens. In particular, we guide ESM3 to generate backbone structures (as tokens) with specified CATH fold class labels.
junhaobearxiong.bsky.social
In our second task, we guided ESM3 to redesign enzyme sequences so that they are predicted to belong to specific enzyme classes, using CLEAN, a published classifier for Enzyme Commission (EC) numbers.
junhaobearxiong.bsky.social
In our first task, we guided ProteinMPNN with experimental stability measurements from the @grocklin.bsky.social lab to generate amino acid sequences encoding proteins that are more stable than those ProteinMPNN would generate on its own.
junhaobearxiong.bsky.social
To illustrate the potential of ProteinGuide, we applied it, in silico, to three tasks, using two representative, well-known protein generative models, ProteinMPNN and ESM3. Across these three tasks, we observed that guidance, as expected, led to the desired outcome.
junhaobearxiong.bsky.social
We leverage the fact that masked language models (MLMs, e.g., ESM3), order-agnostic autoregressive (OA-AR) models (e.g., ProteinMPNN), and masking-based diffusion models are actually equivalent. This lets us apply our previously developed guidance methodology for discrete diffusion and flow models to MLMs and OA-AR models.
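As a rough illustration of what guiding a masked model can look like, here is a minimal sketch of generic Bayes-rule guidance at a single masked position. It is not taken from the ProteinGuide code and may differ from the actual method: the function name `guided_token_probs`, the `strength` parameter, and the toy numbers are all assumptions made for this example.

```python
import numpy as np

def guided_token_probs(base_logits, property_loglik, strength=1.0):
    """Reweight a base model's token distribution at one masked position.

    base_logits:     unnormalized log p(token | unmasked context) from a
                     pre-trained model (hypothetical input).
    property_loglik: log p(property | sequence with that token filled in)
                     from a separate predictor (hypothetical input).
    strength:        assumed knob scaling how strongly the predictor counts.
    """
    base_logits = np.asarray(base_logits, dtype=float)
    property_loglik = np.asarray(property_loglik, dtype=float)
    # Bayes' rule in log space: posterior is proportional to prior * likelihood.
    scores = base_logits + strength * property_loglik
    scores -= scores.max()  # subtract the max for a numerically stable softmax
    probs = np.exp(scores)
    return probs / probs.sum()

# Toy 4-letter alphabet with made-up numbers.
base_logits = np.log([0.4, 0.3, 0.2, 0.1])
property_loglik = np.log([0.1, 0.1, 0.7, 0.1])
guided = guided_token_probs(base_logits, property_loglik)
```

In this toy case the base model prefers the first token, while the guided distribution shifts most of its mass to the third token, which the predictor favors. The base model itself is never retrained.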