Sergey Ovchinnikov
@sokrypton.org
Scientist, Assistant Professor at MIT biology, #FirstGen
sokrypton.org
Looks like someone has already tried to replace me with an AI agent 🫣
sokrypton.org
hmmm... any ideas why mpnn would make things worse for af2, but make things about the same as af2 when used with boltz?
Reposted by Sergey Ovchinnikov
martinpacesa.bsky.social
Exciting to see our protein binder design pipeline BindCraft published in its final form in @Nature ! This has been an amazing collaborative effort with Lennart, Christian, @sokrypton.org, Bruno and many other amazing lab members and collaborators.

www.nature.com/articles/s41...
Reposted by Sergey Ovchinnikov
maxfus.bsky.social
Now that OpenCRISPR is in Nature and has rekindled the 'what's-a-novel-sequence' debate, I'm happy to share an app to check this, which I built for fun some time ago.

fuerstlab.shinyapps.io/SeqNovelty/

quick 🧵
sokrypton.org
Yeah, we would expect the pseudo-likelihood to be maximized for the best-paired MSA!
sokrypton.org
We'll add an option to add custom MSA inputs to the notebook (later tonight). 😎
Reposted by Sergey Ovchinnikov
martinsteinegger.bsky.social
MMseqs2 v18 is out
- SIMD FW/BW alignment (preprint soon!)
- Sub. Mat. λ calculator by Eric Dawson
- Faster ARM SW by Alexander Nesterovskiy
- MSA-Pairformer’s proximity-based pairing for multimer prediction (www.biorxiv.org/content/10.1...; avail. in ColabFold API)
💾 github.com/soedinglab/M... & 🐍
sokrypton.org
2Y69 is technically "prokaryotic" as it's from the mitochondria. 🧐
sokrypton.org
We find this to be true across a number of targets: the method used to pair and filter sequences makes a big difference. We find this to be especially important when trying to disentangle paralogs from orthologs (4/4).
sokrypton.org
The big difference is in the pairing. The MMseqs2 server pairs sequences based on species, while our old HHblits MSAs were paired based on genome proximity (number of genes apart). Working w/ @milot.bsky.social and @martinsteinegger.bsky.social, we implemented the proximity filtering in the server (3/4)
sokrypton.org
Side story: while working on the Google Colab notebook for MSA pairformer, we encountered a problem: the MMseqs2 ColabFold MSA did not show any contacts at protein interfaces, while our old HHblits alignments showed clear contacts 🫥... (2/4)
sokrypton.org
Excited to re-share work from
@yoakiyama.bsky.social and Zhidian Zhang on MSA pairformer. (1/4)
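The proximity-based pairing described in the thread above (pairing by number of genes apart rather than by species alone) could be sketched roughly as follows. This is a minimal illustration, not the MMseqs2 or ColabFold implementation: the function name `pair_by_proximity`, the tuple layout, the `max_gap` cutoff of 20 genes, and the toy hits are all assumptions made for the example.

```python
from collections import defaultdict


def pair_by_proximity(hits_a, hits_b, max_gap=20):
    """Pair homologs of A and B that sit close together in the same genome.

    hits_a, hits_b: lists of (genome_id, gene_index, sequence_id) tuples
                    (hypothetical layout, chosen for this sketch).
    max_gap:        hypothetical cutoff on the number of genes apart.
    """
    # Index the B hits by genome so each A hit only looks at its own genome.
    b_by_genome = defaultdict(list)
    for genome, idx, seq_id in hits_b:
        b_by_genome[genome].append((idx, seq_id))

    pairs = []
    for genome, idx_a, id_a in hits_a:
        candidates = b_by_genome.get(genome, [])
        if not candidates:
            continue
        # Pick the B homolog closest to this A homolog on the genome.
        idx_b, id_b = min(candidates, key=lambda c: abs(c[0] - idx_a))
        # Keep the pair only if the two genes are near each other.
        if abs(idx_b - idx_a) <= max_gap:
            pairs.append((id_a, id_b))
    return pairs


# Toy data: genome g1 carries two paralogous copies of each protein,
# genome g2 carries one copy of each, but far apart.
hits_a = [("g1", 10, "A1"), ("g1", 500, "A2"), ("g2", 5, "A3")]
hits_b = [("g1", 11, "B1"), ("g1", 502, "B2"), ("g2", 900, "B3")]

pairs = pair_by_proximity(hits_a, hits_b)
print(pairs)
```

In this toy case the neighboring copies in g1 are paired with each other, and the distant g2 pair is dropped, whereas a purely species-based rule would have no positional information to separate the paralogs. The sketch does not stop one B sequence from being paired with several A sequences; a real pipeline would need to handle that.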
sokrypton.org
Not sure, I was referring to the earlier observation that AlphaFold adjusts its confidence when additional proteins are provided. I was making the point that this is expected, and just math 🙃
sokrypton.org
At a bare minimum, the authors should try excluding, during inference with their own model, the extra context that isn't present for gLM, and see how it performs. Then I'll be convinced it's potentially learning something different.
sokrypton.org
I guess it is possible that, by including the full genome, the method could pick up on some key protein(s) found across all genomes, which would be a better classifier of phylogenetic relationships, and use that for some downstream process that a smaller context of a species-specific operon might miss...
sokrypton.org
As it stands, it's not clear if it's actually learning anything different from gLM. For bacteria, I'm not too convinced there's context beyond +/- 15 genes for any given gene.
sokrypton.org
the authors say "It was included in a previous version but was lost unfortunately in the editing process." 🫠
Reposted by Sergey Ovchinnikov
acritschristoph.bsky.social
looks cool but they should really have cited and compared to gLM and gLM2, which are very conceptually similar:
www.nature.com/articles/s41...
www.biorxiv.org/content/10.1...

I'll leave a biorxiv comment for the authors. It's hard to find all prior literature but this one is kinda an oof.
sokrypton.org
something that was [x,y]/sum([x,y]) is now [x,y,z]/sum([x,y,z]). if z is higher, x & y will have lower confidence. (2/2)
sokrypton.org
It's all just math... softmax() more specifically (aka "attention" or partition function). The assumption of AlphaFold is that everything you provide is interacting/folded. As you add more context, the softmax is over a larger number of possibilities. (1/2)
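The softmax argument in the two posts above can be checked with a few lines of Python. This is a minimal sketch of a plain softmax, not AlphaFold's actual confidence computation; the scores 2.0, 1.0 and 3.0 are made-up values standing in for x, y and z.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)  # the normalizer (partition function)
    return [e / total for e in exps]


# Hypothetical raw scores for two options, x and y.
two = softmax([2.0, 1.0])

# Same x and y, plus a third option z with a higher score.
three = softmax([2.0, 1.0, 3.0])

print("x, y alone:  ", two)
print("x, y, with z:", three)
```

Adding z lowers the normalized values of both x and y because the denominator grows, while the ratio between x and y stays the same.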
sokrypton.org
We are still far from being able to go from single sequence input to single static structure, from first principles (no evolutionary input), let alone modeling protein folding, ensembles or dynamics.