Lightnews — Scholar-powered news

Sergey Ovchinnikov @sokrypton.org · 27d

Maybe something akin to paper2agent:
huggingface.co/papers/2509....

Paper page - Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

huggingface.co

1

Sergey Ovchinnikov @sokrypton.org · 27d

Looks like someone has already tried to replace me with an AI agent 🫣

1 19

Sergey Ovchinnikov @sokrypton.org · Sep 2

hmmm... any ideas why mpnn would make things worse for af2, but make things about the same as af2 when used with boltz?

1 1

Reposted by Sergey Ovchinnikov

Martin Pacesa @martinpacesa.bsky.social · Aug 27

Exciting to see our protein binder design pipeline BindCraft published in its final form in @Nature ! This has been an amazing collaborative effort with Lennart, Christian, @sokrypton.org, Bruno and many other amazing lab members and collaborators.

www.nature.com/articles/s41...

14 110 300

Reposted by Sergey Ovchinnikov

Max Fürst @maxfus.bsky.social · Aug 1

Now that OpenCRISPR is in Nature and has rekindled the 'what's-a-novel-sequence' debate, I'm happy to share an app to check this, which I built for fun some time ago.

fuerstlab.shinyapps.io/SeqNovelty/

quick 🧵

1 10 17

Sergey Ovchinnikov @sokrypton.org · Aug 6

Yeah, we would expect the pseudo-likelihood to be maximized for best paired MSA!

2

Sergey Ovchinnikov @sokrypton.org · Aug 5

We'll add an option to add custom MSA inputs to the notebook (later tonight). 😎

2

Reposted by Sergey Ovchinnikov

Martin Steinegger 🇺🇦 @martinsteinegger.bsky.social · Aug 5

MMseqs2 v18 is out
- SIMD FW/BW alignment (preprint soon!)
- Sub. Mat. λ calculator by Eric Dawson
- Faster ARM SW by Alexander Nesterovskiy
- MSA-Pairformer’s proximity-based pairing for multimer prediction (www.biorxiv.org/content/10.1...; avail. in ColabFold API)
💾 github.com/soedinglab/M... & 🐍

17 62

Sergey Ovchinnikov @sokrypton.org · Aug 5

It was expected. The surprising part for me was that AlphaFold doesn't seem to care....

See study here from @lindseyguan.bsky.social
www.biorxiv.org/content/10.1...

This makes me wonder if the reason it doesn't seem to care is because it was trained on and evaluated on poorly paired MSAs. 🤔

How AlphaFold and related models predict protein-peptide complex structures

Protein-peptide interactions mediate many biological processes, and access to accurate structural models, through experimental determination or reliable computational prediction, is essential for unde...

www.biorxiv.org

4

Sergey Ovchinnikov @sokrypton.org · Aug 5

2Y69 is technically "prokaryotic" as it's from the mitochondria. 🧐

1 1

Sergey Ovchinnikov @sokrypton.org · Aug 5

We find this to be true across a number of targets, where the method used to pair and filter sequences makes a big difference. We find this to be important when trying to disentangle paralogs from orthologs. (4/4)

2 6

Sergey Ovchinnikov @sokrypton.org · Aug 5

The big difference is in the pairing. The MMseqs2 server pairs sequences based on species, while our old HHblits MSAs were paired based on genome proximity (number of genes apart). Working w/ @milot.bsky.social and @martinsteinegger.bsky.social, we implemented the proximity filtering in the server. (3/4)

1 3 11
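
To make the species-vs-proximity distinction concrete, here is a minimal sketch of proximity-based pairing. It is an illustration only, not the MMseqs2 server implementation: the function name `pair_by_proximity`, the `(sequence_id, genome_id, gene_index)` tuple layout, and the `max_gene_distance` threshold of 20 are all assumptions made for the example, and circular genomes are ignored.

```python
from collections import defaultdict


def pair_by_proximity(hits_a, hits_b, max_gene_distance=20):
    """Pair homologs of proteins A and B that sit close together in one genome.

    hits_a, hits_b: lists of (sequence_id, genome_id, gene_index) tuples, where
    gene_index is the gene's ordinal position along its genome (hypothetical layout).
    Returns a list of (seq_id_a, seq_id_b, distance_in_genes) tuples.
    """
    # Group the B hits by genome so each A hit only sees same-genome candidates.
    b_by_genome = defaultdict(list)
    for seq_id, genome, idx in hits_b:
        b_by_genome[genome].append((seq_id, idx))

    pairs = []
    used_b = set()  # each B sequence is paired at most once (greedy, in input order)
    for seq_a, genome, idx_a in hits_a:
        candidates = [
            (abs(idx_a - idx_b), seq_b)
            for seq_b, idx_b in b_by_genome.get(genome, [])
            if seq_b not in used_b
        ]
        if not candidates:
            continue
        distance, seq_b = min(candidates)  # nearest remaining B gene
        if distance <= max_gene_distance:  # drop pairs that are too far apart
            pairs.append((seq_a, seq_b, distance))
            used_b.add(seq_b)
    return pairs


# Toy data: genome g1 carries two copies (paralogs) of each protein.
hits_a = [("A1", "g1", 10), ("A2", "g1", 500), ("A3", "g2", 42)]
hits_b = [("B1", "g1", 12), ("B2", "g1", 900), ("B3", "g2", 300)]

pairs = pair_by_proximity(hits_a, hits_b, max_gene_distance=20)
print(pairs)
```

With this toy input only A1 and B1 (two genes apart) are kept. Pairing on species alone would also match the distant copies, which is one way paralogs can end up paired as if they were interacting orthologs.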

Sergey Ovchinnikov @sokrypton.org · Aug 5

Side story: While working on the Google Colab notebook for MSA Pairformer, we encountered a problem: the MMseqs2 ColabFold MSA did not show any contacts at protein interfaces, while our old HHblits alignments showed clear contacts 🫥... (2/4)

1 3 13

Sergey Ovchinnikov @sokrypton.org · Aug 5

Excited to re-share work from @yoakiyama.bsky.social and Zhidian Zhang on MSA Pairformer. (1/4)

1 5 25

Reposted by Sergey Ovchinnikov

Yo Akiyama @yoakiyama.bsky.social · Aug 5

Excited to share work with Zhidian Zhang, @milot.bsky.social, @martinsteinegger.bsky.social, and @sokrypton.org
biorxiv.org/content/10.1...
TLDR: We introduce MSA Pairformer, a 111M parameter protein language model that challenges the scaling paradigm in self-supervised protein language modeling🧵

Scaling down protein language modeling with MSA Pairformer

Recent efforts in protein language modeling have focused on scaling single-sequence models and their training data, requiring vast compute resources that limit accessibility. Although models that use ...

biorxiv.org

1 43 95

Sergey Ovchinnikov @sokrypton.org · Aug 3

Not sure, I was referring to the earlier observation that AlphaFold adjusts its confidence when additional proteins are provided. I was making the point that this is expected, and just math 🙃

2

Sergey Ovchinnikov @sokrypton.org · Aug 2

At a bare minimum, the authors should try, during inference with their own model, excluding the extra context that isn't present for gLM and see how it performs. Then I'll be convinced it's potentially learning something different.

2

Sergey Ovchinnikov @sokrypton.org · Aug 2

I guess it is possible that, by including the full genome, the method could pick up on some key protein(s) found across all genomes, which would be a better classifier of phylogenetic relationships, and use that for some downstream process that a smaller context of a species-specific operon might miss...

1 3

Sergey Ovchinnikov @sokrypton.org · Aug 2

As it stands, it's not clear if it's actually learning anything different from gLM. For bacteria, I'm not too convinced there's context beyond +/- 15 genes for any given gene.

1 2

Sergey Ovchinnikov @sokrypton.org · Aug 2

the authors say "It was included in a previous version but was lost unfortunately in the editing process. " 🫠

1 4

Reposted by Sergey Ovchinnikov

Alex Crits-Christoph @acritschristoph.bsky.social · Aug 2

looks cool but they should really have cited and compared to gLM and gLM2, which are very conceptually similar:
www.nature.com/articles/s41...
www.biorxiv.org/content/10.1...

I'll leave a biorxiv comment for the authors. It's hard to find all prior literature but this one is kinda an oof.

2 3 12

Sergey Ovchinnikov @sokrypton.org · Aug 2

something that was [x,y]/sum([x,y]) is now [x,y,z]/sum([x,y,z]). if z is higher, x & y will have lower confidence. (2/2)

1 2

Sergey Ovchinnikov @sokrypton.org · Aug 2

It's all just math... softmax() more specifically (aka "attention" or partition function). The assumption of AlphaFold is that everything you provide is interacting/folded. As you add more context, the softmax is over a larger number of possibilities. (1/2)

1 1 5
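
The normalization argument in the two posts above can be checked with a few lines of Python. This is a toy sketch with made-up scores (`x`, `y`, `z` are arbitrary numbers, not values from AlphaFold); it only shows that adding a third option to the same softmax lowers the share assigned to the first two. Note that softmax exponentiates the scores before normalizing, which the `[x,y]/sum([x,y])` shorthand leaves implicit.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two options with arbitrary raw scores.
x, y = 2.0, 1.0
p_two = softmax([x, y])

# The same two options, plus a third, higher-scoring option z.
z = 3.0
p_three = softmax([x, y, z])

print("without z:", [round(p, 3) for p in p_two])
print("with z:   ", [round(p, 3) for p in p_three])
```

The ratio between the first two probabilities is unchanged by adding `z`; only their absolute values shrink, because all three now share a total of 1.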

Sergey Ovchinnikov @sokrypton.org · Jul 22

PSA: Google Colab Pro free for one year for academic use (US only):
blog.google/outreach-ini...

New Google Colab features for higher education

Google Colab offers free Colab Pro for students, interactive slideshows and AI controls in notebooks.

blog.google

7 21

Sergey Ovchinnikov @sokrypton.org · Jul 14

We are still far from being able to go from single sequence input to single static structure, from first principles (no evolutionary input), let alone modeling protein folding, ensembles or dynamics.

4