Nicola Bordin
@nbordin.bsky.social
840 followers 350 following 46 posts
Research Fellow in Bioinformatics (Proteins + ML + Function) @ CATH University College London | Mountains, Proteins, Food in that order | MSCA Alumn | Replies to emails!
Reposted by Nicola Bordin
judewells.bsky.social
It was lovely to speak at the CATH 30 symposium, celebrating 30 years of the @cathgene3d.bsky.social protein structure classification database. I was presenting recent work on our new generative protein-family language model: preprint coming soon.
nbordin.bsky.social
Packing for our first flight with our kid tomorrow. Wish us luck!

We went from 9kg of checked luggage for 2 months in Thailand to 3 checked suitcases and a pram. Send help!
Reposted by Nicola Bordin
cathgene3d.bsky.social
We have a stellar lineup of speakers!

Christine Orengo
Burkhard Rost
Janet Thornton
David Jones
Gonzalo Parra @gonzaparra.bsky.social
Sameer Velankar
Alex Bateman
Maria Martin
Rob Finn
Gerardo Tauriello
Alexey Murzin
Reposted by Nicola Bordin
cathgene3d.bsky.social
There will be talks from world leaders in structural bioinformatics on various themes including pioneering protein language models and key international resources including: PDBe, InterPro, UniProt, MGnify, SWISS-MODEL, FrustraEvo and CATH.
Reposted by Nicola Bordin
cathgene3d.bsky.social
CATH turns 30 years old this year!

We are organising a 1-day symposium on September 16th at UCL, highlighting recent AI-based developments to enhance protein family classifications, annotations and analyses.

www.eventbrite.co.uk/e/protein-an...
Protein Annotations in the age of AI
A not-for-profit symposium hosted at UCL - more details about speakers and venue below.
www.eventbrite.co.uk
nbordin.bsky.social
Thank you David! Officially a guiri!
nbordin.bsky.social
Today I became a British citizen! 🇬🇧
nbordin.bsky.social
#ISMBECCB2025 is over! Back to London tomorrow after a science feast, a talk, and a selfie with John Jumper. Not too bad!
Reposted by Nicola Bordin
nbordin.bsky.social
Off to Liverpool for #ISMBECCB2025!

Looking forward to some awesome science and friends!
nbordin.bsky.social
Just reverted the video to explain protein folding!
Reposted by Nicola Bordin
neuralnoise.com
"in 2025 we will have flying cars" 😂😂😂
Reposted by Nicola Bordin
martinsteinegger.bsky.social
We've updated our AFESM website to now include biome filtering, allowing exploration of protein structures adapted to specific environments.
🌐 afesm.foldseek.com
Read more about the work in the skeetorial
🦋 bsky.app/profile/mart...
or our preprint
📄 www.biorxiv.org/content/10.1...
nbordin.bsky.social
Very good point! It might be worth investigating. We also noticed this behaviour when we clustered TED (over 81M singletons). That analysis was done at the domain level, not at the chain level, but the clustering wasn't that strict. Here I focussed more on the downstream from the domain end of things.
nbordin.bsky.social
Explore AFESM with our website! You can search your favorite proteins from ESMatlas or AFDB using their identifiers. It's still a work in progress, with many exciting features on the way! Thanks @milot.bsky.social !
nbordin.bsky.social
However, these novel domain combinations comprise only a small fraction (0.3%) of ESM-only clusters. The remainder are mostly low-quality predictions (53%), fragments (16%), known domains with potential unknown extensions (19%), or structures without identifiable domains (9.3%).
nbordin.bsky.social
@yewonhan.bsky.social identified 11,941 novel multi-domain combinations!
We found membrane-associated domains (e.g., TonB dependent receptor), highlighting domain recombination rather than new folds as a driver of structural innovation in ESMatlas.
nbordin.bsky.social
ESM-only clusters contain ZERO novel folds using the TED workflow. Re-modelling discarded domains (2.3M) with ColabFold revealed 1 novel fold, in contrast to AFDB’s >7k novel folds, hinting at a saturating fold space or ESMfold limitations.
nbordin.bsky.social
With MGnify environmental labels, we computed the lowest common biomes per structural cluster, revealing protein adaptations unique to specific environments, especially extreme ones like hyperthermal, hypersaline, and glaciers.
nbordin.bsky.social
We annotated ESMatlas with MMseqs2 taxonomy (93% coverage) and computed the lowest common ancestors (LCA). Most LCAs are at the superkingdom level, indicating structures shared across domains. Avg. clusters/genus: Bacteria 1,557; Archaea 723; Viruses 17; Eukaryotes 2 (sampling bias).
nbordin.bsky.social
Our latest preprint is out on bioRxiv!

In a collaboration between the groups of @martinsteinegger.bsky.social, David Jones and Christine Orengo, we clustered the AlphaFold Database and ESMatlas, a whopping 821 million proteins!

We reveal biome-specific groups & over 11k novel domain combinations.
Metagenomic-scale analysis of the predicted protein structure universe
Protein structure prediction breakthroughs, notably AlphaFold2 and ESMfold, have led to an unprecedented influx of computationally derived structures. The AlphaFold Protein Structure Database now prov...
www.biorxiv.org
nbordin.bsky.social
Thank you Reid for bringing it to our attention! Glad that @shaunkandathil.bsky.social is on it!