Roman Bushuiev
@roman-bushuiev.bsky.social
🌏 https://roman-bushuiev.github.io/
Reposted by Roman Bushuiev
kyobinkang.bsky.social
Yesterday @roman-bushuiev.bsky.social visited our lab. He gave a seminar talk about his recent work on DreaMS (www.nature.com/articles/s41...) and provided hands-on training for my lab members. Looking forward to what we will discover using this awesome tool!
roman-bushuiev.bsky.social
This project would not be possible without our amazing team: @anton-bushuiev.bsky.social, Raman Samusevich, Corinna Brungs, Josef Sivic, @pluskal-lab.org based at @iocbprague.bsky.social and CIIRC CTU
roman-bushuiev.bsky.social
All models and datasets are freely available to the community under the MIT license 💚
📄 Paper: www.nature.com/articles/s41...
💻 Code: github.com/pluskal-lab/...
💡 Tutorials: dreams-docs.readthedocs.io/en/latest/
📊 Dataset: huggingface.co/datasets/rom...
roman-bushuiev.bsky.social
We also introduce the DreaMS Atlas: a large-scale molecular network built from 200 million mass spectra, providing a new way to connect and analyze thousands of distinct scientific studies.
roman-bushuiev.bsky.social
When fine-tuned, DreaMS achieves state-of-the-art performance across a range of molecule annotation tasks (e.g., predicting chemical similarity of molecular structures based on their mass spectra).
roman-bushuiev.bsky.social
Trained in a self-supervised way on tens of millions of mass spectra from diverse samples (e.g., plants, human tissues, microbes, foods, soil samples), DreaMS learns rich representations ("embeddings") of molecular structures.
roman-bushuiev.bsky.social
Mass spectrometry is a key method to discover and identify molecules in biological and environmental samples. Yet, >90% of mass spectra remain hard to interpret. In our recent paper, we present DreaMS — a foundation model to interpret mass spectra of small molecules.
www.nature.com/articles/s41...
Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS - Nature Biotechnology
A transformer model is used to construct the DreaMS Atlas—a molecular network of 201 million MS/MS spectra.
Reposted by Roman Bushuiev
pluskal-lab.org
This paper represents a great effort by @roman-bushuiev.bsky.social and his brother @anton-bushuiev.bsky.social. The DreaMS foundation model for mass spectra of small molecules now opens lots of avenues for possible downstream applications. It might be a game changer for computational metabolomics.
roman-bushuiev.bsky.social
⭐ DiffMS (Connor Coley group, MIT) predicts molecular graph edges from a set of nodes (i.e., chemical formula atoms) using discrete diffusion. Conditioning on mass spectra is achieved by incorporating the MIST encoder.
📄 arxiv.org/abs/2502.09571
roman-bushuiev.bsky.social
⭐ MADGEN (Soha Hassoun group, Tufts) retrieves a molecular scaffold from a mass spectrum and the corresponding molecular formula using contrastive learning, then refines it with discrete diffusion and classifier-free guidance from the mass spectrum.
📄 arxiv.org/abs/2501.01950
roman-bushuiev.bsky.social
🚀 Exciting MassSpecGym leaderboard update 🚀

Two new machine learning models achieve up to a 300% improvement in de novo molecular generation given mass spectra and corresponding chemical formulae. 🔥 1/n
Reposted by Roman Bushuiev
rdkbio.bsky.social
MassSpecGym - the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data.

@roman-bushuiev.bsky.social
@anton-bushuiev.bsky.social
@josef-sivic.bsky.social
@pluskal-lab.org

NeurIPS 2024 paper: arxiv.org/abs/2410.23326

#ChemSky #MassSpec #AI4Science
iocbprague.bsky.social
🤝 In April 2024, brothers Roman and Anton Bushuiev from the teams of @pluskal-lab.org @iocbprague.bsky.social and Josef Šivic #CIIRC_CTU initiated a collaboration among 14 research institutes across the globe to benchmark #AI methods for the discovery of molecules from mass spectrometry data. 1/2
roman-bushuiev.bsky.social
Did you experiment with a 1D CNN? It should be very similar to GAT in this setup, up to some normalizations. Also, would GAT outperform SetTransformer with the same number of parameters/layers?
Reposted by Roman Bushuiev
polarishub.io
MassSpecGym is the largest publicly available collection of mass spectra data with 231K spectra for 29K unique molecular structures. 33% of the dataset was generated from newly measured, in-house data.

🛡️The dataset is now certified on Polaris! polarishub.io/datasets/rom...

youtu.be/G8ZnVRm0ogc
MassSpecGym: A benchmark for the discovery and identification of molecules
YouTube video by PolarisHQ
roman-bushuiev.bsky.social
Marcus Ludwig, Nils Haupt, Apurva Kalia, Corinna Brungs, Robin Schmid, Russell Greiner, Bo Wang, David Wishart, Li-Ping Liu, Juho Rousu, Wout Bittremieux, Hannes Rost, Tytus Mak, Soha Hassoun, @me-datapoint.bsky.social, @jjjvanderhooft.bsky.social, Michael Stravs, Sebastian Böcker, Josef Sivic, @pluskal-lab.org
roman-bushuiev.bsky.social
This project would not have been possible without the amazing collaboration of our incredible team: @roman-bushuiev.bsky.social, @anton-bushuiev.bsky.social, Niek F. de Jonge, Adamo Young, Fleming Kretschmer, Raman Samusevich, Janne Heirman, Fei Wang, Luke Zhang, Kai Dührkop,
5/6
roman-bushuiev.bsky.social
MassSpecGym provides a user-friendly environment to develop and evaluate new models. The only required code to integrate a new model is the implementation of a forward pass! Feel free to experiment with new ideas and approaches. 3/6
roman-bushuiev.bsky.social
MassSpecGym has three main components:
- ⭐ The largest public dataset of mass spectra labeled with molecules
- ⭐ Three challenges with predefined metrics:
  - 💥 De novo molecule generation
  - 💥 Molecule retrieval
  - 💥 Spectrum simulation
- ⭐ A generalization-demanding data split
2/6
roman-bushuiev.bsky.social
Check out our NeurIPS 2024 spotlight poster on MassSpecGym, a dataset and benchmark for discovering new molecules from nature 🌿. If you work on generative models for graphs/molecules, plug your model into MassSpecGym and see how many molecules you can discover! 🚀 1/6
Reposted by Roman Bushuiev
polarishub.io
What are the most interesting datasets and benchmark-related work for ML in drug discovery at NeurIPS?

We’ll be at the conference doing short interviews with researchers and handing out some Polaris merch!

Here’s who we have on the shortlist. 🧵