Sam Blau
@samblau.bsky.social
2.2K followers 340 following 32 posts
Research scientist & computational chemist at Berkeley Lab using HT DFT workflows, machine learning, and reaction networks to model complex reactivity.
samblau.bsky.social
Come work with @emorychannano.bsky.social and me!
#robotics #nanochemistry #machinelearning #UCNPs
emorychannano.bsky.social
We have a postdoc opening w/ @samblau.bsky.social
on the autonomous synthesis of colloidal upconverting nanoparticles.

If you're looking for an exciting postdoc combining #robotics, #nanochemistry, #machinelearning, #UCNPs & simulations, see the link below!

combinano.lbl.gov/openings
Chan Group @ Molecular Foundry - openings
The Chan group welcomes inquiries from motivated, creative, and independent researchers at all levels (postdoctoral fellows, graduate students, undergraduates, and visitors), even in the absence of post...
combinano.lbl.gov
Reposted by Sam Blau
ewcspottesmith.bsky.social
Interested in learning more about our recently published OMol25 dataset and the advances that it's bringing to atomistic machine learning? Check out this talk that my boy @samblau.bsky.social gave as part of the "Modeling Talk Series".

#CompChem ⚗️ 🧪 #SciML
Modeling Talk Series - The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models
Samuel Blau, Berkeley Lab Video Recording Slides (pptx, pdf)
sites.google.com
samblau.bsky.social
I'm presenting OMol25 tomorrow 7/29 at 9 AM PST as part of a talk series at Google. Learn how we built the dataset + how MLIPs trained on OMol are revolutionizing comp chem!
Meet: lnkd.in/g4AAWkcK
YouTube Stream: lnkd.in/ggmtMtTR
Join group: lnkd.in/g5ciuNuX
samblau.bsky.social
OMol25 was calculated with ORCA. I want to acknowledge the work of the ORCA team to improve the quality of the gradient + the robustness of SCF convergence for complicated systems as part of the OMol effort - it was much appreciated and critical to ensuring that we're releasing high quality data!
faccts.de
FACCTs @faccts.de · May 15
“Built with the high-performance quantum chemistry program package ORCA (Version 6.0.1), OMol25 contains simulations of large atomic systems that, until now, have been out of reach.” - Meta

#ORCAqc #ORCA6 #CompChem #QuantumChem #ML #Meta

ai.meta.com/blog/meta-fa...

arxiv.org/abs/2505.08762
Sharing new breakthroughs and artifacts supporting molecular property prediction, language processing, and neuroscience
Meta FAIR is sharing new research artifacts that highlight our commitment to advanced machine intelligence (AMI) through focused scientific and academic progress.
ai.meta.com
Reposted by Sam Blau
berkeleylab.lbl.gov
🚨 Just dropped: Open Molecules 2025 — a record-breaking dataset co-led by Berkeley Lab + Meta FAIR.

100M+ DFT snapshots. Built to train #AI for real-world chemistry 🧪.

Could reshape discovery in batteries, drug discovery & much more! @cs.lbl.gov ⬇️
Computational Chemistry Unlocked: A Record-Breaking Dataset to Train AI Models has Launched - Berkeley Lab
Scientists will finally be able to simulate the chemistry that drives our bodies, our environment, and our technologies.
newscenter.lbl.gov
samblau.bsky.social
We can't wait to see what the community does with OMol! Don't hesitate to reach out with feedback on the data, models, or paper - we aren't going to submit to a journal until the leaderboard goes up, which means we have time to incorporate community feedback (within reason) 10/10
samblau.bsky.social
A special shout out to co-first authors Daniel Levine and Muhammed Shuaibi who moved mountains making OMol a reality. I also want to recognize the substantial and critical contributions of @ewcspottesmith.bsky.social, Michael Taylor, Muhammad Hasyim, and Kyle Michel 9/N
samblau.bsky.social
Co-leading OMol with Brandon and Larry was a joy and an honor - as was assembling a world-leading team of scientists from 2 companies, 2 national labs, and 6 universities who were excited to help build an open-source, revolutionary molecular DFT dataset to push science forward 8/N
samblau.bsky.social
Right now, OMol data has energy, forces, partial charges, partial spins, and HOMO/LUMO. But we have far more info that we still need to parse, and we hope to run a battery of GBW post-processing analyses. Plus we have 10 petabytes of electron densities. Lots more to come! 7/N
samblau.bsky.social
And check out the UMA demo (facebook-fairchem-uma-demo.hf.space; UMA is trained on OMol + other FAIR Chemistry datasets) - metal complexes at +1 vs +2 correctly optimize to tetrahedral/planar, and reduced ethylene carbonate correctly ring-opens while neutral EC remains stable 6/N
Gradio
facebook-fairchem-uma-demo.hf.space
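For anyone who wants to try the same experiments outside the hosted demo, a minimal local sketch might look like the following. The specific entry points (the "uma-s-1" checkpoint name, FAIRChemCalculator, the "omol" task name, and passing charge/spin via atoms.info) are assumptions based on the FAIRChem release, not something stated in the post, so check the FAIRChem docs for the exact identifiers.

```python
# Hedged sketch: relaxing ethylene carbonate (EC) at two charge states with a
# UMA-style MLIP via ASE. Checkpoint and calculator names are assumptions.
from ase.io import read
from ase.optimize import BFGS
from fairchem.core import FAIRChemCalculator, pretrained_mlip

predictor = pretrained_mlip.get_predict_unit("uma-s-1", device="cpu")

for charge, spin in [(0, 1), (-1, 2)]:  # neutral EC vs. reduced (radical anion) EC
    atoms = read("ethylene_carbonate.xyz")  # user-supplied starting geometry
    atoms.info["charge"] = charge           # OMol-trained models condition on total charge...
    atoms.info["spin"] = spin               # ...and spin multiplicity
    atoms.calc = FAIRChemCalculator(predictor, task_name="omol")
    BFGS(atoms).run(fmax=0.02)              # the anion should ring-open on relaxation
    print(charge, atoms.get_potential_energy())
```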
samblau.bsky.social
We're also releasing baseline models trained on OMol. To guide future MLIP development, we built novel evaluations on intermolecular interactions, conformers, and charge/spin. We hope to include frequency, ΔG, and TSopt tasks when we put up a public leaderboard in the summer 4/N
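One of the evaluation families mentioned above, intermolecular interactions, reduces to comparing the model's interaction energies against the DFT reference. Below is a generic sketch of that bookkeeping with any ASE-compatible calculator; it is not the released OMol25 evaluation code, and `calc` is a placeholder for whatever MLIP calculator you use.

```python
# Interaction-energy check: E_int = E(AB) - E(A) - E(B), the quantity an
# intermolecular-interaction evaluation compares against the DFT reference.
# `calc` is a placeholder ASE calculator, not the official evaluation harness.
from ase.io import read

def interaction_energy(dimer_xyz, monomer_a_xyz, monomer_b_xyz, calc):
    energies = []
    for path in (dimer_xyz, monomer_a_xyz, monomer_b_xyz):
        atoms = read(path)
        atoms.calc = calc
        energies.append(atoms.get_potential_energy())
    e_ab, e_a, e_b = energies
    return e_ab - e_a - e_b  # in eV under ASE conventions
```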
samblau.bsky.social
OMol was constructed via an unprecedented diversity of methods: MD, ML-MD, RPMD, rattling, Architector, rxn path interpolation, AFIR, optimization, and scaled separation. We also recalculated some previous datasets and did additional sampling/structure generation atop others 3/N
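Most of those sampling strategies have off-the-shelf support; as one concrete illustration, "rattling" is just a small random perturbation of an optimized geometry, which ASE exposes directly. This is a generic example, not the actual OMol25 generation script.

```python
# Generic "rattling" illustration: perturb a relaxed structure with Gaussian
# displacements to sample off-equilibrium geometries (not the OMol25 pipeline).
from ase.io import read, write

atoms = read("optimized_molecule.xyz")    # user-supplied relaxed structure
for i in range(10):
    perturbed = atoms.copy()
    perturbed.rattle(stdev=0.05, seed=i)  # ~0.05 Å random displacements per atom
    write(f"rattled_{i:02d}.xyz", perturbed)
```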
samblau.bsky.social
OMol covers 83 elements, a wide range of intra- and intermolecular interactions, explicit solvation, reactive structures, conformers, charges from -10 to +10, 0-10 unpaired electrons, and 2-350 atoms per snapshot. It required >6B CPU hrs, 10x more than any previous MLIP training dataset 2/N
samblau.bsky.social
The Open Molecules 2025 dataset is out! With >100M gold-standard ωB97M-V/def2-TZVPD calcs of biomolecules, electrolytes, metal complexes, and small molecules, OMol is by far the largest, most diverse, and highest quality molecular DFT dataset for training MLIPs ever made 1/N
samblau.bsky.social
It was a pleasure to give an IIDAI seminar on nanoparticle ML for gradient-based heterostructure optimization (w/ @emorychannano.bsky.social ) and neural network path opt for finding reaction transition states on MLIPs (w/ @thglab.bsky.social) - find the talk here: www.youtube.com/watch?v=-4jB...
IIDAI Seminar, 5/1/2025, Samuel M. Blau (Berkeley Lab)
YouTube video by Coordinated Science Laboratory
www.youtube.com
Reposted by Sam Blau
andrewrosen.bsky.social
🧠 New postdoctoral researcher position at Princeton for those interested in data science and machine learning! Specify my group if you are interested in working together. Deadline is May 31. Details: puwebp.princeton.edu/AcadHire/app...
puwebp.princeton.edu
samblau.bsky.social
Final day to submit abstracts for ACS Fall 2025! Reminder that @ewcspottesmith.bsky.social, Brett Savoie (Notre Dame), and I are organizing a symposium on "Chemical Reaction Networks, Retrosynthesis, and Reaction Prediction". Will be a mix of invited and contributed talks - please submit! #CompChem
Reposted by Sam Blau
gabegomes.bsky.social
the @gpggrp.bsky.social is at the ACS Spring 2025! come check out the works of Daniil Boiko and Rob MacKnight at the "ML + AI in Organic Chemistry" Symposium (Hall B-1, Room 4) today! extreme scaling of experimental chemical reactions via MS and an OS for autonomous comp chem!
samblau.bsky.social
Looking forward to speaking at ACS on Sunday at 5:30! Come learn about "Popcornn" - a new method for double-ended transition state optimization atop machine-learned interatomic potentials that is substantially better than NEB or GSM.
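Popcornn itself isn't described in the post, but the baselines it is benchmarked against are standard. For orientation, a double-ended search with ASE's climbing-image NEB on top of an MLIP calculator looks roughly like the sketch below; `make_mlip_calculator()` is a hypothetical placeholder, and this shows the comparison method, not Popcornn.

```python
# Baseline double-ended search (climbing-image NEB in ASE) between known
# reactant and product geometries. Illustrates what "double-ended" means;
# this is NOT Popcornn. make_mlip_calculator() is a hypothetical helper that
# returns your ASE-compatible MLIP calculator.
from ase.io import read
from ase.mep import NEB
from ase.optimize import FIRE

reactant = read("reactant.xyz")
product = read("product.xyz")

images = [reactant] + [reactant.copy() for _ in range(8)] + [product]
neb = NEB(images, climb=True)         # climbing image homes in on the saddle point
neb.interpolate(method="idpp")        # fill in the interior images
for image in images[1:-1]:
    image.calc = make_mlip_calculator()
FIRE(neb).run(fmax=0.05)              # converge the band; highest image ≈ TS
```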
samblau.bsky.social
Fantastic new work from Aditi & co that shows how to leverage the expressivity + accuracy of massive pre-trained MLIPs to distill smaller, much faster models that are still extremely accurate to drive downstream simulations - no need to compromise on speed vs accuracy!
ask1729.bsky.social
1/ Machine learning force fields are hot right now 🔥: models are getting bigger + being trained on more data. But how do we balance size, speed, and specificity? We introduce a method for distilling large-scale MLFFs into fast, specialized MLFFs! More details below:

#ICLR2025
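At its core, the distillation idea in the quoted thread is supervised training of a small student MLIP on quantities predicted by a large pre-trained teacher. The sketch below is a generic energy/force-matching loss to make that concrete; the paper's actual recipe and training targets may differ, and `student`/`teacher` are placeholder callables returning (energy, forces).

```python
# Generic distillation step: fit a small student MLIP to a frozen teacher's
# energy and force predictions. Schematic only -- not the authors' code, and
# the paper's specific distillation targets may differ.
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, batch, w_energy=1.0, w_forces=10.0):
    with torch.no_grad():
        e_teacher, f_teacher = teacher(batch)   # labels from the frozen teacher
    e_student, f_student = student(batch)       # student predictions
    return (w_energy * F.mse_loss(e_student, e_teacher)
            + w_forces * F.mse_loss(f_student, f_teacher))
```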
samblau.bsky.social
Applications closing in one week! If you’re interested in a prestigious postdoc at the intersection of AI/ML and nuclear nonproliferation, don’t hesitate to apply - come work with me on fascinating f-block chemistry and computational/ML methods! (Must be a US citizen)
Reposted by Sam Blau
ewcspottesmith.bsky.social
@samblau.bsky.social, Brett Savoie (Notre Dame), and I are organizing a symposium for @amerchemsociety.bsky.social Fall 2025 called "Chemical Reaction Networks, Retrosynthesis, and Reaction Prediction" under @acscomp.bsky.social.

#reactionnetwork #CRN #retrosynthesis 🧪 ⚗️ #CompChem
Reposted by Sam Blau
chemrxivbot.bsky.social
Inverse Design of Complex Nanoparticle Heterostructures via Deep Learning on Heterogeneous Graphs

Authors: Eric Sivonxay, Lucas Attia, Evan Walter Clark Spotte-Smith, Benjamin Lengeling, Xiaojing Xia, Daniel Barter, Emory Chan, Samuel Blau
DOI: 10.26434/chemrxiv-2024-1dw4q