Predictive chemistry often struggles with scarce data. Surrogate models can help, but should we use their predicted QM descriptors or hidden embeddings? Chen & Stuyver show that hidden spaces usually win—faster, more robust, and data-efficient. pubs.rsc.org/en/content/a...
Across the 93 functionals tested, errors on CYCLO70 are far larger than on BH9PERI. This dataset captures the worst-case scenarios you might encounter in screening or reactivity modeling. (3/6)
Why CYCLO70? Popular datasets like BH9 are biased toward “easy” cases. They give an overly optimistic picture of DFT accuracy. CYCLO70 is built to probe the hardest regions of the pericyclic reaction space. (2/6)
New paper from our group out in JCTC! We introduce CYCLO70 — a benchmarking set of 70 challenging cycloaddition reactions (Diels–Alder, dipolar, sigmatropic). 👉 doi.org/10.1021/acs.... (1/6)
Main conclusion of this project: we find that hidden representations extracted from surrogate models generally outperform predicted QM descriptors, particularly when descriptor selection is not tightly aligned with the downstream task
Performing a PCA on the errors across the dataset, we demonstrate not only that the errors of different functionals correlate to a significant extent, but also that functionals belonging to the same rung of Jacob’s ladder cluster together in the resulting plot (5/5)
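For readers who want to try this kind of analysis, here is a minimal sketch of a PCA on a functionals-by-reactions error matrix. It runs on synthetic numbers, not the CYCLO70 results. The function name `pca_of_errors`, the two made-up "families" of functionals, and the noise levels are all assumptions for illustration, and the paper's actual preprocessing may differ.

```python
import numpy as np

def pca_of_errors(errors, n_components=2):
    """PCA via SVD on a (n_functionals, n_reactions) matrix of signed errors.

    Returns the per-functional scores on the leading components and the
    fraction of variance each of those components explains.
    """
    X = np.asarray(errors, dtype=float)
    # Center each reaction column so the PCA describes deviations
    # from the mean error over all functionals.
    X = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Synthetic stand-in data (hypothetical): two families of five functionals,
# each family sharing one error pattern over 70 reactions plus small noise.
rng = np.random.default_rng(0)
n_reactions = 70
pattern_a = rng.normal(size=n_reactions)
pattern_b = rng.normal(size=n_reactions)
family_a = 3.0 * pattern_a + 0.3 * rng.normal(size=(5, n_reactions))
family_b = 3.0 * pattern_b + 0.3 * rng.normal(size=(5, n_reactions))
errors = np.vstack([family_a, family_b])

scores, explained = pca_of_errors(errors)
for i, (pc1, pc2) in enumerate(scores):
    family = "A" if i < 5 else "B"
    print(f"functional {i} (family {family}): PC1={pc1:+.2f}, PC2={pc2:+.2f}")
print("explained variance ratio:", np.round(explained, 3))
```

In this toy setup, functionals that share an error pattern end up on the same side of the first principal component, which is the kind of clustering the post describes for functionals on the same rung.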
We observe that only one functional, the range-separated hybrid ωB97M-V, reaches “chemical accuracy” for barriers and reaction energies; among the double hybrids, PBE-QIDH performs best, and among the hybrids, M06-2X and r2SCAN50 exhibit the lowest errors (4/5)
CYCLO70 is a challenging benchmarking dataset for pericyclic reactions. Testing 93 distinct functionals, we observe that the errors on CYCLO70 are significantly larger than those on the cycloaddition subset of BH9, the most popular benchmarking set for this reaction class (3/5)
Continuing our recent efforts in constructing more challenging/representative benchmarking datasets with the help of active learning, we present here CYCLO70 (2/5)
Overall, this hybrid ML–computational chemistry approach enables data-efficient discovery of thermally responsive DA reactions, advancing the rational design of self-healing polymers with tunable properties (5/5)
We first leverage our models to screen a comprehensive reaction space of synthetic diene-dienophile pairs, and subsequently use them to mine a database of commercially available natural products (4/5)
Refining only a small fraction of these profiles with DFT, we can train a robust ML model that predicts reaction characteristics with excellent accuracy. Adding a graph-based model to the workflow for pre-screening enables expansion to reaction spaces of 100k+ reactions (3/5)
In this work, we present a hierarchical workflow that integrates ML with automated reaction profile calculations to efficiently screen DA reaction spaces. Using our in-house TS-tools software, we first rapidly generate reaction profiles at the semi-empirical xTB level (2/5)
This is (to some extent) negotiable, and it will also depend on the experience of the selected candidate. In any case, it will be significantly higher than for a typical postdoc position in France; probably in the range of €3000 to €3500 a month net (after all taxes)