Olivier Grisel
ogrisel.bsky.social
Software engineer at probabl, scikit-learn contributor.

Also at:

https://sigmoid.social/@ogrisel
https://github.com/ogrisel
Reposted by Olivier Grisel
LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)

- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf)=95%
November 13, 2025 at 6:19 AM
Reposted by Olivier Grisel
The Python Software Foundation was recommended for a $1.5M grant from the National Science Foundation. The terms of the award said PSF could not work on DEI, whether or not the grant funding was used for it.

PSF therefore declined the funding.

Science suffers, but commitment to core values remains
TL;DR: The PSF has made the decision to put our community and our shared diversity, equity, and inclusion values ahead of seeking $1.5M in new revenue. Please read and share. pyfound.blogspot.com/2025/10/NSF-...
🧵
The official home of the Python Programming Language
www.python.org
October 27, 2025 at 5:31 PM
I will speak about probabilistic regressions, @skrub-data.bsky.social and skore contributors will also present their libraries. Come join us!
A bunch of scikit-learn core contributors will attend or speak at @pydataparis.bsky.social 2025 on Tuesday and Wednesday next week.

Ticketing, practical information and schedule at: pydata.org/paris2025
PyData Paris 2025
September 26, 2025 at 8:55 AM
scikit-learn 1.8 will be the first scikit-learn release with native extensions that are officially marked as free-threading compatible.

github.com/scikit-learn...
MNT Mark cython extensions as free-threaded compatible by lesteve · Pull Request #31342 · scikit-learn/scikit-learn
Part of #30007. Cython 3.1 was released on May 8, 2025. Following scipy PR scipy/scipy#22658, use the -Xfreethreading_compatible=True cython argument if cython >= 3.1. This cleans up the lock-fi...
September 2, 2025 at 4:51 PM
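To see why the compatibility marking matters: on a free-threaded CPython build, importing an extension module that is not marked as compatible causes the interpreter to re-enable the GIL at runtime. The sketch below is not part of the post or the pull request; it only uses the standard library to report whether the running interpreter is a free-threaded build and whether the GIL is currently active. The helper name `free_threading_status` is made up for illustration.

```python
import sys
import sysconfig


def free_threading_status():
    """Return (free_threaded_build, gil_enabled_at_runtime) for this interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always run with the GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return free_threaded_build, bool(is_gil_enabled())


if __name__ == "__main__":
    build, gil = free_threading_status()
    print(f"free-threaded build: {build}, GIL enabled at runtime: {gil}")
```

Running the check again after importing a package shows whether one of its extension modules switched the GIL back on.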
Reposted by Olivier Grisel
We’re happy to announce our Social Event, taking place on Tuesday 30th September at 6pm at the Cité des sciences. A perfect opportunity to unwind and connect with fellow attendees after a day of interesting talks!

pydata.org/paris2025/so...
pydata.org/paris2025/ti...
August 28, 2025 at 12:55 PM
Looking forward to attending PyData Paris 2025! I will give a talk about probabilistic predictions for regression problems (I need to start working on my slides ;)
📢 Talk Announcement

"Probabilistic regression models: let's compare different modeling strategies and discuss how to evaluate them", by @ogrisel.bsky.social from @probabl.ai .

📜 pretalx.com/pydata-paris-2025/talk/DVMZBT
📅 pydata.org/paris2025/schedule
🎟 pydata.org/paris2025/tickets
August 28, 2025 at 7:33 AM
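One common way to evaluate a model that predicts a conditional quantile is the pinball (quantile) loss. The sketch below is a plain-Python illustration of that metric and is not taken from the talk, whose exact content is not described here. The function mirrors the idea behind `sklearn.metrics.mean_pinball_loss` without depending on scikit-learn; the toy values are invented.

```python
def mean_pinball_loss(y_true, y_pred, alpha):
    """Average pinball loss of predictions y_pred for the quantile level alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)


if __name__ == "__main__":
    # Invented toy data: the same predictions scored as a 0.9 and a 0.1 quantile.
    observed = [1.0, 2.0, 3.0]
    predicted = [0.0, 0.0, 0.0]
    print(mean_pinball_loss(observed, predicted, alpha=0.9))
    print(mean_pinball_loss(observed, predicted, alpha=0.1))
```

Predictions that sit below every observation are penalized much more when they claim to be a high quantile than a low one, which is the asymmetry the metric is designed to capture.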
Reposted by Olivier Grisel
👋 JupyterLab and Jupyter Notebook users:

What's one thing you'd love to see improved in JupyterLab, Jupyter Notebook, or JupyterLite?

The team is prepping the upcoming 4.5/7.5 releases and wants to tackle some usability issues.

Drop your feedback below, this will help prioritize what gets fixed!👇
August 19, 2025 at 9:43 AM
Today at #EuroScipy2025, @glemaitre58.bsky.social and I presented a tutorial on pitfalls of machine learning for imbalanced classification problems.

We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values.

probabl-ai.github.io/calibration-...
Imbalanced classification: pitfalls and solutions — Probabilistic calibration of cost-sensitive learning
August 19, 2025 at 11:58 AM
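As a rough illustration of what a degenerate precision or recall value looks like, consider a classifier that always predicts the majority class on an imbalanced dataset. This sketch is not taken from the tutorial material; the data and the helper `precision_recall` are invented for the example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class; None when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision is undefined when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else None
    # Recall is undefined when there are no positive samples at all.
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Invented toy data: 5% positives, and a model that always predicts the negative class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(f"accuracy={accuracy}, precision={precision}, recall={recall}")
```

The accuracy looks high even though the model never finds a positive sample: recall is zero and precision is undefined, so accuracy alone can hide the problem.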
Attending the @skrub-data.bsky.social tutorial by @riccardocappuzzo.com and @glemaitre58.bsky.social at #EuroScipy2025. They introduce the new DataOps feature released in skrub 0.6.

Here is the repo with the material for the tutorial: github.com/skrub-data/E...
August 18, 2025 at 9:08 AM
Reposted by Olivier Grisel
🚨What is SOTA on tabular data, really? We are excited to announce 𝗧𝗮𝗯𝗔𝗿𝗲𝗻𝗮, a living benchmark for machine learning on IID tabular data with:

📊 an online leaderboard (submit!)
📑 carefully curated datasets
📈 strong tree-based, deep learning, and foundation models

🧵
June 23, 2025 at 10:15 AM
Reposted by Olivier Grisel
👨‍🎓🧾✨#icml2025 Paper: TabICL, A Tabular Foundation Model for In-Context Learning on Large Data
With Jingang Qu, @dholzmueller.bsky.social, and Marine Le Morvan

TL;DR: a well-designed architecture and pretraining give the best tabular learner, and a more scalable one
On top of that, it's 100% open source
1/9
July 9, 2025 at 6:42 PM
Reposted by Olivier Grisel
Excited to have co-contributed the SquashingScaler, which implements the robust numerical preprocessing from RealMLP!
⚡ Release 0.6.0 is now out! ⚡

🚀 Major update! Skrub DataOps, various improvements for the TableReport, new tools for applying transformers to the columns, and a new robust transformer for numerical features are only some of the features included in this release.
July 24, 2025 at 4:00 PM
Reposted by Olivier Grisel
I got 3rd out of 691 in a tabular kaggle competition – with only neural networks! 🥉

My solution is short (48 LOC) and relatively general-purpose – I used skrub to preprocess string and date columns, and pytabkit to create an ensemble of RealMLP and TabM models. Link below👇
July 29, 2025 at 11:10 AM
Reposted by Olivier Grisel
We are excited to announce that the schedule for PyData Paris 2025 has been released! Discover the incredible lineup of speakers we have in store for you.

Don't miss out on this chance to learn, connect, and grow. Save the date and get your ticket!

pydata.org/paris2025/sc...
Schedule — PyData Paris 2025
July 15, 2025 at 1:14 PM
Reposted by Olivier Grisel
Then why does flow matching generalize?? Because it fails!

The inductive bias of the neural network prevents it from perfectly learning u* and overfitting.

In particular neural networks fail to learn the velocity field for two particular time values.

See the paper for a finer analysis 😀
June 18, 2025 at 8:15 AM
Reposted by Olivier Grisel
📢 Last Chance!

Don't miss out! Early bird tickets are available only until the end of the day. Grab yours now and save!

pydata.org/paris2025/ti...
Tickets — PyData Paris 2025
Cité des Sciences
June 8, 2025 at 7:01 AM
Reposted by Olivier Grisel
📢 Time is Running Out!

Only a few days left to grab your early bird tickets! Don't wait—this special offer ends Sunday at the end of the day. Secure yours now!

pydata.org/paris2025/ti...
Tickets — PyData Paris 2025
June 5, 2025 at 2:05 PM
Reposted by Olivier Grisel
🚨 Just 10 Days Left! 🚨
The countdown is on — only 10 days left to grab your early bird tickets for our 2025 edition! Lock in your spot at the best price before rates go up.

pydata.org/paris2025/ti...
Tickets — PyData Paris 2025
May 21, 2025 at 1:55 PM
Reposted by Olivier Grisel
The week of September 29th, Paris will become the epicenter of #opensource scientific computing, with a great series of events. This rare alignment creates the perfect opportunity to visit and join a vibrant community of developers, maintainers, and users!
Check this out (links in thread) ⬇️
May 16, 2025 at 2:00 PM
Reposted by Olivier Grisel
There were layoffs at MS yesterday and 3 #Python core devs from the Faster CPython team were caught in them. If you know of any jobs, please send them their way:

Eric Snow: www.linkedin.com/in/ericsnowc...
Irit Katriel: www.linkedin.com/in/irit-katr...
Mark Shannon: www.linkedin.com/in/mark-shan...
May 14, 2025 at 9:14 PM
Reposted by Olivier Grisel
The ‘What’s New’ page for Python 3.14 now includes updates on the free-threaded (GIL-disabled) build!

docs.python.org/3.14/whatsne...
What’s new in Python 3.14
Editor: Hugo van Kemenade. This article explains the new features in Python 3.14, compared to 3.13. For full details, see the changelog. Summary – release highlights: Python 3.14 beta is the pre-r...
May 10, 2025 at 1:23 PM
Reposted by Olivier Grisel
I don't get any benefit from citations but an inline or footnote link is always incredibly appreciated. If you are limited to bibliography citations then the standard website format is great:

"Betancourt, Michael (YEAR ACCESSED). CHAPTER NAME. Retrieved from GITHUB LINK, commit COMMIT NUMBER"
What is the right way to cite articles in the Writing section of your website @betanalpha.bsky.social?
April 30, 2025 at 1:53 PM
Reposted by Olivier Grisel
#GitHub mini tip:

Use the GitHub CLI command `gh issue develop -c` to create and check out a local branch linked to the given issue. No need to name the branch, and when you later run `gh pr create`, the pull request will be linked to the issue automatically.
April 20, 2025 at 9:04 AM