Sebastian Bordt
@sbordt.bsky.social
470 followers 250 following 66 posts
Language models and interpretable machine learning. Postdoc @ Uni Tübingen. https://sbordt.github.io/
Pinned
sbordt.bsky.social
Have you ever wondered whether a few instances of data contamination really lead to benchmark overfitting?🤔 Then our latest #ICML paper on the effect of data contamination on LLM evals might be for you!🚀

Paper: arxiv.org/abs/2410.03249
👇🧵
Reposted by Sebastian Bordt
ulrikeluxburg.bsky.social
I am hiring PhD students and/or postdocs to work on the theory of explainable machine learning. Please apply through ELLIS or IMPRS; deadlines are end of October/mid-November. In particular: Women, where are you? Our community needs you!!!

imprs.is.mpg.de/application
ellis.eu/news/ellis-p...
Reposted by Sebastian Bordt
We need new rules for publishing AI-generated research. The teams developing automated AI scientists have customarily submitted their papers to standard refereed venues (journals and conferences) and to arXiv. Often, acceptance has been treated as the dependent variable. 1/
Reposted by Sebastian Bordt
sbordt.bsky.social
I don't know if it's the best place to start, but you might want to take a look at the work of Daron Acemoglu and Pascual Restrepo pascual.scripts.mit.edu/research/
Pascual Restrepo - Research
Pascual Restrepo Official Website. Economist, MIT.
pascual.scripts.mit.edu
Reposted by Sebastian Bordt
sbordt.bsky.social
I see the point of the original post, but I think it's also important to keep in mind this other aspect.
Reposted by Sebastian Bordt
gracekind.net
Grace @gracekind.net · Jul 19
The stochastic parrot is now an IMO gold medalist parrot
sbordt.bsky.social
Wednesday: Position: Rethinking Explainable Machine Learning as Applied Statistics icml.cc/virtual/2025...
sbordt.bsky.social
I'm at #ICML in Vancouver this week, hit me up if you want to chat about pre-training experiments or explainable machine learning.

You can find me at these posters:

Tuesday: How Much Can We Forget about Data Contamination? icml.cc/virtual/2025...
sbordt.bsky.social
Great to hear that you like it, and thank you for the feedback! I agree that stakeholders are important, although you won't find much about them in this paper. One might argue, though, that similar questions about stakeholders arise in data science with large datasets, hence the analogy :)
Reposted by Sebastian Bordt
ulrikeluxburg.bsky.social
Our #ICML position paper: #XAI is similar to applied statistics: it uses summary statistics in an attempt to answer real-world questions. But authors need to state how concretely (!) their XAI statistics contribute to answering which concrete (!) question!
arxiv.org/abs/2402.02870
sbordt.bsky.social
During the last couple of years, we have read a lot of papers on explainability and often felt that something was fundamentally missing🤔

This led us to write a position paper (accepted at #ICML2025) that attempts to identify the problem and to propose a solution.

arxiv.org/abs/2402.02870
👇🧵
sbordt.bsky.social
There are many more interesting aspects to this, so take a look at our paper!

arxiv.org/abs/2402.02870

We would also be happy to receive questions and comments, even on why we got it completely wrong.😊

If you are at ICML, I will be presenting this paper on Wed 16 Jul at 4:30 in the East Exhibition Hall A-B, poster #E-501.📍
sbordt.bsky.social
We think the literature on explainable machine learning can learn a lot from looking at these papers!📚
sbordt.bsky.social
As I learned from our helpful ICML reviewers, there is a lot of existing research at the intersection of machine learning and statistics that takes the matter of interpretation quite seriously.
sbordt.bsky.social
In this framework, another way to formulate the initial problems is: For many popular explanation algorithms, it is not clear whether they have an interpretation.
sbordt.bsky.social
Having an interpretation means that the explanation formalizes an intuitive human concept, which is a fancy philosophical way of saying that it is clear what aspect of the function the explanation describes.🧠
sbordt.bsky.social
In addition, the way to develop explanations that are useful "in the world" is to develop explanations that have an interpretation.
sbordt.bsky.social
This has several important implications. Most importantly, explainable machine learning has often been trying to reinvent the wheel when we already have a robust framework for discussing complex objects in the light of pressing real-world questions.
sbordt.bsky.social
It took us a while to recognize it, but once you see it, you can't unsee it: Explainable machine learning is applied statistics for learned functions.✨
sbordt.bsky.social
Concretely, researchers in applied statistics study complex datasets by mapping their most important properties into low-dimensional structures. Now think:

Machine learning model ~ Large dataset
Explanation algorithm ~ Summary statistics, visualization
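To make the analogy concrete, here is a minimal sketch (my own illustration using scikit-learn, not code from the paper): a summary statistic condenses a large dataset into a few numbers per feature, and an explanation algorithm such as permutation feature importance condenses a learned model in the same way.

    # Illustrative sketch of the analogy, assuming scikit-learn and a toy regression task.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.inspection import permutation_importance

    X, y = make_regression(n_samples=1000, n_features=5, random_state=0)

    # "Applied statistics": summarize the dataset with a low-dimensional statistic.
    dataset_summary = X.mean(axis=0)          # one number per feature

    # "Explainable ML": summarize the learned function with a low-dimensional statistic.
    model = GradientBoostingRegressor(random_state=0).fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    model_summary = result.importances_mean   # one number per feature

    print("Dataset summary statistic:", np.round(dataset_summary, 2))
    print("Model summary statistic:  ", np.round(model_summary, 2))

In both cases, the question from the thread applies: which concrete question about the dataset (or the model) does this summary actually answer?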
sbordt.bsky.social
Here comes our key realization: This question has occurred in other disciplines before, specifically in applied statistics research.
sbordt.bsky.social
So, how can we seriously discuss whether an explanation algorithm can be used to answer relevant questions about our trained model or the world?🌍