Samuel Müller
@sammuller.bsky.social
310 followers 120 following 37 posts
(Tab)PFNs, TrivialAugment etc.
sammuller.bsky.social
There are already early examples of this, which we discuss, in areas as diverse as biology, Bayesian optimization, time-series forecasting, and tabular data. The most prominent is TabPFN (Nature '25). 5/n

news.ycombinator.com/item?id=4264...
Show HN: TabPFN v2 – A SOTA foundation model for small tabular data | Hacker News
sammuller.bsky.social
We go into detailed comparisons to other Bayesian methods and the trade-offs that lead us to the conclusion that PFNs will become dominant for Bayesian prediction, and, further, that Bayesian prediction will become more important overall as priors improve. 4/n
sammuller.bsky.social
What's nice is that, after training on this random data, the model will start to make sense of real-world data, too. It will approximate the posterior belonging to the prior of choice, e.g., a BNN, a GP, or, in the most interesting cases, a Bayesian model that doesn't exist yet. 3/n
sammuller.bsky.social
Prior-data fitted networks (PFNs) do just that!

The PFN idea is to take a prior, e.g. a Bayesian neural network (BNN) prior, sample datasets from that prior, and then train to predict the hold-out labels of these datasets. (No training on real-world data.) 2/n
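The sampling step can be sketched as follows. This is a minimal illustration using a toy one-hidden-layer BNN prior; the network width, noise level, and split sizes are made up for the example and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset_from_prior(n=50, d=3, hidden=16, noise=0.1):
    """Draw one synthetic dataset from a toy BNN prior:
    a randomly sampled one-hidden-layer network defines the labels."""
    X = rng.standard_normal((n, d))
    W1 = rng.standard_normal((d, hidden))   # weights drawn from the prior
    W2 = rng.standard_normal(hidden)
    y = np.tanh(X @ W1) @ W2 + noise * rng.standard_normal(n)
    # split into a context part (fed to the PFN) and hold-out labels
    # (the targets the PFN is trained to predict)
    return (X[:40], y[:40]), (X[40:], y[40:])

(context_X, context_y), (holdout_X, holdout_y) = sample_dataset_from_prior()
```

Training repeats this with a freshly sampled network each time, so the PFN gradually learns to approximate the posterior predictive of this prior.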
sammuller.bsky.social
Compute is increasing much faster than data. How can we improve classical supervised learning long term (the underlying tech of most of GenAI)?

Our ICML position paper's answer: simply train on a bunch of artificial data (noise) and only do inference on real-world data! 1/n
sammuller.bsky.social
I am so proud to co-organize the workshop on foundation models for structured data at ICML. At this workshop, we will discuss how to further extend the GenAI revolution to tabular data, time-series forecasting, etc. Consider submitting your work by May 19!
icml-structured-fm-workshop.github.io
Foundation Models for Structured Data
sammuller.bsky.social
Could it be that @fchollet.bsky.social is not Francois Chollet?? They have a lot of ML followers 😅
sammuller.bsky.social
To then change it? As in "overhaul"?
sammuller.bsky.social
Find my full write-up (including scenarios with bad actors, as well as the prompts used) plus the game here: github.com/SamuelGabrie...
If you think my single-person experiment is not to be trusted, you are right: try it yourself!
GitHub - SamuelGabriel/LMARENA-GAMING
sammuller.bsky.social
The large employee numbers at top AI labs, combined with the small numbers of votes on lmarena, lead me to the conclusion that lmarena scores are probably dominated by biased votes.
sammuller.bsky.social
In hard mode I attributed 13/20 sets completely correctly, much higher than the expected 3.3 from random guessing.
That is, I could identify all three models correctly in 13/20 cases after practicing with 20 questions.
That means attributing responses to LLMs is super easy for humans.
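To sanity-check those numbers: with three responses to match, a random guess gets all three right with probability 1/6 (one of 3! = 6 permutations), so over 20 rounds the expectation is 20/6 ≈ 3.3. A quick calculation of how unlikely 13/20 is by chance:

```python
from math import comb

n, p = 20, 1 / 6           # 20 rounds, 1/3! chance of a perfect match
expected = n * p           # ~3.33 perfect rounds under random guessing

# probability of getting >= 13 perfect rounds purely by chance
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(13, n + 1))
```

The tail probability comes out far below 1e-5, so 13/20 is very unlikely to be luck.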
sammuller.bsky.social
I first played easy mode (see below), where I got two answers from each model and needed to match them.
I used 20 interactions in easy mode to learn the models' behaviors.
In hard mode (see prev post), you need to match three responses to the LLM names.
sammuller.bsky.social
Second, employees are very likely able to tell models apart based on gut feeling.
To figure out whether this is the case, I created a game with two modes.
The game is about identifying which answer was provided by which LLM.
sammuller.bsky.social
First, AI labs have enough employees to bias the benchmarks.
E.g. Grok 3 only has 10K votes and there are 2.7M votes in total on lmarena.
If half of e.g. OpenAI (2,000 employees) voted just once a day, they would make up > 10% of all 2.7M lmarena votes over its one-year existence.
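The back-of-the-envelope arithmetic behind that claim (the employee count and voting rate are the post's hypothetical assumptions, not measured numbers):

```python
voters = 2_000 // 2           # half of a 2,000-employee lab
votes = voters * 365          # one vote per day for a year
share = votes / 2_700_000     # fraction of all lmarena votes
# 1,000 * 365 = 365,000 votes -> about 13.5% of the 2.7M total
```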
sammuller.bsky.social
I believe lmarena.ai scores are not to be trusted, as the people voting are likely to come from the AI labs in the leaderboard and push their own models unintentionally. A thread 🧵
Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots
sammuller.bsky.social
How wrong do you think the lmarena scores are? Grok must be very easy to distinguish from other models in a blind evaluation
sammuller.bsky.social
Seems to beat boosting there, too, but it's probably a bit early to make definitive statements
sammuller.bsky.social
What did you think was interesting? The interview had such bad timing, a few days before the r1 launch
sammuller.bsky.social
MiniMax-01 takeaways

- 7 of 8 layers are linear attention
- they implemented a flash-style variant of linear attention + ring attention
- post-norm is back in large models! (using DeepNorm)
- probably wrong scaling laws, as the LR schedule is not adapted (see Chinchilla)

Let's see how it fares in the arena!
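For reference, "linear attention" replaces the softmax with a positive feature map so all keys and values can be summarized once, making the cost linear rather than quadratic in sequence length. A minimal non-causal sketch using the common elu+1 feature map; this is illustrative, not MiniMax's actual implementation:

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: keeps features strictly positive
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention: O(n * d * d_v) instead of O(n^2 * d)."""
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                 # (d, d_v) summary of all keys/values
    Z = Qf @ Kf.sum(axis=0)       # per-query normalizer, shape (n,)
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d, dv = 6, 4, 3
out = linear_attention(rng.standard_normal((n, d)),
                       rng.standard_normal((n, d)),
                       rng.standard_normal((n, dv)))
```

Because the (d, d_v) summary is the same for every query, the causal variant can be computed with a running sum, which is what makes a flash/ring-attention-style implementation natural.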
sammuller.bsky.social
Thank you :) So far, we only open-source the model itself and how to use it. We do not open-source how to train it exactly, sorry for that :| There is a company starting based on the model, so the training recipe is kinda its moat
Reposted by Samuel Müller
victorbcn.bsky.social
Pretrained models for tabular data (TabPFN) could be the new state of the art for regression and classification. 🙄 We'll have to try it. The GitHub link is at the end of the thread. This one in particular is huge, and if it performs as they claim, it's a big leap forward for the state of the art in the field.
sammuller.bsky.social
This might be the first time in 10 years that boosted trees are not the best default choice when working with data in tables.
Instead, a pre-trained neural network is: the new TabPFN, which we just published in Nature 🎉