We got too used to no longer seeing the GPT base model.
Let’s compare to the DeepSeek base model.
The jump from base to reasoning is tremendous!
Large 3 starts off slightly higher than DeepSeek base. I’m eager to see Magistral Large!
It might not impress anyone because it lags behind GPT-5.1 and all the reasoning models, even when accounting for their increased token consumption costs. GPT-OSS-20B High might beat it everywhere except agentic coding.
Large 3 improves reasoning compared to Large 2, but is overtaken by… reasoning models.
And the original announcement: api-docs.deepseek.com/news/news251...
It jumps ahead of the pack, which had caught up with Gemini 2.5.
You can contribute them here: github.com/espadrine/me...
Stunningly, we get to compare models really fast.
No need to wait for independent benchmarks to run, or for @arena votes.
A few benchmarks are enough.
We can infer unknown benchmark scores from published ones.
So we aggregate a lot of benchmarks, and predict the others.
(There is a bit of math involved in getting the right algorithm!)
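To make the idea concrete, here is a minimal sketch of one way to guess a missing score. It is not the algorithm actually used here: it assumes a plain least-squares line between pairs of benchmarks, averaged over every benchmark the model did publish. The model names, benchmark names, and numbers are all made up for illustration.

```python
from statistics import mean

# Hypothetical published scores (made-up numbers, not real results).
# A benchmark missing from a model's dict means "not published".
scores = {
    "model_a": {"bench_x": 50.0, "bench_y": 60.0},
    "model_b": {"bench_x": 70.0, "bench_y": 80.0},
    "model_c": {"bench_x": 60.0},
}

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        # All x values are identical: fall back to predicting the mean of y.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def predict(scores, model, target):
    """Estimate scores[model][target] from the benchmarks the model published.

    For each published benchmark, fit a line on the other models that report
    both it and the target, then average the resulting estimates.
    Returns None when no benchmark has at least two such models.
    """
    estimates = []
    for source, value in scores[model].items():
        if source == target:
            continue
        pairs = [
            (s[source], s[target])
            for name, s in scores.items()
            if name != model and source in s and target in s
        ]
        if len(pairs) < 2:
            continue  # not enough data to fit a line
        a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
        estimates.append(a * value + b)
    return mean(estimates) if estimates else None

print(predict(scores, "model_c", "bench_y"))
```

With more benchmarks per model, each published score contributes its own estimate, which is why aggregating a lot of benchmarks helps. A real predictor would likely need to weight those estimates and handle noisy or saturated benchmarks, which this sketch ignores.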
models/text-embedding-004 doesn't have 429.