Thaddée Tyl
espadrine.bsky.social
Self-replicating organisms. shields.io, Captain Train, Qonto. They.
The path of the Mistral 7B is nice to see!

The OG one topped open models of that size. For the first time, a local model felt usable on consumer hardware.

Not only is the latest Ministral 8B on the Pareto frontier for knowledge vs. cost (and for search, math, agentic uses)…
December 3, 2025 at 10:40 AM
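For readers unfamiliar with the term used in the post above: a model sits on the Pareto frontier when no other model is both at least as good and at least as cheap. Below is a minimal sketch of that check. The model names, scores, and costs are made up for illustration and are not taken from the leaderboard.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str     # hypothetical label, not a real model
    score: float  # benchmark score, higher is better
    cost: float   # price per unit of usage, lower is better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model dominates, sorted by cost.

    A model is dominated when another one scores at least as well and
    costs at most as much, with at least one of the two strictly better.
    """
    frontier = []
    for m in models:
        dominated = any(
            o.score >= m.score
            and o.cost <= m.cost
            and (o.score > m.score or o.cost < m.cost)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.cost)


# Illustrative data only.
models = [
    Model("small", 60.0, 0.1),
    Model("medium", 70.0, 0.5),
    Model("pricey", 65.0, 1.0),  # dominated by "medium": lower score, higher cost
    Model("large", 85.0, 3.0),
]
print([m.name for m in pareto_frontier(models)])
```

With this data, "pricey" drops out because "medium" is both cheaper and better, while the other three each offer a trade-off that nothing else beats.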
DeepSeek released V3.2 (and V3.2 Speciale, a math-oriented model).

New model, new benchmarks!

The biggest jump for DeepSeek V3.2 is on agentic coding, where it seems poised to knock a lot of models off the Pareto frontier, including Sonnet 4.5, Minimax M2, and K2 Thinking.
December 1, 2025 at 6:28 PM
So, how is Gemini 3 on this new leaderboard?

Its intrinsic knowledge is unmatched, surpassing 2.5 and GPT-5.1.

bsky.app/profile/espa...
November 18, 2025 at 5:37 PM
Unveiling a new LLM leaderboard: metabench.organisons.com

Why?

Company C1 releases model M1 and discloses benchmarks B1.
Company C2 releases M2, showing off benchmarks B2 which are distinct.
Comparing those models is hard since they don't share benchmarks!
November 18, 2025 at 5:21 PM
Am I using the Gemini APIs wrong? I keep getting 429s. The key was fresh from aistudio.google.com.

gemini-embedding-exp-03-07 is the only embedding model on the market that I can’t benchmark because of it.

The quota in the Console says I'm at 0.33% usage…
June 30, 2025 at 8:14 AM
Reposted by Thaddée Tyl
Our latest open-source speech-to-text model just claimed 1st place among streaming models and 5th place overall on the OpenASR leaderboard 🥇🎙️
While all other models need the whole audio, ours delivers top-tier accuracy on streaming content.
Open, fast, and ready for production!
June 27, 2025 at 10:31 AM
Isn’t there a better way to handle screens than asking a *language model* to guess the number of pixels to the left and top of a UI widget?
June 10, 2025 at 12:51 PM
Reposted by Thaddée Tyl
Talk to unmute.sh 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the next few weeks.
May 23, 2025 at 10:14 AM
Search > Recommendation.

I find more interesting, high-signal things by querying for what I like than by linearly going through a feed that learned from my navigation.

Generally, giving users the ability to send reliable signals beats extracting signals from their background noise.
May 18, 2025 at 11:27 AM
Reposted by Thaddée Tyl
It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
April 30, 2025 at 2:55 PM
I wonder what the story was for Phi-4 Mini. Its tokenizer for conversation is completely different from Phi-4.
April 6, 2025 at 4:55 PM
Reposted by Thaddée Tyl
New paper: Simulating Time With Square-Root Space

people.csail.mit.edu/rrw/time-vs-...

It's still hard for me to believe it myself, but I seem to have shown that TIME[t] is contained in SPACE[sqrt{t log t}].

To appear in STOC. Comments are very welcome!
February 21, 2025 at 10:19 PM
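The containment quoted in the repost above, typeset in standard complexity-class notation (the inline `sqrt{t log t}` is read here as a square root over the product):

```latex
\[
  \mathrm{TIME}[t] \subseteq \mathrm{SPACE}\!\left[\sqrt{t \log t}\right]
\]
```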
Censorship is when the government silences speech.

With Mr Musk being in government, doesn’t that make every X suspension or shadow ban, censorship?
March 24, 2025 at 2:25 AM
Preventing political opponents from running in elections, by revoking their diploma and putting them in prison on unjustified charges, is not democratic.

Is there a shred of reason behind Ekrem Imamoglu's jailing?

apnews.com/article/turk...
Turkish court orders Erdogan rival jailed pending trial on corruption charges as protests grow
A Turkish court formally arrested Mayor Ekrem Imamoglu, a key rival to President Recep Tayyip Erdogan, and ordered him jailed pending the outcome of a trial on corruption charges.
March 24, 2025 at 1:23 AM
Reposted by Thaddée Tyl
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder

[1/3]
March 12, 2025 at 1:22 PM
Is there an economic reason why the tariffs established during Mr Trump’s first term didn’t cause a recession, but those established now did?
March 11, 2025 at 5:41 PM
We can get GNSS spatial positioning all the way to the moon, given the right receiver!

Greatly simplifies space travel.

I still believe we should set up a separate GNSS on every planet.

ntrs.nasa.gov/api/citation...
March 5, 2025 at 2:30 PM
Italy will reintroduce nuclear energy through SMRs and fusion research.

Decarbonization fights against an existential risk. I approve!

www.mase.gov.it/comunicati/n...
Sustainable nuclear power: MASE, the Council of Ministers approves the enabling bill | Ministero dell'Ambiente e della Sicurezza Energetica
The aim is to regulate energy production through the new modules, the decommissioning of old plants, the management of waste and spent fuel, research and development on energy from fu...
March 4, 2025 at 2:16 PM
LLMs get better at tool use and search.
Model memorization is thus less useful than reasoning.
Yet a lot of benchmarks still focus on the former.
February 26, 2025 at 9:28 AM
It is a bit sad that codec programs gave up on using GPGPU / CUDA, which is much more widespread than hardware acceleration.
February 23, 2025 at 9:08 AM
Mistral Chat Pro being so fast to generate messages is really nice.

I would love to see how it feels if they release a reasoning model.
February 21, 2025 at 10:26 AM
Surprisingly, bigger Llama 3 models are worse at learning from relevant context, and giving a good answer, than smaller ones.

Unsurprisingly, base models evaluate the probability of a good answer better than instruct models, which give a low probability to speech that doesn't match their style.
February 17, 2025 at 10:11 PM
Reposted by Thaddée Tyl
We just released Hibiki 🟢, a real-time speech-to-speech translation model 🇫🇷 -> 🇬🇧. It preserves the voice of the user, and the smaller variant can run on iPhone, as shown by Neil in this video.
Find the code on GitHub github.com/kyutai-labs/... and the weights on HF and give it a spin!
February 7, 2025 at 8:26 AM
Bittersweet to see the latest Codestral so close to the open-weights version, yet to see both so close to Claude.
February 6, 2025 at 10:28 AM
US presidential actions related to energy:

• Eliminate EV mandate
• Terminate the Green New Deal
• Stop funding EV charging stations
• Eliminate taxes on fuel and gas-powered vehicles

Doesn’t that negatively impact Tesla?
January 23, 2025 at 9:11 AM