Lightnews — Scholar-powered news

Thaddée Tyl

@espadrine.bsky.social

The path of the Mistral 7B is nice to see!

The OG one topped open models of that size. For the first time, a local model felt usable on consumer hardware.

Not only is the latest Ministral 8B on the Pareto frontier for knowledge vs. cost (and for search, math, agentic uses)…

December 3, 2025 at 10:40 AM

Thaddée Tyl

@espadrine.bsky.social

DeepSeek released V3.2 (and V3.2 Speciale, a math-oriented model).

New model, new benchmarks!

The biggest jump for DeepSeek V3.2 is on agentic coding, where it seems poised to erase a lot of models on the Pareto frontier, including Sonnet 4.5, Minimax M2, and K2 Thinking.
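
For readers unfamiliar with the term, a model sits on the Pareto frontier when no other model is both cheaper and at least as good. The sketch below shows one common way to compute such a frontier. It is not the leaderboard's actual code, and the model names, costs, and scores are made-up placeholders.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str     # hypothetical label, not a real model
    cost: float   # lower is better (arbitrary unit)
    score: float  # higher is better (arbitrary unit)


def pareto_frontier(models):
    """Return the models that no cheaper-or-equal model matches or beats."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to priciest; on equal cost, best score first.
    for m in sorted(models, key=lambda m: (m.cost, -m.score)):
        # Keep a model only if it beats everything cheaper seen so far.
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Placeholder data, for illustration only.
models = [
    Model("A", cost=1.0, score=50.0),
    Model("B", cost=2.0, score=45.0),  # dominated by A
    Model("C", cost=3.0, score=70.0),
    Model("D", cost=3.0, score=60.0),  # dominated by C
]
frontier_names = [m.name for m in pareto_frontier(models)]
```

A new release "erases" a model from the frontier when it is at least as cheap and scores at least as well.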

December 1, 2025 at 6:28 PM

Thaddée Tyl

@espadrine.bsky.social

So, how is Gemini 3 on this new leaderboard?

Its intrinsic knowledge is unmatched, surpassing 2.5 and GPT-5.1.

bsky.app/profile/espa...

November 18, 2025 at 5:37 PM

Thaddée Tyl

@espadrine.bsky.social

Unveiling a new LLM leaderboard: metabench.organisons.com

Why?

Company C1 releases model M1 and discloses benchmarks B1.
Company C2 releases M2, showing off benchmarks B2 which are distinct.
Comparing those models is hard since they don't share benchmarks!

November 18, 2025 at 5:21 PM

Thaddée Tyl

@espadrine.bsky.social

Am I using the Gemini APIs wrong? I keep getting 429s (Too Many Requests). The key was fresh from aistudio.google.com.

gemini-embedding-exp-03-07 is the only embedding model on the market that I can’t benchmark because of it.

The quota in the Console says I'm at 0.33% usage…

June 30, 2025 at 8:14 AM

Reposted by Thaddée Tyl

Kyutai

@kyutai-labs.bsky.social

Our latest open-source speech-to-text model just claimed 1st place among streaming models and 5th place overall on the OpenASR leaderboard 🥇🎙️
While all other models need the whole audio, ours delivers top-tier accuracy on streaming content.
Open, fast, and ready for production!

June 27, 2025 at 10:31 AM

Thaddée Tyl

@espadrine.bsky.social

Isn’t there a better way to handle screens than asking a *language model* to guess the number of pixels to the left and top of a UI widget?

WARNING: Holo1 is using absolute coordinates (number of pixels) and HuggingFace processor is doing image resize. To have matching coordinates, one needs to smart_resize the image.

from transformers.models.qwen2_vl.image_processing_qwen2_vl import smart_resize
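
The mismatch the warning describes can be sketched as follows: if the model reports pixel coordinates in the resized image, they have to be scaled back to the original screenshot. This is a minimal sketch, assuming the resized dimensions are already known (for instance from the resize step); the helper name `to_original_coords` is hypothetical and not part of any library.

```python
def to_original_coords(x, y, orig_size, resized_size):
    """Map a pixel position in the resized image back to the original image.

    orig_size and resized_size are (width, height) tuples.
    This helper is illustrative; it is not a transformers API.
    """
    orig_w, orig_h = orig_size
    new_w, new_h = resized_size
    # Scale each axis independently, since a resize may not preserve
    # the aspect ratio exactly.
    return round(x * orig_w / new_w), round(y * orig_h / new_h)


# Example: a click predicted at (100, 50) on a 960x540 resized image
# of a 1920x1080 screenshot.
click = to_original_coords(100, 50, orig_size=(1920, 1080), resized_size=(960, 540))
```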

June 10, 2025 at 12:51 PM

Reposted by Thaddée Tyl

Kyutai

@kyutai-labs.bsky.social

Talk to unmute.sh 🔊, the most modular voice AI around. Empower any text LLM with voice, instantly, by wrapping it with our new speech-to-text and text-to-speech. Any personality, any voice. Interruptible, smart turn-taking. We’ll open-source everything within the next few weeks.

May 23, 2025 at 10:14 AM

Thaddée Tyl

@espadrine.bsky.social

Search > Recommendation.

I find more interesting, high-signal things from querying what I like, than linearly going through a feed that learnt from my navigation.

Generally, giving users the ability to send reliable signals beats extracting signals from their background noise.

May 18, 2025 at 11:27 AM

Reposted by Thaddée Tyl

Sara Hooker

@sarahooker.bsky.social

It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.

April 30, 2025 at 2:55 PM

Thaddée Tyl

@espadrine.bsky.social

I wonder what the story was for Phi-4 Mini. Its tokenizer for conversation is completely different from Phi-4.

April 6, 2025 at 4:55 PM

Reposted by Thaddée Tyl

Ryan Williams

@rrwilliams.bsky.social

New paper: Simulating Time With Square-Root Space

people.csail.mit.edu/rrw/time-vs-...

It's still hard for me to believe it myself, but I seem to have shown that TIME[t] is contained in SPACE[√(t log t)].

To appear in STOC. Comments are very welcome!

February 21, 2025 at 10:19 PM

Thaddée Tyl

@espadrine.bsky.social

Censorship is when the government silences speech.

With Mr Musk being in government, doesn’t that make every X suspension or shadow ban, censorship?

March 24, 2025 at 2:25 AM

Thaddée Tyl

@espadrine.bsky.social

Preventing political opponents from running in elections, by revoking their diploma and putting them in prison on unjustified charges, is not democratic.

Is there a shred of reason behind Ekrem Imamoglu's jailing?

apnews.com/article/turk...

Turkish court orders Erdogan rival jailed pending trial on corruption charges as protests grow

A Turkish court formally arrested Mayor Ekrem Imamoglu, a key rival to President Recep Tayyip Erdogan, and ordered him jailed pending the outcome of a trial on corruption charges.

March 24, 2025 at 1:23 AM

Reposted by Thaddée Tyl

Thomas Wolf

@thomwolf.bsky.social

We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder

[1/3]

March 12, 2025 at 1:22 PM

Thaddée Tyl

@espadrine.bsky.social

Is there an economic reason why the tariffs established during Mr Trump’s first term didn’t cause a recession, but those established now did?

March 11, 2025 at 5:41 PM

Thaddée Tyl

@espadrine.bsky.social

We can get GNSS spatial positioning all the way to the moon, given the right receiver!

Greatly simplifies space travel.

I still believe we should set up a separate GNSS on every planet.

ntrs.nasa.gov/api/citation...

March 5, 2025 at 2:30 PM

Thaddée Tyl

@espadrine.bsky.social

Italy will reintroduce nuclear energy through SMRs and fusion research.

Decarbonization fights against an existential risk. I approve!

www.mase.gov.it/comunicati/n...

Sustainable nuclear power: MASE, the Council of Ministers approves the enabling law | Ministry of Environment and Energy Security

Aim: to regulate energy production through the new modules, the decommissioning of old plants, the management of waste and spent fuel, research and development on energy from fu...

March 4, 2025 at 2:16 PM

Thaddée Tyl

@espadrine.bsky.social

LLMs get better at tool use and search.
Model memorization is thus less useful than reasoning.
Yet a lot of benchmarks still focus on the former.

Humanity's Last Exam (HLE): 5 stars reasoning, 3 stars memorization
MATH: 5 stars reasoning, 1 star memorization
ARB: 5 stars reasoning, 1 star memorization
GSM8K: 4 stars reasoning, 1 star memorization
ARC: 3 stars reasoning, 3 stars memorization
DROP: 3 stars reasoning, 3 stars memorization
HellaSwag: 3 stars reasoning, 3 stars memorization
GPQA: 2 stars reasoning, 5 stars memorization
MMMU: 2 stars reasoning, 4 stars memorization
MMLU: 1 star reasoning, 5 stars memorization

February 26, 2025 at 9:28 AM

Thaddée Tyl

@espadrine.bsky.social

It is a bit sad that codec programs gave up on using GPGPU / CUDA, which is much more widespread than hardware acceleration.

February 23, 2025 at 9:08 AM

Thaddée Tyl

@espadrine.bsky.social

Mistral Chat Pro being so fast to generate messages is really nice.

I would love to see how it feels if they release a reasoning model.

February 21, 2025 at 10:26 AM

Thaddée Tyl

@espadrine.bsky.social

Surprisingly, bigger Llama 3 models are worse at learning from relevant context, and giving a good answer, than smaller ones.

Unsurprisingly, base models evaluate the probability of a good answer better than instruct models, which will give a low probability to speech that doesn't match their style.

[Graph: rate of golden answers more likely to be generated with RAG, for various models, by parameter count.]

February 17, 2025 at 10:11 PM

Reposted by Thaddée Tyl

Laurent Mazare

@lmazare.bsky.social

We just released Hibiki 🟢, a real-time speech-to-speech translation model 🇫🇷 -> 🇬🇧. It preserves the voice of the user, and the smaller variant can run on iPhone, as shown by Neil in this video.
Find the code on GitHub at github.com/kyutai-labs/... and the weights on HF, and give it a spin!

February 7, 2025 at 8:26 AM

Thaddée Tyl

@espadrine.bsky.social

Bittersweet to see the latest Codestral score so close to the open-weights version, yet also to see both so close to Claude.

Claude 3.5 Sonnet 10/22: 1006 Elo.
Codestral 25.01: 1003 Elo.
Codestral 05/24 (open-weights): 1000 Elo.
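
To put those gaps in perspective, here is a small sketch of the standard Elo expected-score formula. It assumes the leaderboard uses the usual 400-point logistic scale, which the post does not state.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula.

    Assumes the conventional 400-point logistic scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the list above.
claude_vs_open_codestral = elo_expected_score(1006, 1000)
new_vs_open_codestral = elo_expected_score(1003, 1000)
```

Under that assumption, a 6-point gap gives the higher-rated model an expected score of roughly 51%, barely above a coin flip.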

February 6, 2025 at 10:28 AM

Thaddée Tyl

@espadrine.bsky.social

US presidential actions related to energy:

• Eliminate EV mandate
• Terminate the Green New Deal
• Stop funding EV charging stations
• Eliminate taxes on fuel and gas-powered vehicles

Doesn’t that negatively impact Tesla?

January 23, 2025 at 9:11 AM
