The OG one topped open models of that size. For the first time, a local model felt usable on consumer hardware.
Not only is the latest Ministral 8B on the Pareto frontier for knowledge vs. cost (and for search, math, agentic uses)…
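For readers unfamiliar with the term, here is a minimal sketch of what "on the Pareto frontier" means for score vs. cost: a model is on the frontier if no other model is both at least as cheap and at least as good, and strictly better on one of the two. The `ModelPoint` type, the `pareto_frontier` helper, and the sample numbers are all hypothetical illustrations, not real benchmark data.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str     # hypothetical model label
    cost: float   # lower is better (e.g. price per token)
    score: float  # higher is better (e.g. benchmark accuracy)


def pareto_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Return the points that no other point dominates."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, best score first.
    for p in sorted(points, key=lambda q: (q.cost, -q.score)):
        # A point survives only if it beats every cheaper (or equal-cost) point.
        if p.score > best_score:
            frontier.append(p)
            best_score = p.score
    return frontier


# Made-up numbers, purely for illustration.
sample = [
    ModelPoint("a", 1.0, 50.0),
    ModelPoint("b", 2.0, 40.0),  # dominated by "a": pricier and worse
    ModelPoint("c", 3.0, 70.0),
    ModelPoint("d", 3.0, 60.0),  # dominated by "c": same cost, worse
]
frontier_names = [p.name for p in pareto_frontier(sample)]
```

With these made-up points, only "a" and "c" remain on the frontier; a new release "erases" a model from the frontier when it dominates it in this sense.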
New model, new benchmarks!
The biggest jump for DeepSeek V3.2 is on agentic coding, where it seems poised to erase a lot of models on the Pareto frontier, including Sonnet 4.5, Minimax M2, and K2 Thinking.
Its intrinsic knowledge is unmatched, surpassing 2.5 and GPT-5.1.
bsky.app/profile/espa...
Why?
Company C1 releases model M1 and discloses benchmarks B1.
Company C2 releases M2, showing off benchmarks B2 which are distinct.
Comparing those models is hard since they don't share benchmarks!
gemini-embedding-exp-03-07 is the only embedding model on the market that I can’t benchmark because of it.
The quota in the Console says I'm at 0.33% usage…
While all other models need the whole audio, ours delivers top-tier accuracy on streaming content.
Open, fast, and ready for production!
I find more interesting, high-signal things from querying what I like, than linearly going through a feed that learnt from my navigation.
Generally, giving users the ability to send reliable signals beats extracting signals from their background noise.
The @lmarena.bsky.social has become the go-to evaluation for AI progress.
Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
people.csail.mit.edu/rrw/time-vs-...
It's still hard for me to believe it myself, but I seem to have shown that TIME[t] is contained in SPACE[sqrt{t log t}].
To appear in STOC. Comments are very welcome!
With Mr Musk being in government, doesn’t that make every X suspension or shadow ban an act of censorship?
Is there a shred of reason behind Ekrem İmamoğlu's jailing?
apnews.com/article/turk...
And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder
[1/3]
Greatly simplifies space travel.
I still believe we should set up a separate GNSS on every planet.
ntrs.nasa.gov/api/citation...
Decarbonization fights against an existential risk. I approve!
www.mase.gov.it/comunicati/n...
Model memorization is thus less useful than reasoning.
Yet a lot of benchmarks still focus on the former.
I would love to see how it feels if they release a reasoning model.
Unsurprisingly, base models evaluate the probability of a good answer better than instruct models, which assign a low probability to text that doesn't match their own style.
Find the code on GitHub (github.com/kyutai-labs/...) and the weights on HF, and give it a spin!
• Eliminate EV mandate
• Terminate the Green New Deal
• Stop funding EV charging stations
• Eliminate taxes on fuel and gas-powered vehicles
Doesn’t that negatively impact Tesla?