Alexander Doria
@dorialexander.bsky.social
LLM for the commons.
Pinned
Breaking: we release SYNTH, a fully synthetic generalist dataset for pretraining, and two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range. pleias.fr/blog/blogsyn...
DeepSeek just released a new state-of-the-art math prover, DeepSeek-Math-V2, competitive with Google, OpenAI, or ByteDance, while being a publicly documented open-weight model. A few reading notes along the way:
November 27, 2025 at 3:41 PM
And a major open science release from Prime Intellect: they don't stress it enough, but the SFT part goes well beyond post-training. This is fully documented mid-training, with tons of insights/gems on MoE training, asynchronous RL infra, and deep research. storage.googleapis.com/intellect-3-...
November 27, 2025 at 7:47 AM
Not a fan so far of "sovereign" displacing "open" in all things AI/tech in the EU.
November 26, 2025 at 8:58 PM
And another social exchange on repeat:
>What are you doing?
>So we train from scratch.
>Ok but which models are you fine-tuning?
>From **scratch**. Zero, nihil, zilch.
November 26, 2025 at 7:47 PM
The threshold for consistent English/query understanding is now 3M parameters.
November 26, 2025 at 9:21 AM
YES. The main reason classic pretraining dominated for so long is just that you don't have to think so much about the data or about what elicits reasoning. It's "here".

Re: the new Sutskever/Patel podcast: www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 9:27 PM
As far as bubbles go, looks like multiple anti-AI movements are popping before Nvidia.
November 23, 2025 at 9:54 AM
For all the talk about code, I think over 50% of my ChatGPT use is everyday, daily-life stuff.
November 22, 2025 at 1:52 PM
Actually, an additional note on SYNTH: it might well be the fastest-built (pre-)training dataset ever. Due to a major infrastructure issue, we had to reconstitute most of it in a handful of days.
November 21, 2025 at 7:22 PM
Almost coming to regret writing this paper: it draws easily 90% of our issues/complaints, for no material benefit. This is why classic non-synthetic open data can't happen in AI.
Announcing the release of the official Common Corpus paper: a 20-page report detailing how we collected, processed, and published 2 trillion tokens of reusable data for LLM pretraining. arxiv.org/pdf/2506.01732
November 21, 2025 at 9:18 AM
Lol, someone is trying to sell me the creation of a Wikipedia page. I've seen enough as an admin to know it should *only* happen organically. Speedy deletion is far from the worst outcome.
November 20, 2025 at 9:58 PM
One week later, sorry to announce that Baguettotron has consistently climbed in popularity and the prophecy is taking shape.
November 18, 2025 at 3:47 PM
We’re getting fanart now.
I can't help it. I am feeling overly absurd today.
November 17, 2025 at 8:42 PM
First successful fine-tune of Baguettotron. And very on-brand that it's about poetry.
November 17, 2025 at 2:58 PM
At some EU LLM event, and I don't really have to introduce myself: everyone knows Baguettotron.
November 16, 2025 at 9:59 PM
Still can't believe I got the opportunity to beta-test Gemini 4. The model is wild.
November 16, 2025 at 9:33 PM
Nothing to do with AI, but this, this was an incredible novel. One of Borges's favorites, too.
November 16, 2025 at 6:02 PM
German man boarding the plane from France with no fewer than four baguettes: here is my potential target customer.
November 16, 2025 at 2:31 PM
Now reading (a 1964 SF novel, but it's really about synthetic environments).
November 15, 2025 at 3:59 PM
Since people were wondering what the use cases for Monad could be:
It's pretty good for text classification as well, ngl. Half the size of BERT and it can still do nearly as well.
November 15, 2025 at 1:27 PM
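[Editor's note: a minimal sketch of what "Monad for text classification" could look like via Hugging Face transformers. The checkpoint id, label count, and example text are placeholders, not the official setup; whether this runs as-is depends on the architecture having a sequence-classification head registered in transformers.]

```python
# Hypothetical sketch: small LM as a text-classification backbone.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "PleIAs/Monad"  # placeholder id, check the actual Hub repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Decoder-only backbones usually need an explicit pad token for batched inputs.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("This tiny model is surprisingly usable.", return_tensors="pt")
logits = model(**inputs).logits  # fresh, untrained head: fine-tune before trusting it
print(logits.shape)              # torch.Size([1, 2])
```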
Getting into pretraining has never been cheaper.
November 15, 2025 at 12:18 PM
Now a concept: a vintage computer-use model, distributed on diskette, trained only on classic core Unix.
November 14, 2025 at 7:32 PM
Looking back, one of my main disappointments in LLM/AI research is seeing the non-commercial space shrinking, becoming more conservative, fragmented, and less cooperative.
November 14, 2025 at 5:10 PM
Apparently even Monad is not small enough.
Playing around with the PleIAs "smallest viable model" Monad, and realizing that with 4-bit quantization (storing 56M parameters in ~27 MB) and a SuperDisk drive (to use the FD32MB format), you could turn it into a chat model that fits on a standard 3.5-inch diskette.
November 13, 2025 at 8:46 PM
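[Editor's note: a quick back-of-the-envelope check of the quantization arithmetic in the quoted post. The 56M parameter count, 4-bit weights, and FD32MB capacity come from the post and the published SuperDisk spec; file-format and metadata overhead are ignored, and the post's ~27 MB corresponds to ~26.7 MiB.]

```python
# Does a 4-bit quantized 56M-parameter model fit on an FD32MB-formatted diskette?
PARAMS = 56_000_000          # Monad parameter count, per the post
BITS_PER_WEIGHT = 4          # 4-bit quantization
FD32MB_CAPACITY_MB = 32      # FD32MB format on a standard 3.5" HD floppy
STANDARD_FLOPPY_MB = 1.44    # the same diskette in its usual format

weights_mb = PARAMS * BITS_PER_WEIGHT / 8 / 1_000_000   # bytes -> MB (decimal)
print(f"4-bit weights: ~{weights_mb:.0f} MB")                       # ~28 MB
print(f"Fits FD32MB?   {weights_mb < FD32MB_CAPACITY_MB}")          # True
print(f"Fits 1.44 MB?  {weights_mb < STANDARD_FLOPPY_MB}")          # False
```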
Reposted by Alexander Doria
Ultimately, I am stoked on things like Baguettotron and pretty strongly negative on publicly owned data centers, because it seems like you should want to get the most productivity out of the least use of resources.
November 13, 2025 at 3:59 PM