Nathan Lambert
@natolambert.bsky.social
An LLN - large language Nathan - (RL, RLHF, society, robotics), athlete, yogi, chef
Writes http://interconnects.ai
At Ai2 via HuggingFace, Berkeley, and normal places
Pinned
First draft online version of The RLHF Book is DONE. Recently I've been creating the advanced discussion chapters on everything from Constitutional AI to evaluation and character training, but I also sneak in consistent improvements to the RL specific chapter.

rlhfbook.com
New data on the ATOM Project website courtesy of OpenRouter, confirming that open model usage measured through inference over time mirrors all the other adoption metrics we have: China surged in the summer of 2025 and hasn't looked back.
January 28, 2026 at 3:30 PM
Stoked to share my latest podcast as Arcee launches their 400B MoE model today — Trinity Large — open weights, good license, strong performance. A small startup decided to take on the many established AI labs by pretraining their own model to release openly and monetize later, and they’re succeeding
January 27, 2026 at 11:12 PM
There’s a big level up coming for coding agents when they can, and know when to, turn to search models like GPT 5.2 Pro automatically.

Another level up when those search models get 10x faster.
January 27, 2026 at 6:02 PM
Has taken a long time to polish, but slowly becoming very proud of rlhfbook.com and do think it's a great resource for many people. A lot of hours (and tokens and reader feedback) going into making it right.
January 25, 2026 at 7:00 PM
An exciting addition to the RLHF Book on my DGX Spark arc is a bunch of single-GPU tinkering scripts for minimal examples of RL, reward models, etc. (and DPO-like algorithms soon).

Right now it has REINFORCE, RLOO, PPO, GRPO, Dr. GRPO, GSPO, CISPO, standard RM, PRM, ORM.

github.com/natolambert/...
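To make the comparison concrete, here is a minimal sketch of the group-relative advantage that the GRPO-style methods share; it is illustrative only (function name and shapes are mine), not code from the repo above.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, use_std: bool = True,
                              eps: float = 1e-4) -> torch.Tensor:
    # Minimal sketch (not the repo's implementation): center each prompt's
    # group of sampled completions around the group mean reward.
    # GRPO also divides by the group std; Dr. GRPO drops that normalization.
    # rewards has shape [num_prompts, group_size].
    mean = rewards.mean(dim=-1, keepdim=True)
    adv = rewards - mean
    if use_std:
        adv = adv / (rewards.std(dim=-1, keepdim=True) + eps)
    return adv

# Toy usage: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))                  # GRPO-style
print(group_relative_advantages(rewards, use_std=False))   # Dr. GRPO-style
```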
January 25, 2026 at 5:17 PM
Of all the AI assistant auto-PR review tools out there, this ChatGPT-Codex one that 👀 then 👍 or gives very specific feedback (and collects RLHF data too) is my favorite. The feedback it gives is remarkably reliable and very, very specific.
January 24, 2026 at 1:35 AM
Apple should skip launching the new Siri as a chatbot and just launch it as a CLI agent
January 23, 2026 at 3:28 PM
A well-known and important trick for stabilizing RL training is implementing the LM head in fp32 precision to help with gradients. Reproduced the plot from the MiniMax M1 paper entirely on my DGX Spark and in Ai2's post-training research codebase.
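For anyone who wants to try it, here is a minimal sketch of the idea, assuming a HuggingFace-style causal LM with a separate `lm_head`; it is not the open-instruct implementation and ignores tied embeddings.

```python
import torch
import torch.nn as nn

class FP32LMHead(nn.Module):
    # Keep the final vocab projection in fp32 while the trunk runs in bf16,
    # so the logits (and the gradients flowing back through the softmax)
    # don't lose precision. Sketch only, not the open-instruct code.
    def __init__(self, lm_head: nn.Linear):
        super().__init__()
        self.lm_head = lm_head.float()  # upcast the head's weights once

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Upcast activations so the matmul itself runs in fp32.
        return self.lm_head(hidden_states.float())

def patch_lm_head_fp32(model: nn.Module) -> nn.Module:
    # Assumes the model exposes `model.lm_head`; adjust for other layouts.
    model.lm_head = FP32LMHead(model.lm_head)
    return model
```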
January 23, 2026 at 3:04 PM
Frontier model transparency documents around model behavior:
Anthropic -> Claude's Constitution -> excellent (Claude focused)
OpenAI -> Model Spec -> excellent (developer/team focused)
xAI -> Elon's Tweets -> ...
DeepMind -> ???

Come on Google, need some transparency here.
January 21, 2026 at 10:56 PM
Get Good at Agents
The tools are getting so powerful that we need to change how we scope, manage, and approach our work.
A.k.a. not getting them to work is a skill issue.
www.interconnects.ai/p/get-good-a...
January 21, 2026 at 5:09 PM
My mom told me she's using AI for work today, I'm so proud
January 19, 2026 at 9:17 PM
Being good at using AI agents is a better moat than working hard.
January 18, 2026 at 8:06 PM
Software is becoming free; good decision-making in research, design, and product has never been so valuable. I hope people realize this and work less, spend more time cultivating peace, so the brain can do its best -- let the agents do most of the hard work.
January 18, 2026 at 8:03 PM
In some domains LLMs are superhuman at search; in others they’re shit. Know thy tool.
PSA: LLMs know how to search. Maybe they’re not as good as you at it, but they’re also search engines.
it's crazy the number of people who will pontificate confidently on LLMs without knowing that the popular consumer chatbots search the internet and link to the results in their responses.
January 18, 2026 at 6:03 PM
Always feel a little misled when the winter weather is this great in Seattle
January 18, 2026 at 5:57 PM
It baffles me that codex cli still has issues with basic git operations lol
January 17, 2026 at 3:57 PM
I spent a bunch of time this week getting my Nvidia DGX Spark working in Ai2's post-training repo (open-instruct) as a local RL debugging machine. It was quite hard due to the CUDA 13 requirement and the scarcity of vLLM wheels.

github.com/natolambert/...
GitHub - natolambert/dgx-spark-setup: Setup guide for ML training on NVIDIA DGX Spark (GB10 Blackwell, CUDA 13, aarch64)
Setup guide for ML training on NVIDIA DGX Spark (GB10 Blackwell, CUDA 13, aarch64) - natolambert/dgx-spark-setup
github.com
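If you are attempting the same, a quick environment sanity check saves time. This snippet is illustrative only (the repo above has the actual setup steps); the expectations come from the GB10 being aarch64 with CUDA 13.

```python
# Check the basics before building RL dependencies on a DGX Spark.
import platform
import torch

print("arch:", platform.machine())        # expect 'aarch64' on DGX Spark
print("torch:", torch.__version__)
print("torch cuda:", torch.version.cuda)  # needs a CUDA 13 build
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
else:
    print("no CUDA device visible")       # most x86/CUDA-12 wheels won't work here
```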
January 15, 2026 at 9:50 PM
Claude Code does diagrams too -- a nice comparison of core RL methods.
Coming to the rlhf book soon.
January 15, 2026 at 3:20 AM
I'm playing with adding diagrams to the RLHF book, so I asked GPT 5 Pro for a plan of how to do it, then handed it off to Claude Opus 4.5 with a Gemini API key as an expert for visual feedback.

These are the first diagrams it generated for me -- zero feedback yet, for the reward model chapter.
January 13, 2026 at 11:31 PM
Got our RL research codebase at @ai2.bsky.social running on my Nvidia DGX Spark. Fun times ahead and lots of learnings to share :)
January 13, 2026 at 1:47 AM
GPT 5.2 Pro is Claude Opus 4.5's best search tool and Opus is GPT's best prompter
January 13, 2026 at 12:55 AM
Excited to announce the Relative Adoption Metric, a new way of studying model downloads that contextualizes them across time and model sizes.

While building The ATOM Project and other tools to measure the open ecosystem at Interconnects, we have often been frustrated with using downloads as a primary metric.
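The actual definition is in the announcement; purely as an illustration of what "contextualizing across time" means, one naive normalization is downloads per day since release (this is not the Relative Adoption Metric, and the numbers below are made up).

```python
from datetime import date

def downloads_per_day(downloads: int, release: date, today: date) -> float:
    # Hypothetical illustration, NOT the Relative Adoption Metric:
    # normalize raw downloads by days since release so older models
    # don't dominate simply by having been around longer.
    days = max((today - release).days, 1)
    return downloads / days

# Made-up numbers for two models released at different times.
print(downloads_per_day(2_000_000, date(2025, 3, 1), date(2026, 1, 12)))
print(downloads_per_day(600_000, date(2025, 11, 15), date(2026, 1, 12)))
```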
January 12, 2026 at 4:22 PM
OpenAI's GPT OSS is still insanely underrated as a highly adopted open LLM. Downloads are out of control.
January 12, 2026 at 1:40 AM
The AI-for-math revolution is obviously real via verifiable programming languages like Lean; it's just much harder to reason about the impacts vis-à-vis something like software engineering.
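For readers unfamiliar with what "verifiable" buys you, here is a tiny illustrative Lean 4 example (not from any specific project): if the file compiles, the proof is machine-checked.

```lean
-- If Lean accepts this, the statement is proven; an LLM-generated proof
-- that compiles cannot be subtly wrong the way generated prose can.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```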
January 12, 2026 at 12:13 AM
The combo of improvements in reasoning efficiency (fewer tokens per answer, still a very new research area) and faster chips is going to make coding agents so so much faster in 6-12 months.

The products in 2+ years will feel approx instantaneous relative to today.
January 11, 2026 at 6:04 PM