Kyle Lo
@kylelo.bsky.social
language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him,🧋 kyleclo.com
lucky to chat w sen. patty murray about olmo & importance of fully open AI
January 18, 2026 at 3:09 AM
just in case it wasn’t clear which room this is
January 7, 2026 at 5:58 PM
paper has:
🐟 more on our eval ideology
🦈 more baselines
🍣 more about RL Zero
etc

we picked the final model (internally called moonlit surfer 🌛🏄) not just on bench scores but also on good vibes 🥰
December 12, 2025 at 6:03 PM
I'll be at #NeurIPS2025 from Tues-Sat!

Come say hi 👋 if you wanna chat about
🦈 olmo 3 stories
🐟 pretraining data & evals
🍣 midtraining shouldn't exist
🐠 model specialization
🐡 AI for education
🍥 tabletop games
December 1, 2025 at 9:51 PM
yess!! sry bout the x-axis, still thinkin how to make figure clearer

it's exactly what you're saying -- each point refers to a stage of development. our release has data+ckpts+evals for all stages we use (figure), and we wanted to show how it compares to other models, which typically release only a few stages
November 21, 2025 at 10:00 PM
🍕Finally, we all know midtraining is an exciting time to get a ton of performance gains

But team organization to sustain consistent model improvements (without burnout) is important!

We have explorers "own" target capabilities & a centralized assessment team run "integration tests"
November 20, 2025 at 6:20 PM
🍨Data quality signals matter but also how you use them!

The traditional way to use data quality signals is to threshold: define a cutoff and take all documents above it.

But why not sample *proportional* to data quality?

We use Quality-Aware Upsampling to do exactly this
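
A minimal sketch of the contrast, assuming each document carries a scalar quality score in [0, 1]; the function names and the temperature knob are illustrative, not the actual OLMo 3 pipeline:

```python
import random

def threshold_sample(docs, scores, cutoff):
    """Traditional approach: keep every doc above a quality cutoff."""
    return [d for d, s in zip(docs, scores) if s >= cutoff]

def quality_aware_upsample(docs, scores, token_budget, temperature=1.0):
    """Sketch of quality-aware upsampling: draw documents with
    probability proportional to (temperature-scaled) quality, so
    higher-quality docs are repeated more often instead of merely
    surviving a cutoff. `docs` is a list of (text, num_tokens) with
    num_tokens > 0. Hypothetical sketch, not the OLMo 3 codebase."""
    weights = [s ** (1.0 / temperature) for s in scores]
    sample, used = [], 0
    while used < token_budget:
        text, n_tokens = random.choices(docs, weights=weights, k=1)[0]
        sample.append(text)
        used += n_tokens
    return sample
```

With temperature above 1 the sampling flattens toward uniform; below 1 it sharpens toward the hard threshold, so the cutoff is one extreme of the same dial.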
November 20, 2025 at 6:20 PM
🍣Data mixing is a little too powerful

It's easy to learn "optimal" mixes that heavily oversample certain pockets. eg, STEM docs are valuable for climbing MMLU, but you don't have infinite STEM docs

We approach mixing as Token Constrained Optimization over diverse evals
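
A toy version of the idea as a linear program: maximize predicted eval gain per token subject to a total token budget and per-source availability caps, so the optimizer can't pretend there are infinite STEM docs. All numbers and the per-token-benefit model below are made up for illustration; the real objective over diverse evals is richer:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical inputs: predicted per-token benefit of each source
# (e.g., fit from small-scale ablations), tokens actually available
# per source, and the total pretraining token budget.
sources   = ["web", "stem", "code", "books"]
benefit   = np.array([0.2, 0.9, 0.6, 0.4])      # predicted gain / token
available = np.array([5e12, 2e11, 8e11, 3e11])  # tokens on disk
budget    = 2e12                                # tokens to train on

# Maximize sum(benefit * tokens) s.t. tokens_i <= available_i and
# sum(tokens) == budget. linprog minimizes, so negate the objective.
res = linprog(
    c=-benefit,
    A_eq=np.ones((1, len(sources))), b_eq=[budget],
    bounds=list(zip(np.zeros(len(sources)), available)),
)
mix = dict(zip(sources, res.x))
# STEM/code/books saturate at their caps; the remainder falls to web.
```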
November 20, 2025 at 6:20 PM
🦈Invest in your experimental design!

We create evals suited to different compute scales: an "easy" set of tasks+metrics supports very small-scale experiments, before switching to our "main" set of evals, on which smaller models are below the noise floor
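
To make "below the noise floor" concrete: for an n-item multiple-choice eval, chance accuracy plus a couple standard errors of binomial noise is a rough floor a score must clear before it means anything. A back-of-envelope check (my illustration, not the paper's exact criterion):

```python
from math import sqrt

def above_noise_floor(acc, n_items, chance=0.25, z=1.96):
    """Is a model's accuracy distinguishable from random guessing?
    Chance accuracy plus ~z standard errors of binomial noise gives a
    rough noise floor; small models on hard 'main' evals sit below it,
    which is why cheaper 'easy' evals back the small-scale runs."""
    floor = chance + z * sqrt(chance * (1 - chance) / n_items)
    return acc > floor

# e.g., a 4-way multiple-choice eval with 500 questions:
print(above_noise_floor(0.27, 500))  # False: within noise of chance
```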
November 20, 2025 at 6:20 PM
we released Olmo 3! lot of exciting stuff but wanna focus on:

🐟Olmo 3 32B Base, the best fully-open base model to-date, near Qwen 2.5 & Gemma 3 on diverse evals
🐠Olmo 3 32B Think, first fully-open reasoning model approaching Qwen 3 levels
🐡12 training datasets corresponding to the different training stages
November 20, 2025 at 6:20 PM
why intern at Ai2?

🐟interns own major parts of our model development, sometimes even leading whole projects
🐡we're committed to open science & actively help our interns publish their work

reach out if u wanna build open language models together 🤝

links 👇
November 5, 2025 at 11:11 PM
woah, guess VLMs for OCR are the hottest research topic this week😆 since the first olmOCR, we've been..

🔥training our VLM using RLVR with binary unit test rewards🔥

it's incredibly effective & unit test creation is easy to scale w synthetic data pipelines

check it out at olmocr.allen.ai
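
Roughly what a binary unit-test reward looks like: each test is a predicate over the model's transcription, and a rollout earns reward 1 only if every test passes. Hypothetical interface, not the actual olmOCR code:

```python
def binary_unit_test_reward(ocr_output: str, unit_tests) -> float:
    """Sketch of a binary verifiable reward for OCR RL: each unit test
    checks one property of the transcription (a string that must
    appear, a string that must not, etc.). All-or-nothing: 1.0 only if
    every test passes. Illustrative, not the olmOCR pipeline."""
    return 1.0 if all(test(ocr_output) for test in unit_tests) else 0.0

# Example tests, e.g. produced by a synthetic-data pipeline that knows
# the ground-truth page content:
tests = [
    lambda out: "Table 3" in out,          # required content present
    lambda out: "lorem ipsum" not in out,  # hallucinated filler absent
]
reward = binary_unit_test_reward("... Table 3: results ...", tests)
```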
October 22, 2025 at 6:02 PM
bye #colm2025 big fan of the montreal bagels 🥯 hot take I like them better than
October 11, 2025 at 6:16 PM
lol so much love for prepost-postpre training
October 9, 2025 at 5:13 PM
any other fans of pre-pretraining?
October 9, 2025 at 2:53 PM
come say hi this morning at the OLMo 2 and fluid benchmarking posters 👋 and don't miss @valentinhofmann.bsky.social's morning talk #colm2025 @ai2.bsky.social vry proud of my gifs
October 9, 2025 at 1:14 PM
@josephc.bsky.social @mariaa.bsky.social and I are at poster #21

findings from large scale survey of 800 researchers on how they use LMs in their research #colm2025
October 8, 2025 at 8:12 PM
flyin to #colm2025 along w bunch of the @ai2.bsky.social team

come chat w me about pretraining horror stories, data & evals, what we're cookin for next olmo, etc

made a 🔥 poster for thursday sess, come say hi
October 6, 2025 at 3:20 PM
5 am airport for the only direct flight from seattle to montreal #colm2025
October 6, 2025 at 11:56 AM
LM benchmark design requires 3 decisions, how to:
🐟 select test cases
🐠 score LM on each test
🦈 aggregate scores to estimate perf

fluid benchmarking is simple:
🍣 find max informative test cases
🍥 estimate 'ability', not simple avg perf

why care? turn ur grey noisy benchmarks to red ones!
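
For the curious: fluid benchmarking is classic adaptive testing. Fit a 2-parameter IRT model to past models' right/wrong responses, then repeatedly give the evaluated model the unseen item with max Fisher information at its current ability estimate. A toy sketch, where the pre-fit item parameters `a`, `b` and the grid-search MLE are my simplifications, not the paper's implementation:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response model: prob. a model with ability theta gets
    an item with discrimination a and difficulty b right."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    p = p_correct(theta, a, b)
    return (a ** 2) * p * (1 - p)

def fluid_eval(answer_item, a, b, n_steps=50):
    """Toy fluid benchmarking loop: administer the most informative
    unseen item at the current ability estimate, then re-estimate
    ability by grid-search MLE. `a`, `b` are per-item IRT params
    (np arrays) fit beforehand on other models' responses;
    `answer_item(i)` returns 1/0 for the evaluated model."""
    theta, seen, resp = 0.0, [], []
    grid = np.linspace(-4, 4, 801)
    for _ in range(n_steps):
        info = fisher_info(theta, a, b)
        info[seen] = -np.inf                  # don't reuse items
        i = int(np.argmax(info))
        seen.append(i)
        resp.append(answer_item(i))
        # grid MLE of ability given responses so far
        ll = sum(r * np.log(p_correct(grid, a[j], b[j]))
                 + (1 - r) * np.log(1 - p_correct(grid, a[j], b[j]))
                 for j, r in zip(seen, resp))
        theta = float(grid[np.argmax(ll)])
    return theta
```

The ability estimate `theta` replaces the simple average, and picking max-information items is what shrinks the run-to-run noise.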
September 17, 2025 at 6:17 PM
looks like the preprint has been updated to include a disclaimer that this was a class project & intentionally provocatively written 😐
August 20, 2025 at 5:30 PM
⚠️ AI-generated content may be inaccurate. Verify important information independently.
August 8, 2025 at 8:33 PM
only took few days to descend into madness
July 1, 2025 at 8:12 PM
back from copenhagen & berkeley travels, now moving into new @ai2.bsky.social office!
June 26, 2025 at 3:45 PM
thx for organizing! great to meet NLP folks & consume fancy bread 🥖🍞🥐
June 21, 2025 at 2:32 PM