🐟 more on our eval ideology
🦈 more baselines
🍣 more about RL Zero
etc
we picked the final model (internally called moonlit surfer 🌛🏄) not just on bench scores but on good vibes 🥰
Come say hi 👋 if you wanna chat about
🦈 olmo 3 stories
🐟 pretraining data & evals
🍣 midtraining shouldn't exist
🐠 model specialization
🐡 AI for education
🍥 tabletop games
it's exactly what you're saying -- each point refers to a stage of development. our release has data+ckpts+evals for all stages we use (figure), and we wanted to show how it compares to other models, which typically release only a few stages
But team organization to sustain consistent model improvements (without burnout) is important!
We have explorers "own" target capabilities & a centralized assessment team run "integration tests"
The traditional way of using data quality scores is to threshold: define a cutoff and take all the documents above it.
But why not sample *proportional* to data quality?
We use Quality-Aware Upsampling to do exactly this
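not our actual pipeline, but the core idea fits in a few lines (function names & scores here are hypothetical):

```python
import random

def threshold_sample(docs, scores, cutoff):
    # traditional approach: hard cutoff on a per-doc quality score
    return [d for d, s in zip(docs, scores) if s >= cutoff]

def quality_proportional_sample(docs, scores, k):
    # quality-aware idea: draw k docs with probability proportional to
    # quality; sampling with replacement means the highest-quality docs
    # can appear multiple times (i.e. get upsampled) instead of
    # everything above a cutoff being weighted equally
    return random.choices(docs, weights=scores, k=k)
```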
It's easy to learn "optimal" mixes that oversample heavily from certain pockets. eg, STEM docs are valuable for climbing MMLU, but you don't have infinite STEM docs
We approach mixing as Token Constrained Optimization over diverse evals
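one toy way to picture the token constraint (a greedy sketch with made-up numbers, not the actual optimizer):

```python
def mix_tokens(budget, pools, utility):
    """Greedy sketch of token-constrained mixing: allocate a total token
    budget across domains, never exceeding each domain's available
    tokens -- the constraint that stops a learned mix from infinitely
    oversampling e.g. STEM docs.

    pools:   {domain: tokens available}
    utility: {domain: assumed marginal value per token on the evals}
    """
    alloc = {d: 0 for d in pools}
    remaining = budget
    # fill highest-utility domains first, capped by availability
    for d in sorted(pools, key=lambda d: -utility[d]):
        take = min(pools[d], remaining)
        alloc[d] = take
        remaining -= take
        if remaining == 0:
            break
    return alloc
```

eg with a 100-token budget, a high-utility STEM pool caps out at its 40 available tokens and the rest spills into web data.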
We create evals better suited to different compute scales: our "easy" set of tasks+metrics supports very small-scale experiments before switching to our "main" set of evals, on which smaller models are below the noise floor
🐟Olmo 3 32B Base, the best fully-open base model to date, near Qwen 2.5 & Gemma 3 on diverse evals
🐠Olmo 3 32B Think, the first fully-open reasoning model approaching Qwen 3 levels
🐡12 training datasets corresponding to different training stages
🐟interns own major parts of our model development, sometimes even leading whole projects
🐡we're committed to open science & actively help our interns publish their work
reach out if u wanna build open language models together 🤝
links 👇
🔥training our VLM using RLVR with binary unit test rewards🔥
it's incredibly effective & unit test creation is easy to scale w synthetic data pipelines
check it out at olmocr.allen.ai
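the reward itself can be as simple as this (hypothetical sketch, not the actual training code; the real tests check properties of the OCR output):

```python
def unit_test_reward(model_output: str, tests) -> float:
    # binary verifiable reward for RLVR: 1.0 iff the output passes
    # every unit test, 0.0 otherwise. `tests` are predicate functions,
    # e.g. auto-generated by a synthetic data pipeline
    return 1.0 if all(t(model_output) for t in tests) else 0.0
```

a page-level test might assert that a known string or table structure appears in the transcription.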
findings from a large-scale survey of 800 researchers on how they use LMs in their research #colm2025
come chat w me about pretraining horror stories, data & evals, what we're cookin for next olmo, etc
made a 🔥 poster for thursday sess, come say hi
standard benchmarking is simple:
🐟 select test cases
🐠 score LM on each test
🦈 aggregate scores to estimate perf
fluid benchmarking changes two of those steps:
🍣 find max informative test cases
🍥 estimate 'ability', not simple avg perf
why care? turn ur grey noisy benchmarks to red ones!
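the 'ability' estimate comes from item response theory; a minimal 2PL sketch (item params, grid & names are illustrative, not the paper's implementation):

```python
import math

def p_correct(ability, difficulty, discrimination):
    # 2PL item response model: probability the LM answers an item right
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

def estimate_ability(responses, items):
    # maximum-likelihood ability over a coarse grid; item parameters
    # would come from fitting an IRT model to many LMs' responses
    grid = [g / 10 for g in range(-30, 31)]
    def loglik(a):
        ll = 0.0
        for correct, (diff, disc) in zip(responses, items):
            p = p_correct(a, diff, disc)
            ll += math.log(p if correct else 1.0 - p)
        return ll
    return max(grid, key=loglik)

def item_information(ability, difficulty, discrimination):
    # Fisher information: pick the most informative next test case
    p = p_correct(ability, difficulty, discrimination)
    return discrimination ** 2 * p * (1.0 - p)
```

items near the current ability estimate carry the most information, which is why adaptively selecting test cases beats scoring everything.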