Kyle Lo @ COLM 2025 🍁
@kylelo.bsky.social
6.5K followers 590 following 500 posts
language model research @ai2.bsky.social, Co-lead of Data for OLMo w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him, 🧋 kyleclo.com
Pinned
kylelo.bsky.social
we released olmo 32b today! ☺️

🐟 our largest & best fully open model to date
🐠 right up there w similar size weights-only models from big companies on popular benchmarks
🐡 but we used way less compute & all our data, ckpts, code, recipe are free & open

made a nice plot of our post-trained results!✌️
kylelo.bsky.social
flyin to #colm2025 along w a bunch of the @ai2.bsky.social team

come chat w me about pretraining horror stories, data & evals, what we're cookin for next olmo, etc

made a 🔥 poster for thursday sess, come say hi
kylelo.bsky.social
same flight lol I just got to the airport way too early
kylelo.bsky.social
5 am airport for the only direct flight from seattle to montreal #colm2025
kylelo.bsky.social
synthetic data mimics real data's rough shape, modality, types, schema, etc. but with fake values. models these days are quite proficient at operating over data of this type & generating reasonable code; the main contrib here is system design to replace the repetitive exploratory data workflow
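A minimal sketch of that idea, assuming a made-up clinical schema (the column names, types, and file name here are hypothetical, not the project's actual data):

```python
# Sketch only: fabricate a table with the same columns & rough dtypes as the
# private data, but entirely fake values, so code can be developed against it.
import random
import string

import pandas as pd

# Hypothetical schema standing in for the real (private) table: column -> dtype
SCHEMA = {"patient_id": "str", "age": "int", "diagnosis_code": "str", "biomarker_level": "float"}


def fake_value(dtype: str):
    if dtype == "int":
        return random.randint(18, 90)
    if dtype == "float":
        return round(random.uniform(0.0, 10.0), 2)
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=6))


def make_synthetic(schema: dict, n_rows: int = 100) -> pd.DataFrame:
    # Same shape/schema as the real data, fake values throughout.
    return pd.DataFrame({col: [fake_value(dt) for _ in range(n_rows)] for col, dt in schema.items()})


make_synthetic(SCHEMA).to_csv("synthetic_clinical.csv", index=False)  # safe to hand to the code-generating LM
```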
kylelo.bsky.social
hehe i didnt do anythin!

core is data voyager (arxiv.org/abs/2402.13610) but w a local LM instead of GPT

it generates code (map-reduce-filter) that transforms data (csvs), a federated platform executes it & returns some output back to the system. the system repeatedly interprets + generates more code
Data-driven Discovery with Large Generative Models
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit th...
arxiv.org
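Roughly, the interaction loop described above looks like this (a sketch under my own naming, not the actual Data Voyager implementation; `generate_code` and `execute_remotely` are hypothetical stand-ins for the local LM and the federated executor):

```python
# Sketch of the generate -> execute remotely -> interpret loop.
from typing import Callable, List, Tuple


def discovery_loop(
    generate_code: Callable[[str], str],      # local LM: prompt -> map/reduce/filter code over the csvs
    execute_remotely: Callable[[str], str],   # federated platform: code -> textual output (raw data never leaves)
    task: str,
    max_turns: int = 5,
) -> List[Tuple[str, str]]:
    history: List[Tuple[str, str]] = []
    context = task
    for _ in range(max_turns):
        code = generate_code(context)        # LM proposes the next transform
        output = execute_remotely(code)      # only the output comes back to the system
        history.append((code, output))
        context = f"{task}\n\nprevious code:\n{code}\n\noutput:\n{output}\n\nnext step:"
    return history
```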
kylelo.bsky.social
not my project but I rlly like it

working w a cancer research center to analyze clinical data, but the private data cant leave the center.

so the team built a tool that generates code for remote execution by the cancer center: developed on synthetic data, and now tested for realsies 🤩
kylelo.bsky.social
had to explain to a first-time submitter why an AC recommendation to accept ended up as a reject 😮‍💨 been publishing long enough that i get why such things happen but it can be rough
kylelo.bsky.social
oh dang, missed this paper, this is rlly nice thx!
kylelo.bsky.social
high dim + discrete space (tokens). back in the soft prompts days, gradients made high dim easier to handle cuz continuous space. high dim search w/out gradients is tough
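A toy illustration of the contrast (just a sketch in PyTorch; the loss is a stand-in, not any particular prompt-optimization objective):

```python
import torch

vocab_size, prompt_len, dim = 50_000, 10, 768

# continuous case: a soft prompt is just a trainable tensor, so gradients flow into it
soft_prompt = torch.nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def toy_loss(prompt_embeds: torch.Tensor) -> torch.Tensor:
    # stand-in for "LM loss given this prompt"; any differentiable function works here
    return (prompt_embeds ** 2).mean()

loss = toy_loss(soft_prompt)
loss.backward()      # backprop straight into the prompt
optimizer.step()

# discrete case: token ids have no gradient, so you're stuck searching a
# vocab_size ** prompt_len space (random/greedy/evolutionary search, etc.)
hard_prompt = torch.randint(0, vocab_size, (prompt_len,))
```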
kylelo.bsky.social
LM benchmark design requires 3 decisions, how to:
🐟 select test cases
🐠 score LM on each test
🦈 aggregate scores to estimate perf

fluid benchmarking is simple:
🍣 find max informative test cases
🍥 estimate 'ability', not simple avg perf

why care? turn ur grey noisy benchmarks to red ones!
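For intuition, here's a tiny item-response-theory-style sketch of the "estimate ability + pick informative test cases" idea (my own simplification, not the paper's code; the 2PL item parameters a, b are assumed to already be fit):

```python
import numpy as np

def p_correct(theta: float, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # 2PL model: P(model answers item i correctly) = sigmoid(a_i * (theta - b_i))
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_ability(responses: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    # crude grid-search MLE for the model's ability theta (fine for a sketch)
    grid = np.linspace(-4, 4, 801)
    loglik = [
        np.sum(responses * np.log(p_correct(t, a, b) + 1e-9)
               + (1 - responses) * np.log(1 - p_correct(t, a, b) + 1e-9))
        for t in grid
    ]
    return float(grid[int(np.argmax(loglik))])

def next_item(theta: float, a: np.ndarray, b: np.ndarray, asked: set) -> int:
    # pick the unasked test case with max Fisher information at the current theta
    p = p_correct(theta, a, b)
    info = a ** 2 * p * (1 - p)
    info[list(asked)] = -np.inf
    return int(np.argmax(info))
```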
kylelo.bsky.social
can also view this as just candidate selection & pushing all “late interaction” or anything too complex for cosine sim to neural reranker(s)
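i.e. the generic two-stage pattern, sketched below (nothing system-specific; `rerank_score` is a hypothetical stand-in for whatever expensive scorer you like):

```python
import numpy as np

def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, k_candidates=100, k_final=10):
    # stage 1: cheap cosine-sim candidate selection over all docs
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    candidates = np.argsort(-sims)[:k_candidates]
    # stage 2: the expensive scorer (late interaction, cross-encoder, ...) only sees the shortlist
    scores = rerank_score(candidates)
    return candidates[np.argsort(-scores)[:k_final]]
```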
kylelo.bsky.social
working on a similar project now actually 😮 did u happen to see if ppl do well on this test on human reasoning steps?
kylelo.bsky.social
have been a believer in decomposing queries into many atomic units, each triggering its own retrieval, and assembling results after. feels like this has always been the thing that works, even if less elegant than an “end to end learned” approach

arxiv.org/abs/2305.15053
Decomposing Complex Queries for Tip-of-the-tongue Retrieval
When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs -- complex queries that describe content ele...
arxiv.org
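A minimal sketch of that decompose-retrieve-assemble pattern (generic, not the paper's exact system; `decompose` and `retrieve` are hypothetical stand-ins, and the assembly here is reciprocal-rank-fusion style):

```python
from collections import defaultdict

def decomposed_retrieval(query, decompose, retrieve, top_k=10):
    # decompose: query -> list of atomic sub-queries (e.g. an LM call)
    # retrieve: sub-query -> ranked list of doc ids
    fused = defaultdict(float)
    for sub_query in decompose(query):
        for rank, doc_id in enumerate(retrieve(sub_query)):
            fused[doc_id] += 1.0 / (60 + rank)   # assemble per-clause rankings via RRF-style scoring
    return [doc for doc, _ in sorted(fused.items(), key=lambda kv: -kv[1])][:top_k]
```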
kylelo.bsky.social
“people didn’t want to buy it because they thought that a third of a pound was less than a quarter pound because three is less than four”

lol we were clowning on models for 9.11 > 9.9 but prolly should've checked the human baseline
Reposted by Kyle Lo @ COLM 2025 🍁
natolambert.bsky.social
COLM is coming up! Very excited. I'm starting to figure out two things:
1. A small invite-only dinner for Interconnects AI (Ai2 event news later).
2. Various research chats and catchups.
Fill out the form below or email me if you're interested :) 🍁🇨🇦
Interest form: buff.ly/9nWBxZ9
Reposted by Kyle Lo @ COLM 2025 🍁
ai2.bsky.social
Ai2 @ai2.bsky.social · Aug 28
🎙️ Say hello to OLMoASR—our fully open, from-scratch speech-to-text (STT) model. Trained on a curated audio-text set, it boosts zero-shot ASR and now powers STT in the Ai2 Playground. 👇
kylelo.bsky.social
looks like the preprint has been updated to include a disclaimer that this was a class project & intentionally written to be provocative 😐
kylelo.bsky.social
we r quite strict on ourselves lolol