Lightnews — Scholar-powered news

Reposted by Ai2

Kyle Lo

@kylelo.bsky.social

our paper on data mixing for LMs is out!

while building Olmo 3, we saw gaps between data mixing literature and real practice

🐠choosing proxy size, # runs, sampling, regression, constraints..
🐟data shifts during LM dev: can we reuse past experiments?

Olmix tackles them all!

Ai2 @ai2.bsky.social · 9h

Data mixing – determining how much web text, code, math, etc., you need for LM development – is a first-order lever on model quality. Introducing Olmix: a framework for configuring mixing methods at the start of dev & efficiently updating as data changes throughout. 🧵

February 13, 2026 at 5:30 PM

Ai2

@ai2.bsky.social

Data mixing – determining how much web text, code, math, etc., you need for LM development – is a first-order lever on model quality. Introducing Olmix: a framework for configuring mixing methods at the start of dev & efficiently updating as data changes throughout. 🧵

February 13, 2026 at 4:34 PM

Ai2

@ai2.bsky.social

Knowing which questions to ask is often the hardest part of science. Today we're releasing AutoDiscovery in AstaLabs, an AI system that starts with your data and generates its own hypotheses. 🧪

February 12, 2026 at 4:06 PM

Ai2

@ai2.bsky.social

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖

230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

February 11, 2026 at 7:47 PM

Reposted by Ai2

Kyle Lo

@kylelo.bsky.social

incredibly fun project led by our intern yapei chang

we mined the web for thousands of real-world “how to do X” step by step instructions and turned it into a dataset, synth data training procedure, eval suite, etc.

Ai2 @ai2.bsky.social · 3d

LLMs often generate step-by-step instructions, from real-world tasks (how do I file taxes?) to plans for AI agents. Improving this is hard: outputs can sound fluent for steps that don't work, and current datasets cover few domains.

How2Everything evals/trains for this at scale. 🧵

February 10, 2026 at 8:34 PM

Ai2

@ai2.bsky.social

LLMs often generate step-by-step instructions, from real-world tasks (how do I file taxes?) to plans for AI agents. Improving this is hard: outputs can sound fluent for steps that don't work, and current datasets cover few domains.

How2Everything evals/trains for this at scale. 🧵

February 10, 2026 at 4:53 PM

Ai2

@ai2.bsky.social

New: A web demo to make using DR Tulu even simpler, built by our collaborators at MIT & the University of Washington.
Ask a question and watch DR Tulu plan, search, & synthesize a citation-grounded report you can share. 🔎

February 9, 2026 at 4:29 PM

Reposted by Ai2

Jeffrey Brainard

@jeffreybrainard.bsky.social

Many want to use AI to accelerate science, and utilizing it to explore the growing tsunami of research articles is getting lots of attention. Measuring the quality of AI answers to questions about science is a challenge. @science.org www.science.org/content/arti...

Open-source AI program can answer science questions better than humans

Developed by and for academics, OpenScholar aims to improve searches of the ballooning scientific literature

www.science.org

February 4, 2026 at 6:52 PM

Ai2

@ai2.bsky.social

Since launching Open Coding Agents, it's been exciting to see how quickly the community has adopted them. Today we're releasing SERA-14B – a new 14B-parameter coding model – plus a major refresh of our open training datasets. 🧵

February 3, 2026 at 5:39 PM

Ai2

@ai2.bsky.social

Introducing Theorizer: Turning thousands of papers into scientific laws 📚➡️📜

Most automated discovery systems focus on experimentation. Theorizer tackles the other half of science: theory building—compressing scattered findings into structured, testable claims. 🧵

January 28, 2026 at 6:37 PM

Reposted by Ai2

kylelwiggers.bsky.social

@kylelwiggers.bsky.social

Here's just one of the cool apps you can vibe-code with SERA, our new agentic coding model! I was lucky enough to get my hands on it early and it's quite capable via Claude Code. Give it a go today!

January 27, 2026 at 8:29 PM

Ai2

@ai2.bsky.social

Introducing Ai2 Open Coding Agents—starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, & it works with Claude Code out of the box. 🧵

January 27, 2026 at 4:13 PM

Ai2

@ai2.bsky.social

Molmo 2 (8B) is now available via @hf.co Inference Providers, courtesy of Public AI.

State-of-the-art video understanding with pointing, counting, & multi-frame reasoning. Track objects through scenes and identify where + when events occur. 🧵

January 26, 2026 at 5:16 PM

Ai2

@ai2.bsky.social

Introducing HiRO-ACE: an AI framework that makes highly detailed climate simulations dramatically more accessible. It generates decades of high-resolution precipitation data for any region in a day on a single GPU—no supercomputing cluster required. 🧵

January 21, 2026 at 7:34 PM

Ai2

@ai2.bsky.social

“We wanted to provide subject matter experts and communities that have the expertise on the ground with the tools to engage with AI without having to learn AI deeply.” - Ted Schmitt. Thanks @mongabay.com for diving into our new OlmoEarth platform. 📷

January 16, 2026 at 8:55 PM

Ai2

@ai2.bsky.social

SciArena update: our Olmo 3.1 32B Instruct scores 963.6 Elo overall at just $0.17/100 calls—ahead of OpenAI’s GPT-OSS-20B. In Engineering, it hits 1039.2 Elo, only 2.5 behind GPT-OSS-120B—a model ~4× its size. 🧵

January 16, 2026 at 5:57 PM

Ai2

@ai2.bsky.social

Molmo 2 is now available via API on @openrouter.bsky.social, courtesy of Parasail—free until 1/29.
State-of-the-art video understanding with pointing, counting, and multi-frame reasoning—track objects through scenes & identify where + when events occur.
Open. Apache 2.0. 👇

January 13, 2026 at 5:59 PM

Ai2

@ai2.bsky.social

Olmo 3.1 32B Instruct is now on @openrouter.bsky.social, hosted by DeepInfra. Built for real-world use: reliable instruction following & function calling for agentic workflows + research. Fully open & leading benchmark performance, ready to plug into your stack. 👇

January 8, 2026 at 8:00 PM

Ai2

@ai2.bsky.social

🆕 New in Asta: multi-turn report generation.
You can now have back-and-forth conversations with Asta, our agentic platform for scientific research, to refine long-form, fully cited reports instead of relying on single-shot prompts.

December 18, 2025 at 4:09 PM

Ai2

@ai2.bsky.social

Now you can use our most powerful models via API.
Olmo 3.1 32B Think, our reasoning model for complex problems, is on @openrouter.bsky.social—free through 12/22. And Olmo 3.1 32B Instruct, our flagship chat model with tool use, is available through @hf.co Inference Providers. 👇

December 17, 2025 at 9:02 PM

Ai2

@ai2.bsky.social

🎥 Introducing SAGE, an agentic system for long video reasoning on entertainment videos—sports, vlogs, & more. It learns when to skim, zoom in, & answer questions directly. On our SAGE-Bench eval, SAGE with a Molmo 2 (8B)-based orchestrator lifts accuracy from 61.8% → 66.1%. 🧵

December 17, 2025 at 5:57 PM

Ai2

@ai2.bsky.social

🎗️ Reminder, our Molmo 2 and Olmo 3 Reddit AMA begins soon at 1pm PST / 4pm EST. www.reddit.com/r/LocalLLaMA...

December 16, 2025 at 8:41 PM

Ai2

@ai2.bsky.social

Last year Molmo set SOTA on image benchmarks + pioneered image pointing. Millions of downloads later, Molmo 2 brings Molmo’s grounded multimodal capabilities to video 🎥—and leads many open models on challenging industry video benchmarks. 🧵

December 16, 2025 at 4:52 PM

Ai2

@ai2.bsky.social

🗓️ Tue Dec 16, 1–2pm PT: AMA with researchers + engineers from our Olmo & Molmo teams, hosted by r/LocalLLaMA.
💬 Ask your questions now—we’ll start answering when the AMA begins!

December 15, 2025 at 10:25 PM

Ai2

@ai2.bsky.social

Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵

December 15, 2025 at 5:19 PM
