Costa Huang
@vwxyzjn.bsky.social
480 followers 130 following 49 posts
RL + LLM @ai2.bsky.social; main dev of https://cleanrl.dev/
vwxyzjn.bsky.social
Congrats on the launch!
eugenevinitsky.bsky.social
We're finally out of stealth: percepta.ai
We're a research / engineering team working together in industries like health and logistics to ship ML tools that drastically improve productivity. If you're interested in ML and RL work that matters, come join us 😀
Percepta | A General Catalyst Transformation Company
Transforming critical institutions using applied AI. Let's harness the frontier.
percepta.ai
vwxyzjn.bsky.social
That's all. Enjoy the new model 😆
vwxyzjn.bsky.social
One fun thing is that our model outperformed Qwen by ~26 points on IFEval. What's going on? We built some nice visualization tools and found that our model can follow instructions like "write without a comma" well.
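For flavor, here's a minimal sketch of how a verifiable "no commas" constraint can be checked and turned into a binary reward. The function names and structure are illustrative, not the actual open-instruct code:

```python
# Illustrative IFEval-style verifiable constraint check for the
# "write without a comma" instruction mentioned above.

def follows_no_comma_constraint(response: str) -> bool:
    """Return True if the response contains no commas."""
    return "," not in response

def constraint_reward(response: str) -> float:
    # Binary verifiable reward: 1.0 if the constraint holds, else 0.0.
    return 1.0 if follows_no_comma_constraint(response) else 0.0

print(constraint_reward("No commas here"))  # 1.0
print(constraint_reward("One, comma"))      # 0.0
```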
vwxyzjn.bsky.social
Our 1B model achieves impressive performance. See our official tweet for more details!

bsky.app/profile/ai2....
ai2.bsky.social
We're excited to round out the OLMo 2 family with its smallest member, OLMo 2 1B, surpassing peer models like Gemma 3 1B or Llama 3.2 1B. The 1B model should enable rapid iteration for researchers, more local development, and a more complete picture of how our recipe scales.
[Image: A bar graph comparing average performance (10 tasks) across OLMo 2 1B, SmolLM2 1.7B, Gemma 3 1B, Llama 3.2 1B, and Qwen 2.5 1.5B. The highest average, 42.7, is achieved by OLMo 2 1B.]
vwxyzjn.bsky.social
The model checkpoints are available at huggingface.co/collections/....

As always, we uploaded all the intermediate RL checkpoints.
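If you want to grab one of those intermediate checkpoints, the `revision` argument in transformers does the trick. The repo id and revision name below are placeholders, not real artifact names; check the collection for the actual ones:

```python
# Sketch: pulling an intermediate RL checkpoint from a Hugging Face
# revision (branch/tag). Repo id and revision are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allenai/some-olmo-rlvr-checkpoint"  # placeholder repo id
revision = "step_200"                          # placeholder revision name

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
```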
vwxyzjn.bsky.social
🥘 Excited to share our latest OLMo 1B models! Almost summer RL time. We did another two-stage RL run:
* The first RLVR run uses allenai/RLVR-GSM-MATH-IF-Mixed-Constraints
* The final RLVR run uses allenai/RLVR-MATH for targeted MATH improvement (loading sketch below)

Short 🧵
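A minimal sketch of pulling those two RLVR mixes with the `datasets` library; the `train` split name is an assumption, so inspect the repos first:

```python
# Load the two RLVR mixes used in the two-stage run above.
from datasets import load_dataset

stage1 = load_dataset("allenai/RLVR-GSM-MATH-IF-Mixed-Constraints", split="train")
stage2 = load_dataset("allenai/RLVR-MATH", split="train")

print(stage1)
print(stage2)
```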
vwxyzjn.bsky.social
We streamlined our release process to include the RLVR intermediate checkpoints as well. They are available in the revisions if you want to check it out.

See our updated collection here: huggingface.co/collections/...
vwxyzjn.bsky.social
Introducing OLMo-2-0325-32B-Instruct! It's spring RL curve time. This time we used GRPO for RLVR and trained a pretty nice, fully open-source model!
vwxyzjn.bsky.social
💾 I included the reproduction commands here:
github.com/allenai/open...
vwxyzjn.bsky.social
📦 Here is the trained model. The main recipe is basically the same, except we used a different RL algorithm, so we are just doing a minor release.

huggingface.co/allenai/Llam...
vwxyzjn.bsky.social
🗡️ The training length is a confounder, but I did run an ablation study on the same `allenai/RLVR-MATH` dataset, using almost identical hyperparameters for PPO and GRPO:

PPO's MATH score is more consistent with the Llama-3.1-Tulu-3-8B model, but GRPO achieved higher scores.
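A toy contrast of the two advantage estimates (not the open-instruct implementation): PPO subtracts a learned value baseline, while GRPO drops the critic and normalizes rewards within a group of completions sampled for the same prompt:

```python
import numpy as np

# Verifiable rewards for 4 completions of the same prompt.
rewards = np.array([1.0, 0.0, 1.0, 1.0])

# PPO-style: subtract a value-function baseline
# (here a made-up scalar standing in for the critic's estimate).
value_estimate = 0.6
ppo_advantages = rewards - value_estimate

# GRPO-style: normalize within the group, no critic needed.
grpo_advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

print(ppo_advantages)   # [ 0.4 -0.6  0.4  0.4]
print(grpo_advantages)
```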
vwxyzjn.bsky.social
📈 Below is the training curve. I think part of the performance gain also comes from running RL for longer.
vwxyzjn.bsky.social
🎆 @natolambert.bsky.social also updated this figure in our paper for better visualization :D
vwxyzjn.bsky.social
🎁 We applied the same RLVR dataset (allenai/RLVR-GSM-MATH-IF-Mixed-Constraints) using our new GRPO training script - the trained model checkpoints are better!
vwxyzjn.bsky.social
🔥 allenai/Llama-3.1-Tulu-3-8B (trained with PPO) -> allenai/Llama-3.1-Tulu-3.1-8B (trained with GRPO)

We are happy to "quietly" release our latest GRPO-trained Tulu 3.1 model, which is considerably better on MATH and GSM8K!
vwxyzjn.bsky.social
Thanks @soldni.bsky.social for the better OLMoE base model and for pulling everything through,
@ljvmiranda.bsky.social for on-policy preferences, and many others for coordinating and making the release happen 💪
vwxyzjn.bsky.social
For a cleaned-up version, please refer to Tulu 3 repro commands github.com/allenai/open...
vwxyzjn.bsky.social
For interested folks, I also included the script I used to launch all the SFT / DPO / RLVR experiments here: github.com/allenai/open.... It's not cleaned up, but I hope it shows some traces of the end-to-end workflow.
vwxyzjn.bsky.social
All of our research artifacts are fully open source and released. Check out our HF collection:

huggingface.co/collections/...
vwxyzjn.bsky.social
This is how our new allenai/OLMoE-1B-7B-0125-Instruct models compare with the existing allenai/OLMoE-1B-7B-0924-Instruct checkpoint :)

Huge gains on GSM8K, DROP, MATH, and AlpacaEval.
vwxyzjn.bsky.social
We found the RLVR + GSM8K recipe to work robustly, and the scores kept going up.
vwxyzjn.bsky.social
🤯 Check out our new iOS OLMoE app that runs the model on-device!

We also trained a new OLMoE-1B-7B-0125, this time using the Tulu 3 recipe. Very exciting that RLVR improved GSM8K by almost 10 points for OLMoE 🔥

A quick 🧵
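As a sketch of what a verifiable GSM8K-style reward can look like: extract the final number from the model's answer and compare it to the reference. The regex and exact-match rule are assumptions, not the exact open-instruct logic:

```python
import re

def extract_final_number(text: str) -> str | None:
    # Grab the last integer or decimal in the response, ignoring
    # thousands separators like "1,234".
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def gsm8k_reward(response: str, gold_answer: str) -> float:
    # Binary verifiable reward: 1.0 if the final number matches the gold.
    pred = extract_final_number(response)
    if pred is None:
        return 0.0
    return 1.0 if float(pred) == float(gold_answer) else 0.0

print(gsm8k_reward("... so the total is 42.", "42"))  # 1.0
print(gsm8k_reward("I think it's 7.", "42"))          # 0.0
```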
vwxyzjn.bsky.social
So maybe using it in the loss directly instead of in the rewards changes certain things? I am not sure.

Anyway, I just thought the snippet was interesting to share.
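If "it" here is the KL penalty (a common point of difference between PPO-style and GRPO-style setups), here's a toy illustration of the two placements with made-up tensors: folding a per-token KL estimate into the reward versus adding it as a separate term in the objective. The gradient paths differ between the two, which may be part of what changes:

```python
import torch

# Toy values; in real training these come from the model and carry gradients.
logp = torch.tensor([-1.2, -0.8, -1.0])      # policy log-probs (per token)
ref_logp = torch.tensor([-1.0, -1.0, -1.0])  # reference-model log-probs
advantages = torch.tensor([0.5, 0.5, 0.5])
beta = 0.05

kl = logp - ref_logp  # simple per-token KL estimate

# Option A (KL in the reward): shape the advantage, then treat it as a
# constant when weighting the policy gradient.
shaped_advantages = advantages - beta * kl
loss_kl_in_reward = -(logp * shaped_advantages.detach()).mean()

# Option B (KL in the loss): keep the reward clean and add the KL term
# directly to the objective, so it is differentiated through logp.
loss_kl_in_loss = -(logp * advantages).mean() + beta * kl.mean()

print(loss_kl_in_reward, loss_kl_in_loss)
```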