Lightnews — Scholar-powered news

Reposted by Michael Noukhovitch @CoLM 2025🥯

Dane Carnegie Malenfant @dvnxmvlhdf5.bsky.social · Jun 5

Preprint Alert 🚀

Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn’t always the case. For example, Plains Indigenous groups used to leave resources for others to use at effigies called Manitokan.
1/8

Manitokan are images set up where one can bring a gift or receive a gift. 1930s Rocky Boy Reservation, Montana, Montana State University photograph. Colourized with AI

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Apr 24

@dnllvy.bsky.social @oumarkaba.bsky.social presenting cool work at #ICLR2025 on generative models for crystals leveraging symmetry ❄️🪞, repping @mila-quebec.bsky.social

Reposted by Michael Noukhovitch @CoLM 2025🥯

Sara Vera Marjanovic @saravera.bsky.social · Apr 1

Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1’s reasoning chains across a variety of tasks; investigating its capabilities, limitations, and behaviour.
🔗: mcgill-nlp.github.io/thoughtology/

A circular diagram with a blue whale icon at the center. The diagram shows 8 interconnected research areas around LLM reasoning represented as colored rectangular boxes arranged in a circular pattern. The areas include: §3 Analysis of Reasoning Chains (central cloud), §4 Scaling of Thoughts (discussing thought length and performance metrics), §5 Long Context Evaluation (focusing on information recall), §6 Faithfulness to Context (examining question answering accuracy), §7 Safety Evaluation (assessing harmful content generation and jailbreak resistance), §8 Language & Culture (exploring moral reasoning and language effects), §9 Relation to Human Processing (comparing cognitive processes), §10 Visual Reasoning (covering ASCII generation capabilities), and §11 Following Token Budget (investigating direct prompting techniques). Arrows connect the sections in a clockwise flow, suggesting an iterative research methodology.

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Apr 7

Hope the Llama team releases more details. Until then check out my paper on async RLHF and feel free to message me to chat about it at ICLR!

bsky.app/profile/mnou...

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Mar 18

Our work on Asynchronous RLHF was accepted to #ICLR2025 ! (I was so excited to announce it, I forgot to say I was excited)

Used by @ai2.bsky.social for OLMo-2 32B 🔥
New results show ~70% speedups for LLM + RL math and reasoning 🧠

🧵below or hear my DLCT talk online on March 28!

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Apr 7

And to reviewer 2, I guess it does work in large-scale distributed training! I am really curious how they did the resource balancing to account for different computational speeds

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Apr 7

Llama 4 uses async RLHF and I would just like to announce that I called it t.co/w9qJxr944C

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Mar 18

Classic Benno, hanging out with his human friends John, Ṃ̵̢͍̬̘ͧ̉͆ͤ̈͆̂ä́t̢̢̡̫̻̰͈̣͚͆͛͗̈ͭ̉̕͟ͅt̛̹̰̑̓ͭ͗h̸̷̛̛̥̱͉͎̯̻̼͕͉̻̄̅̾ͣ̉̈͌̀ͮ͋ͯ͐ͮͥ̿͛ͪ͜͠͝ẹ̱̞̬̅͂ͯ̈́̆̎ͣw̵̨̧̧̥̩͔͎̬̭͚̩͉ͤ̌͢͝, and Cͧͯ_̸̨̱͙̦͍̉̒͐͐͂͋̎̂ͬ̑͜͝h͐_̮͒͢r̸̛̳̘̠̯ͣͧͦ̏͑ͯ͡i̷̡̡͔̪̟͙͖̫̩̭̳̤͕̞͙̯͚̫̯ͭͤ̌̽͋ͯ̉ͥ́ͭͧͥͦͬ̀ͨ͌̒͢͞s̺̹͛ͭ̐͗ͤͫ́̃ͤ͢͠

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Mar 18

Thanks again to my collaborators:
@vwxyzjn.bsky.social
@sophie-xhonneux.bsky.social
@arianh.bsky.social
Rishabh and Aaron who have not yet migrated 🦋

DMs open📲let's chat about everything LLM + RL @ ICLR and check out
Paper 📰 arxiv.org/abs/2410.18252
Code 🧑‍💻 github.com/mnoukhov/asy...

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Mar 18

We also have an appendix full of fun details like "How to make RLOO work off-policy" and "Why synchronous RLHF is not feasible in the long term" from an engineering perspective 👷🛠️
Would love critiques from any engineers working on RLHF if they feel I missed something!

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Mar 18

We showed great results on RLHF but reviewers wanted reasoning + math 🧠🤔 Thanks to my labmates Amirhossein and Milad, we got Rho-1B training on GSM8k!
Online DPO slightly outperforms PPO on GSM8k but more importantly 1-step Async runs 68% faster than Sync and matches performance🔥

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Mar 18

Recap⌛️RL training of LLMs is frequently online and *on-policy*, but training and generation alternate, each sitting idle while it waits for the other to finish.
We run training and generation at the same time, but now we're training on samples from a previous timestep aka *off-policy* RL!
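
To make the recap concrete, here is a toy timing model of the two schedules. It is only a sketch under stated assumptions, not the paper's implementation: the function names, the per-step costs, and the "policy version" counter are all hypothetical, and there is no real model or GPU work involved. It just counts how long each schedule takes and how stale each batch is when it gets trained on.

```python
def sync_run(steps, gen_time, train_time):
    """Synchronous schedule: generate, then train, one after the other."""
    clock = 0.0
    version = 0          # number of policy updates applied so far
    lags = []            # staleness of each batch when it is trained on
    for _ in range(steps):
        batch_version = version      # batch comes from the current policy
        clock += gen_time            # trainer idles during generation
        lags.append(version - batch_version)
        clock += train_time          # generator idles during training
        version += 1
    return clock, lags


def async_run(steps, gen_time, train_time):
    """1-step asynchronous schedule: generation of the next batch overlaps
    with training on the previous one (assumed toy pipeline)."""
    version = 0
    clock = gen_time                 # the first batch has nothing to overlap with
    pending_version = 0              # policy version that produced the waiting batch
    lags = []
    for step in range(steps):
        lags.append(version - pending_version)
        if step == steps - 1:
            clock += train_time      # last update, no further generation needed
        else:
            snapshot = version       # generator uses the policy before this update
            clock += max(gen_time, train_time)  # both run at the same time
            pending_version = snapshot
        version += 1
    return clock, lags


if __name__ == "__main__":
    # Made-up costs: generation and training each take 1 time unit.
    print(sync_run(4, 1.0, 1.0))
    print(async_run(4, 1.0, 1.0))
```

In this toy setup every batch after the first is trained on one update late, which is the off-policy part, and the wall-clock saving comes from overlapping the two phases. Real speedups depend on how well generation and training costs are balanced.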

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Mar 18

Our work on Asynchronous RLHF was accepted to #ICLR2025 ! (I was so excited to announce it, I forgot to say I was excited)

Used by @ai2.bsky.social for OLMo-2 32B 🔥
New results show ~70% speedups for LLM + RL math and reasoning 🧠

🧵below or hear my DLCT talk online on March 28!

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Feb 12

Reminds me of a very similar shift towards open science by machine learning in 1999 (jmlr.org/statement.html). Nowadays we've got really great infrastructure in the form of @openreview.bsky.social! Reach out if you're considering shifting to open science and check out jmlr.org/tmlr/ for inspo :)

Transactions on Machine Learning Research

jmlr.org

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Feb 11

Programming using an AI assistant in order to improve AI assistants is giving me strong sci-fi vibes. Specifically Isaac Asimov, who clearly invented vibe coding in 1956 users.ece.cmu.edu/~gamvrosi/th...

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Dec 11

I'm at #NeurIPS2024 this week if anyone wants to talk about RLHF while drinking an overpriced (but excellent) pourover coffee or tea!

Michael Noukhovitch @CoLM 2025🥯 @mnoukhov.bsky.social · Nov 23

It's actually necessary because bluesky is (now officially) federated and you're on a single instance called a PDS, and in this case bsky.social. Others exist (?) or will exist soon

A technical overview steveklabnik.com/writing/how-...
And a non-technical overview
www.theverge.com/24063290/fed...

How Does BlueSky Work?

steveklabnik.com
