Lightnews — Scholar-powered news

Natasha Jaques @natashajaques.bsky.social · 5d

Instead of behavior cloning, what if you asked an LLM to write code to describe how an agent was acting, and used this to predict their future behavior?

Our new paper "Modeling Others' Minds as Code" shows this outperforms BC by 2x, and reaches human-level performance in predicting human behavior.

Kunal Jha @kjha02.bsky.social · 6d

Forget modeling every belief and goal! What if we represented people as following simple scripts instead (i.e "cross the crosswalk")?

Our new paper shows AI which models others’ minds as Python code 💻 can quickly and accurately predict human behavior!

shorturl.at/siUYI%F0%9F%...

2 12

Natasha Jaques @natashajaques.bsky.social · Aug 1

My husband presenting his work on caregiving 😍

samuel mehr @mehr.nz · Jul 31

lol this may be the most cogsci cogsci slide I've ever seen, from @maxkw.bsky.social

"before I got married I had six theories about raising children, now I have six kids and no theories"......but here's another theory #cogsci2025

11

Natasha Jaques @natashajaques.bsky.social · Jul 8

By optimizing for intrinsic curiosity, the LLM learns how to ask a series of questions over the course of the conversation to improve the accuracy of its user model. This generates conversations which reveal significantly more information about the user.

4

Natasha Jaques @natashajaques.bsky.social · Jul 8

Excited to release our latest paper on a new multi-turn RL objective for training LLMs to *learn how to learn* to adapt to the user. This enables it to adapt and personalize to novel users, whereas the multi-turn RLHF baseline fails to generalize effectively to new users.

yanmingwan.bsky.social @yanmingwan.bsky.social · Jul 8

Personalization methods for LLMs often rely on extensive user history. We introduce Curiosity-driven User-modeling Reward as Intrinsic Objective (CURIO) to encourage actively learning about the user within multi-turn dialogs.
📜 arxiv.org/abs/2504.03206
🌎 sites.google.com/cs.washingto...

1 3 11

Natasha Jaques @natashajaques.bsky.social · Jul 1

This work shows the benefit of RL training for improving reasoning skills when there is no possibility for data leakage. AND how continuously evolving multi-agent competition leads to the development of emergent skills that generalize to novel tasks.

9

Natasha Jaques @natashajaques.bsky.social · Jul 1

We analyze the results and find that LLMs learn emergent reasoning patterns like case-by-case analysis and expected value calculation that transfer to improve performance on math questions.

1 7

Natasha Jaques @natashajaques.bsky.social · Jul 1

In our latest paper, we discovered a surprising result: training LLMs with self-play reinforcement learning on zero-sum games (like poker) significantly improves performance on math and reasoning benchmarks, zero-shot. Whaaat? How does this work?

Bo Liu (Benjamin Liu) @benjamin-eecs.bsky.social · Jul 1

We're excited about self-play unlocking continuously improving agents. RL selects CoT patterns from LLMs. Games=perfect testing grounds.
SPIRAL: models learn via self-competition. Kuhn Poker → +8.7% math, +18.1 Minerva Math! 🃏
Paper: huggingface.co/papers/2506....
Code: github.com/spiral-rl/spiral

2 7 59

Natasha Jaques @natashajaques.bsky.social · Jun 12

Just posted a talk I gave about this work! youtu.be/mxWJ9k2XKbk

1 11

Reposted by Natasha Jaques

Natasha Jaques @natashajaques.bsky.social · Jun 12

RLHF is the main technique for ensuring LLM safety, but it provides no guarantees that they won’t say something harmful.

Instead, we use online adversarial training to achieve theoretical safety guarantees and substantial empirical safety improvements over RLHF, without sacrificing capabilities.

Mickel Liu @mickelliu.bsky.social · Jun 12

🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat
🌟We propose 𝗼𝗻𝗹𝗶𝗻𝗲 𝐦𝐮𝐥𝐭𝐢-𝐚𝐠𝐞𝐧𝐭 𝗥𝗟 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵

1 3 16

Natasha Jaques @natashajaques.bsky.social · Jun 12

RLHF is the main technique for ensuring LLM safety, but it provides no guarantees that they won’t say something harmful.

Instead, we use online adversarial training to achieve theoretical safety guarantees and substantial empirical safety improvements over RLHF, without sacrificing capabilities.

Mickel Liu @mickelliu.bsky.social · Jun 12

🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat
🌟We propose 𝗼𝗻𝗹𝗶𝗻𝗲 𝐦𝐮𝐥𝐭𝐢-𝐚𝐠𝐞𝐧𝐭 𝗥𝗟 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵

1 3 16

Reposted by Natasha Jaques

Kunal Jha @kjha02.bsky.social · Jun 9

Oral @icmlconf.bsky.social !!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk!

Can't thank my collaborators enough: @cogscikid.bsky.social y.social @liangyanchenggg @simon-du.bsky.social @maxkw.bsky.social @natashajaques.bsky.social

Kunal Jha @kjha02.bsky.social · Apr 19

Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity.

Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.

shorturl.at/fqsNN%F0%9F%...

2 9

Reposted by Natasha Jaques

Joe Barnby @joebarnby.com · Jun 11

At @rldmdublin2025.bsky.social this week? Check out our social learning workshop from @amritalamba.bsky.social and I tomorrow! Inc. talks from @natashajaques.bsky.social, @nitalon.bsky.social, @carocharp.bsky.social, @kartikchandra.bsky.social & more!

Full schedule: sites.google.com/view/rldm202...

RLDM2025SocInfWorkshop

// RLDM 2025 Workshop \\ Reinforcement learning as a model of social behaviour and inference: progress and pitfalls 12.06.2025 // 9am-1pm

sites.google.com

2 8 17

Reposted by Natasha Jaques

Christian Guckelsberger @creativeendvs.bsky.social · Jun 3

1/4 Join us and the Autotelic Interaction Research (AIR) group @aalto.fi / Finland to work on Computational Social Intrinsic Motivation (SIM) as PhD (4y) or postdoc (2y). Job ad w project description and application instructions: bit.ly/4jyNLGv. We're looking forward to learning about you!

Doctoral Researcher and Postdoc positions to work on Computational Social Intrinsic Motivation (SIM) | Aalto University

The Autotelic Interaction Research (AIR) group at the Dept. of Computer Science, Aalto University, Finland is looking for 1 Doctoral Researcher (2+2 years) and 1 Postdoc (2 years) to work on Computational Social Intrinsic Motivation (SIM)

bit.ly

1 4 10

Natasha Jaques @natashajaques.bsky.social · Apr 19

Way to go KJ for producing such an insightful paper in the first few months of your PhD!

2

Natasha Jaques @natashajaques.bsky.social · Apr 19

Human-AI cooperation is important, but existing work trains on the same 5 Overcooked layouts, creating brittle strategies.

Instead, we find that training on billions of procedurally generated tasks trains agents to learn general cooperative norms that transfer to humans... like avoiding collision

Kunal Jha @kjha02.bsky.social · Apr 19

Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity.

Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.

shorturl.at/fqsNN%F0%9F%...

1 4 17

Reposted by Natasha Jaques

Kunal Jha @kjha02.bsky.social · Apr 19

Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity.

Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.

shorturl.at/fqsNN%F0%9F%...

1 7 25

Natasha Jaques @natashajaques.bsky.social · Apr 12

Got a weird combination of mail today.

8

Natasha Jaques @natashajaques.bsky.social · Mar 28

I had a ton of fun using this as a kid. I actually made my high school English class project a giant hypercard-based video game where I drew each frame in Paint and hid buttons behind the hand-drawn elements that let you navigate the world. That was so fun...😍

2

Natasha Jaques @natashajaques.bsky.social · Mar 27

Recorded a recent "talk" / rant about RL fine-tuning of LLMs for a guest lecture in Stanford CSE234: youtube.com/watch?v=NTSY.... Covers some of my lab's recent work on personalized RLHF, as well as some mild Schmidhubering about my own early contributions to this space

Reinforcement Learning (RL) for LLMs

YouTube video by Natasha Jaques

youtube.com

5 10 51

Reposted by Natasha Jaques

Sharon @sharonk.bsky.social · Mar 12

next Canadian government should think of boosting research funding up here and trying to grab as many American postdocs and researchers as possible

630 2.2K 17K

Natasha Jaques @natashajaques.bsky.social · Feb 16

Yes! Or you could focus on developing better MARL algorithms for the corporations that let them cooperate to solve the social dilemma more effectively. Similar to MARL benchmarks like Melting Pot but for a more impactful domain

1

Reposted by Natasha Jaques

xiaoxuanh.bsky.social @xiaoxuanh.bsky.social · Feb 13

AI has shown great potential in boosting efficiency. But can it help human society make better decisions as a whole? 🤔 In this project, using MARL, we explore this by studying the impact of an ESG disclosure mandate—a highly controversial policy. (1/6)

Carrie Yuan 袁嘉仪 @yuanjiayi.bsky.social · Feb 13

In our latest work, we introduce InvestESG, a lightweight, GPU-efficient MARL environment, designed to study incentives surrounding corporate climate mitigation and climate risks. Check out the project website: sites.google.com/view/investe...

InvestESG

TLDR: We introduce InvestESG, a lightweight, GPU-efficient MARL environment simulating company and investor responses to ESG disclosure mandates, with companies and investors modeled as two types of s...

sites.google.com

1 1 2

Natasha Jaques @natashajaques.bsky.social · Feb 13

In contrast, MARL enables testing new policies with many more agents over a long time horizon.

We hope this benchmark will enable researchers in the RL and MARL communities to develop sophisticated cooperation algorithms in the context of a societally impactful problem!

4

Natasha Jaques @natashajaques.bsky.social · Feb 13

I’m really excited about this, as I think MARL provides a new tool in the toolbox for investigating this problem. Existing work on ESG disclosures focuses on empirical studies (can’t test counterfactual policies), or analytical economics models (limited to 2 players or short time intervals)

1 1 4

Natasha Jaques @natashajaques.bsky.social · Feb 13

...providing corporations with more reliable information about climate risks — and we show that this significantly improves corporations’ ability to mitigate climate change, even without the influence of investors!

1 3