Marcel Hussing
@marcelhussing.bsky.social
2.9K followers 340 following 130 posts
PhD student at the University of Pennsylvania. Previously an intern at MSR, currently at Meta FAIR. Interested in reliable and replicable reinforcement learning, robotics, and knowledge discovery: https://marcelhussing.github.io/ All posts are my own.
Pinned
marcelhussing.bsky.social
I made a starter pack for learning theory people to gather some people around the topic. There are too many names on here that I don't know so I only added a few I do. If you believe you should be on this list, let me know. I will add people with accurate profile descriptions.

go.bsky.app/21nFz12
Reposted by Marcel Hussing
marcelhussing.bsky.social
Super stoked for the New York RL workshop tomorrow. Will be presenting 2 orals:
* Replicable Reinforcement Learning with Linear Function Approximation
* Relative Entropy Pathwise Policy Optimization

We already posted about the second one (below); I'll get to the first one here in a bit.
cvoelcker.bsky.social
🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (eg #TD3) trains by differentiating a critic, while on-policy #RL (eg #PPO) uses Monte-Carlo gradients. But is that necessary? Turns out: No! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
GIF showing two plots that illustrate the REPPO algorithm. On the left, four curves track the return of an optimization run; on the right, the optimization paths over the objective function are visualized. The GIF shows that Monte Carlo gradient estimators have high variance and fail to converge, while surrogate-function estimators converge smoothly but may find suboptimal solutions if the surrogate function is imprecise.
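To make the distinction in the quoted post concrete, here is a minimal toy sketch (my own example, not the REPPO implementation) contrasting the two gradient estimators for a one-dimensional Gaussian policy and a stand-in differentiable critic; all names and constants below are illustrative assumptions.

```python
# Toy sketch, not REPPO: score-function (Monte Carlo) vs. pathwise
# (critic-differentiating) policy-gradient estimators for a 1-D Gaussian policy.
import torch

mu = torch.tensor(0.5, requires_grad=True)   # learnable policy mean
std = torch.tensor(0.3)                      # fixed std for simplicity

def critic(a):                               # stand-in differentiable critic Q(a)
    return -(a - 1.0) ** 2

eps = torch.randn(4096)

# 1) Score-function / Monte Carlo estimator (PPO-style): no gradient through Q.
a = (mu + std * eps).detach()
logp = torch.distributions.Normal(mu, std).log_prob(a)
g_mc = torch.autograd.grad((logp * critic(a)).mean(), mu)[0]

# 2) Pathwise estimator (TD3/SAC-style): reparameterize the action and
#    differentiate through the critic itself.
g_pw = torch.autograd.grad(critic(mu + std * eps).mean(), mu)[0]

# Both estimate d/d_mu E[Q(a)]; the pathwise one typically has much lower variance.
print(g_mc.item(), g_pw.item())
```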
marcelhussing.bsky.social
In arxiv.org/abs/2207.04136 we always wondered how to discover the factored structure when it isn't given. It's an intriguing question for which I have a few ideas but, so far, too little time.
marcelhussing.bsky.social
(Maybe) unpopular opinion: there should not be *any* new experiments in a rebuttal. A rebuttal is for clarifications and for correcting inaccurate statements in a review. You should not be allowed to add new content at that point. Either your paper is done or it isn't; it should not be written during rebuttals.
eugenevinitsky.bsky.social
An inherent problem with asking for experiments in a rebuttal is that running a baseline in a week is highly likely to be sloppy work
marcelhussing.bsky.social
New ChatGPT data just dropped
marcelhussing.bsky.social
My PhD journey started with me fine-tuning PPO hyperparameters, which ultimately led to my research on stability. With REPPO, we've made a huge step in the right direction: stable learning, no tuning on a new benchmark, amazing performance. REPPO has the potential to be the PPO killer we've all been waiting for.
cvoelcker.bsky.social
🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (eg #TD3) trains by differentiating a critic, while on-policy #RL (eg #PPO) uses Monte-Carlo gradients. But is that necessary? Turns out: No! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
GIF showing two plots that illustrate the REPPO algorithm. On the left, four curves track the return of an optimization run; on the right, the optimization paths over the objective function are visualized. The GIF shows that Monte Carlo gradient estimators have high variance and fail to converge, while surrogate-function estimators converge smoothly but may find suboptimal solutions if the surrogate function is imprecise.
Reposted by Marcel Hussing
cvoelcker.bsky.social
🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (eg #TD3) trains by differentiating a critic, while on-policy #RL (eg #PPO) uses Monte-Carlo gradients. But is that necessary? Turns out: No! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
GIF showing two plots that illustrate the REPPO algorithm. On the left, four curves track the return of an optimization run; on the right, the optimization paths over the objective function are visualized. The GIF shows that Monte Carlo gradient estimators have high variance and fail to converge, while surrogate-function estimators converge smoothly but may find suboptimal solutions if the surrogate function is imprecise.
Reposted by Marcel Hussing
cvoelcker.bsky.social
Works that use #VAML/ #MuZero losses often use deterministic models. But if we want to use stochastic models to measure uncertainty or because we want to leverage current SOTA models such as #transformers and #diffusion, we need to take care! Naively translating the loss functions leads to mistakes!
pedanana.bsky.social
Would you be surprised to learn that many empirical implementations of value-aware model learning (VAML) algos, including MuZero, lead to incorrect model & value functions when training stochastic models 🤕? In our new @icmlconf.bsky.social 2025 paper, we show why this happens and how to fix it 🦾!
Reposted by Marcel Hussing
djfoster.bsky.social
Dhruv Rohatgi will be giving a lecture on our recent work on comp-stat tradeoffs in next-token prediction at the RL Theory virtual seminar series (rl-theory.bsky.social) tomorrow at 2pm EST! Should be a fun talk---come check it out!!
marcelhussing.bsky.social
Just arrived in Montreal for my internship at FAIR. So far Montreal has been amazing, great walkable areas, good food and nice people! Although I must say I have to get used to being addressed in French 😅
marcelhussing.bsky.social
We'll be presenting our work on Oracle-Efficient Reinforcement Learning for Max Value Ensembles at the RL theory seminar! Been following this series for a while, super excited we get to present some of our work. 🥳
rl-theory.bsky.social
Last seminars before the summer break:

04/29: Max Simchowitz (CMU)
05/06: Jeongyeol Kwon (Univ. of Wisconsin-Madison)
05/20: Sikata Sengupta & Marcel Hussing (Univ. of Pennsylvania)
05/27: Dhruv Rohatgi (MIT)
06/03: David Janz (Univ. of Oxford)
06/10: Nneka Okolo (MIT)
Reposted by Marcel Hussing
sologen.bsky.social
Many great papers from Mila!
Two by my team at the Adaptive Agents Lab (Adage) together with collaborators:

A Truncated Newton Method for Optimal Transport
openreview.net/forum?id=gWr...

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
openreview.net/forum?id=6Rt...

#ICLR2025
mila-quebec.bsky.social
This week, Mila researchers will present more than 90 papers at @iclr-conf.bsky.social in Singapore. Every day, we will share a schedule featuring Mila-affiliated presentations.
Day 1 👇 #ICLR2025
mila.quebec/en/news/foll...
Reposted by Marcel Hussing
collasconf.bsky.social
📢 Deadline Extension Alert! 📢

Good news! We’re extending the #CoLLAs2025 submission deadlines:

📝 Abstracts: Feb 26, 2025, 23:59 AoE
📄 Papers: Mar 3, 2025, 23:59 AoE

More time to refine your work—don't miss this chance to contribute to #lifelong-learning research! 🚀

🔗 lifelong-ml.cc
marcelhussing.bsky.social
I was very hyped about this place initially; now I come here, see 5 posts about politics, unfollow 5 people, and close the website. Where are the interesting AI posts?
Reposted by Marcel Hussing
aaroth.bsky.social
Can you solve group-conditional online conformal prediction with a no-regret learning algorithm? Not with vanilla regret, but -yes- with swap regret. And algorithms from the follow-the-regularized-leader family (notably online gradient descent) work really well for other reasons.
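For flavor only, here is a minimal sketch of the simplest version of this idea: plain online (sub)gradient descent on the pinball loss to track a single conformal threshold, which already gives marginal long-run coverage. The group-conditional and swap-regret machinery in the actual work is more involved; the toy score model, constants, and names below are assumptions.

```python
# Minimal illustrative sketch (not the paper's algorithm): online gradient
# descent on the pinball loss tracks a conformal threshold tau so that the
# long-run miscoverage rate approaches the target alpha.
import numpy as np

rng = np.random.default_rng(0)
alpha, eta = 0.1, 0.05         # target miscoverage rate and OGD step size
tau, misses, T = 0.0, 0, 10_000

for t in range(T):
    s = abs(rng.normal())      # nonconformity score of the next example (toy model)
    covered = s <= tau         # prediction set at time t is {y : score(y) <= tau}
    misses += int(not covered)
    # Subgradient step on the pinball loss at quantile level 1 - alpha:
    # raise tau after a miss, lower it slightly after a cover.
    tau += eta * ((1.0 if not covered else 0.0) - alpha)

print(f"empirical miscoverage: {misses / T:.3f} (target {alpha:.2f})")
```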
Reposted by Marcel Hussing
sharky6000.bsky.social
Bummed out about recent politics & news drowning out AI and science you want to see on Bluesky?

Well, here is a small "sky thread" (written on a ✈️) about something I recently discovered: e-values!

They are an alternative to the standard p-values as a measure of statistical significance. 1/N
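As a quick, hedged illustration of the concept (my own toy example, not from the thread): an e-value is a nonnegative statistic with expectation at most 1 under the null, so by Markov's inequality, rejecting when it exceeds 1/alpha controls the type-I error; a likelihood ratio against the null is the canonical example.

```python
# Toy sketch: a likelihood ratio against the null N(0, 1) is an e-value
# (nonnegative, expectation <= 1 under the null). Reject when e >= 1/alpha;
# 1/e is a conservative p-value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha = 0.05
x = rng.normal(loc=0.0, size=20)   # data drawn from the null here

# e-value: likelihood ratio of a fixed alternative N(0.5, 1) vs. the null N(0, 1)
e = np.prod(norm.pdf(x, loc=0.5) / norm.pdf(x, loc=0.0))

print(f"e = {e:.3f}, reject = {e >= 1 / alpha}, p-value bound = {min(1.0, 1 / e):.3f}")
```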
marcelhussing.bsky.social
Throwing compute at things has proven quite powerful in other domains but until recently not as much in #ReinforcementLearning.

Excited to share that our MAD-TD paper got a spotlight at #ICLR25! Check out Claas' thread on how to get the most out of your compute/data buck when training from scratch.
cvoelcker.bsky.social
Do you want to get the most out of your samples, but increasing the update steps just destabilizes RL training? Our #ICLR2025 spotlight 🎉 paper shows that using the values of unseen actions causes instability in continuous state-action domains and how to combat this problem with learned models!
marcelhussing.bsky.social
I agree with the notion, but I don't think "things being outdated" is always bad. I'm of the opinion that we should still teach SVMs/kernels, as they teach us a different way to think about ML. PCA is still a core tool for teaching low-dimensional embeddings to students. We need as many tools as possible.
marcelhussing.bsky.social
Are there no spotlights this year? Do we know?
Reposted by Marcel Hussing
aaroth.bsky.social
EC 2025 (S)PC --- let's get ready for the Super Bowl! Every time there is a first down, bid on a paper. Field goal? Bid on two. Touchdown? Bid on 5 papers (10 if it's the Eagles!). At the halftime show, enter your topic preferences and conflicts. Let's go Birds!
Reposted by Marcel Hussing
rl-conference.bsky.social
🚨🚨 RLC deadline has been extended by a week! Abstract deadline is Feb. 21 with a paper deadline of Feb. 28 🚨🚨. Please spread the word!
marcelhussing.bsky.social
This is huge, I might be able to make it now, woohoo!
marcelhussing.bsky.social
What a future work section should be:
Oh, and here is this interesting and hard open problem that someone should solve.

Future work sections in empirical ML papers:
We leave hyperparameter optimization for future work.