Gaspard Lambrechts
@gsprd.be
2.1K followers 650 following 26 posts
PhD Student doing RL in POMDP at the University of Liège - Intern at McGill - gsprd.be
gsprd.be
4) Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures.

With Adrien Bolland and Damien Ernst, we propose a new intrinsic reward. Instead of encouraging the agent to visit states uniformly, we encourage it to visit *future* states uniformly, from every state.
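As a rough sketch of the kind of objective this gives (assumed notation and a temperature λ, not taken from the paper), the intrinsic reward adds the entropy of the discounted visitation measure of *future* states and actions from each state:

```latex
% Hedged sketch (assumed notation, not the paper's exact objective):
% d^\pi_s is the discounted visitation measure of future state-action
% pairs when starting from s and following \pi, and H is its entropy.
J(\pi) = \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t
  \Big( r(s_t, a_t) + \lambda\, \mathcal{H}\big(d^\pi_{s_t}\big) \Big) \right],
\qquad
\mathcal{H}\big(d^\pi_{s}\big)
  = - \mathbb{E}_{(s',a') \sim d^\pi_{s}}\!\left[ \log d^\pi_{s}(s', a') \right].
```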
gsprd.be
3) Behind the Myth of Exploration in Policy Gradients.

With Adrien Bolland and Damien Ernst, we decided to frame the exploration problem for policy-gradient methods from the optimization point of view.
gsprd.be
2) Informed Asymmetric Actor-Critic: Theoretical Insights and Open Questions.

With Daniel Ebi and Damien Ernst, we looked for a reason why asymmetric actor-critic performs better, even when using RNN-based policies that take the full observation history as input (no aliasing).
gsprd.be
In AsymAC, while the policy maintains an agent state based on observations only, the critic also takes the state as input. Its better performance is linked to possible "aliasing" in the agent state, which hurts TD learning only in the symmetric case.

arxiv.org/abs/2501.19116
A Theoretical Justification for Asymmetric Actor-Critic Algorithms
gsprd.be
1) A Theoretical Justification for Asymmetric Actor-Critic Algorithms.

With Damien Ernst and Aditya Mahajan, we looked for a reason why asymmetric actor-critic algorithms perform better than their symmetric counterparts.
gsprd.be
At #EWRL, we presented 4 papers, which we summarize below.

- A Theoretical Justification for AsymAC Algorithms.
- Informed AsymAC: Theoretical Insights and Open Questions.
- Behind the Myth of Exploration in Policy Gradients.
- Off-Policy MaxEntRL with Future State-Action Visitation Measures.
Reposted by Gaspard Lambrechts
claireve.bsky.social
Such an inspiring talk by @arkrause.bsky.social at #ICML today. The role of efficient exploration in scientific discovery is fundamental, and I really like how Andreas connects the dots with RL (theory).
gsprd.be
At #ICML2025, we will present a theoretical justification for the benefits of « asymmetric actor-critic » algorithms (#W1008 Wednesday at 11am).

📝 Paper: hdl.handle.net/2268/326874
💻 Blog: damien-ernst.be/2025/06/10/a...
ICML poster of the paper « A Theoretical Justification for Asymmetric Actor-Critic Algorithms » by Gaspard Lambrechts, Damien Ernst and Aditya Mahajan.
Reposted by Gaspard Lambrechts
ricczamboni.bsky.social
🌟🌟Good news for the explorers🗺️!
Next week we will present our paper “Enhancing Diversity in Parallel Agents: A Maximum Exploration Story” with V. De Paola, @mircomutti.bsky.social and M. Restelli at @icmlconf.bsky.social!
(1/N)
gsprd.be
Last week, I gave an invited talk on "asymmetric reinforcement learning" at the BeNeRL workshop. I was happy to draw attention to this niche topic, which I think can be useful to any reinforcement learning researcher.

Slides: hdl.handle.net/2268/333931.
gsprd.be
Two months after my PhD defense on RL in POMDPs, I finally uploaded the final version of my thesis :)

You can find it here: hdl.handle.net/2268/328700 (manuscript and slides).

Many thanks to my advisors and to the jury members.
Cover page of the PhD thesis "Reinforcement Learning in Partially Observable Markov Decision Processes: Learning to Remember the Past by Learning to Predict the Future" by Gaspard Lambrechts
gsprd.be
While this work considers a fixed feature z = f(h) with linear approximators, we discuss possible generalizations in the conclusion.

Despite not matching the usual recurrent actor-critic setting, this analysis still provides insights into the effectiveness of asymmetric actor-critic algorithms.
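For concreteness, "linear approximators" can be read as critics that are linear in fixed feature maps (a sketch with assumed notation, not necessarily the paper's exact parameterization):

```latex
% Hedged sketch (assumed notation): critics linear in fixed features.
Q_\theta(s, z, a) = \theta^\top \phi(s, z, a) \quad \text{(asymmetric)},
\qquad
Q_\vartheta(z, a) = \vartheta^\top \psi(z, a) \quad \text{(symmetric)}.
```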
gsprd.be
The conclusion is that asymmetric learning is less sensitive to aliasing than symmetric learning.

Now, what is aliasing exactly?

The aliasing and inference terms arise from z = f(h) not being Markovian. They can be bounded by the difference between the approximate belief p(s|z) and the exact belief p(s|h).
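One way to picture this (a sketch with assumed notation, not the paper's exact statement) is a belief-mismatch quantity that vanishes when z = f(h) is a sufficient statistic of the history for the state:

```latex
% Hedged sketch (assumed notation, not the exact bound): both the
% aliasing and inference terms can be controlled by a quantity like
\varepsilon_{\mathrm{alias}}
  = \mathbb{E}_{h}\!\left[ \big\| p(\cdot \mid f(h)) - p(\cdot \mid h) \big\|_{\mathrm{TV}} \right],
% which is zero when p(s | z) = p(s | h) for all reachable histories h.
```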
gsprd.be
Now, as far as the actor suboptimality is concerned, we obtained the following finite-time bounds.

In addition to the average critic error, which carries over from the critic bound into the actor bound, the symmetric actor-critic algorithm suffers from an additional "inference term".
Theorem showing the finite-time suboptimality bound for the asymmetric and symmetric actor-critic algorithms. The asymmetric algorithm has four terms: the natural actor-critic term, the gradient estimation term, the residual gradient term, and the average critic error. The symmetric algorithm has an additional term: the inference term.
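Schematically, the structure of these bounds looks as follows (placeholder symbols for the terms named in the theorem, not the paper's exact constants or rates):

```latex
% Hedged sketch of the bound structure (placeholder terms only).
\text{asymmetric:}\quad
  J(\pi^\ast) - J(\pi_T) \;\lesssim\;
  \epsilon_{\mathrm{NAC}}(T) + \epsilon_{\mathrm{grad}}
  + \epsilon_{\mathrm{res}} + \bar{\epsilon}_{\mathrm{critic}},
\qquad
\text{symmetric:}\quad
  J(\pi^\ast) - J(\pi_T) \;\lesssim\;
  \epsilon_{\mathrm{NAC}}(T) + \epsilon_{\mathrm{grad}}
  + \epsilon_{\mathrm{res}} + \bar{\epsilon}_{\mathrm{critic}}
  + \epsilon_{\mathrm{inference}}.
```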
gsprd.be
By adapting the finite-time bound from the symmetric setting to the asymmetric setting, we obtain the following error bounds for the critic estimates.

The symmetric temporal difference learning algorithm has an additional "aliasing term".
Theorem showing the finite-time error bound for the asymmetric and symmetric temporal difference learning algorithms. The asymmetric algorithm has three terms: the temporal difference learning term, the function approximation term, and the bootstrapping shift term. The symmetric algorithm has an additional term: the aliasing term.
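Schematically (again with placeholder symbols, not the exact statement), the symmetric TD bound is the asymmetric one plus an aliasing term:

```latex
% Hedged sketch of the bound structure (placeholder terms only).
\text{asymmetric TD:}\quad
  \epsilon_{\mathrm{critic}} \;\lesssim\;
  \epsilon_{\mathrm{TD}}(T) + \epsilon_{\mathrm{approx}} + \epsilon_{\mathrm{shift}},
\qquad
\text{symmetric TD:}\quad
  \epsilon_{\mathrm{critic}} \;\lesssim\;
  \epsilon_{\mathrm{TD}}(T) + \epsilon_{\mathrm{approx}} + \epsilon_{\mathrm{shift}}
  + \epsilon_{\mathrm{alias}}.
```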
gsprd.be
While this algorithm is valid, in the sense of giving unbiased policy gradients (Baisero & Amato, 2022), a theoretical justification for its benefits is still missing.

Does it really learn faster than symmetric learning?

In this paper, we provide theoretical evidence for this, based on an adapted finite-time analysis (Cayci et al., 2024).
Title page of the paper "A Theoretical Justification for Asymmetric Actor-Critic Algorithms", written by Gaspard Lambrechts, Damien Ernst and Aditya Mahajan.
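The validity result referenced here can be sketched as follows (assumed notation; see Baisero & Amato, 2022 for the precise statement): the history Q-function is the belief-average of the asymmetric Q-function, so plugging the asymmetric critic into the policy gradient leaves it unbiased.

```latex
% Hedged sketch of the unbiasedness argument (assumed notation).
Q^\pi(h, a) = \mathbb{E}_{s \sim p(\cdot \mid h)}\!\left[ Q^\pi(s, h, a) \right]
\;\;\Longrightarrow\;\;
\nabla_\theta J(\pi_\theta)
  = \mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(a \mid z)\, Q^\pi(h, a) \right]
  = \mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(a \mid z)\, Q^\pi(s, h, a) \right].
```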
gsprd.be
However, with actor-critic algorithms, note that the critic is not needed at execution!

As a result, the state can be an input of the critic, which becomes Q(s, z, a) in the asymmetric setting instead of Q(z, a) in the symmetric setting.
Figure showing the policy being passed the feature z = f(h) of the history, and the critic being passed both the feature z = f(h) of the history and the state s as input. The asymmetric critic and the policy are used together to form the sample policy-gradient expression.
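To make the input signatures concrete, here is a minimal PyTorch-style sketch (hypothetical module names and sizes, not the paper's implementation): the symmetric critic sees only the history feature z, while the asymmetric critic additionally sees the state s during training.

```python
# Minimal sketch (hypothetical names and sizes, not the paper's implementation).
import torch
import torch.nn as nn

class SymmetricCritic(nn.Module):
    """Q(z, a): only the history feature z is available."""
    def __init__(self, z_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + a_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

class AsymmetricCritic(nn.Module):
    """Q(s, z, a): the privileged state s is also used, but only during training."""
    def __init__(self, s_dim, z_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + z_dim + a_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, s, z, a):
        return self.net(torch.cat([s, z, a], dim=-1))
```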
gsprd.be
In a POMDP, the goal is to find an optimal policy π(a|z) that maps a feature z = f(h) of the history h to an action a.

In a privileged POMDP, the state can be used to learn a policy π(a|z) faster.

But note that the state cannot be an input of the policy, since it is not available at execution.
Figure showing the history h being compressed into a feature z = f(h) for then being passed to a policy g(a | z).
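As a minimal sketch of this setup (hypothetical names and sizes), z can be the hidden state of a recurrent encoder over the observation history, with the policy mapping z to a distribution over actions; the state never enters the policy.

```python
# Minimal sketch (hypothetical names and sizes): z = f(h) via a recurrent
# encoder over observations, and a policy pi(a | z) on top of it.
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Compresses the observation history h into a feature z = f(h)."""
    def __init__(self, obs_dim, z_dim):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, z_dim, batch_first=True)

    def forward(self, obs_history):      # (batch, time, obs_dim)
        _, z = self.rnn(obs_history)     # final hidden state
        return z.squeeze(0)              # (batch, z_dim)

class Policy(nn.Module):
    """pi(a | z): only history features, never the state, at execution."""
    def __init__(self, z_dim, n_actions):
        super().__init__()
        self.logits = nn.Linear(z_dim, n_actions)

    def forward(self, z):
        return torch.distributions.Categorical(logits=self.logits(z))
```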
gsprd.be
Typically, classical RL methods assume:
- MDP: full state observability (too optimistic),
- POMDP: partial state observability (too pessimistic).

Instead, asymmetric RL methods assume:
- Privileged POMDP: asymmetric state observability (full at training, partial at execution).
Table showing that the MDP assumes full state observability both during training and execution and that the POMDP assumes partial state observability both during training and execution, while the privileged POMDP assumes full state observability during training but partial state observability during execution.
gsprd.be
📝 Our paper "A Theoretical Justification for Asymmetric Actor-Critic Algorithms" was accepted at #ICML!

Never heard of "asymmetric actor-critic" algorithms? Yet, many successful #RL applications use them (see image).

But these algorithms are not fully understood. Below, we provide some insights.
Slide showing three recent successes of reinforcement learning that have used an asymmetric actor-critic algorithm:
 - Magnetic Control of Tokamak Plasma through Deep RL (Degrave et al., 2022).
 - Champion-Level Drone Racing using Deep RL (Kaufmann et al., 2023).
 - A Super-Human Vision-Based RL Agent in Gran Turismo (Vasco et al., 2024).