📢 New Paper!
We replace the entropy bonus in PPO with a *complexity* bonus, encouraging structured and stochastic policies that are robust to different scaling factors and can work in environments with variable exploration needs.
Read more:
arxiv.org/abs/2509.20509
w/ @mircomusolesi.bsky.social