Nico Bohlinger
@nicobohlinger.bsky.social
35 followers 36 following 21 posts
26 | Morphology-aware Robotics, RL Research | PhD student at @ias-tudarmstadt.bsky.social
nicobohlinger.bsky.social
If you are interested in massive multi-embodiment learning, come and chat with me at:
- Today: WS Sim-to-Real Transfer for Humanoid Robots at Humanoids2025
- Oct 20th: WS Foundation Models for Robotic Design at IROS2025
- Oct 24th: WS Reconfigurable Modular Robots at IROS2025
nicobohlinger.bsky.social
⚡️ Can one unified policy control 10 million different robots and zero-shot transfer to completely unseen robots, even humanoids?

🔗 Yes! Check out our paper: arxiv.org/abs/2509.02815
nicobohlinger.bsky.social
👏 Huge thanks to everyone involved:
Bo Ai, Liu Dai, Dichen Li, Tongzhou Mu, Zhanxin Wu, K. Fay, Henrik I. Christensen, @jan-peters.bsky.social and Hao Su
nicobohlinger.bsky.social
🇰🇷 Conferences are about finally meeting your collaborators from all around the world!

Check out our work on Embodiment Scaling Laws @CoRL2025
We investigate cross-embodiment learning as the next axis of scaling for truly generalist policies 📈

🔗 All details: embodiment-scaling-laws.github.io
Reposted by Nico Bohlinger
nicobohlinger.bsky.social
Or come to my talk @ International Symposium on Adaptive Motion of Animals and Machines and LokoAssist Symposium (AMAM) on Friday at TU Darmstadt

Thanks to @ias-tudarmstadt.bsky.social, @jan-peters.bsky.social
nicobohlinger.bsky.social
If you want to know how to create a neural network architecture to train one policy to control any robot embodiment, check out: nico-bohlinger.github.io/one_policy_t...
One Policy to Run Them All
nico-bohlinger.github.io
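The core idea of such an architecture can be sketched in a few lines. This is a hedged illustration, not the paper's actual network: a shared per-joint encoder, permutation-invariant pooling into a global context, and a shared per-joint decoder, so the same weights produce one action per joint for any number of joints. All sizes and the random initialization are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: each joint contributes a D_OBS-dim observation.
D_OBS, D_HID = 6, 32
W_enc = rng.normal(0.0, 0.1, (D_OBS, D_HID))       # shared per-joint encoder
W_dec = rng.normal(0.0, 0.1, (2 * D_HID, 1))       # shared per-joint decoder

def policy(joint_obs):
    """joint_obs: (num_joints, D_OBS) -- works for any num_joints."""
    h = np.tanh(joint_obs @ W_enc)            # encode every joint with the same weights
    g = h.mean(axis=0, keepdims=True)         # permutation-invariant global pooling
    g = np.repeat(g, h.shape[0], axis=0)      # broadcast global context back to joints
    return np.tanh(np.hstack([h, g]) @ W_dec)[:, 0]   # one action per joint

a_quad = policy(rng.normal(size=(12, D_OBS)))   # e.g. a 12-joint quadruped
a_huma = policy(rng.normal(size=(23, D_OBS)))   # e.g. a 23-joint humanoid
```

The same parameter set handles both robots; only the number of per-joint rows changes.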
nicobohlinger.bsky.social
Robot Randomization is fun!
nicobohlinger.bsky.social
⚙️ Architecture matters
We also explored architectures like Universal Neural Functionals (UNF) and action-based representations ("Probing").
And yes, our scaled EPVFs are competitive with PPO and SAC in their final performance.
nicobohlinger.bsky.social
⚡ But it's not just about size
Key ingredients for stability and performance are weight clipping and using uniform noise scaled to the parameter magnitudes.
Our ablation studies show just how critical these components are. Without them, performance collapses.
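The two ingredients above can be sketched as follows. This is an illustrative toy, with made-up constants rather than the paper's values: a hard clip keeps every policy weight in a bounded box, and exploration noise is uniform with a width proportional to each parameter's magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

CLIP = 0.5   # illustrative hard bound on every policy weight

def clip_weights(theta, clip=CLIP):
    # weight clipping: keep parameters in a bounded box, so the value
    # function is only queried in a region it has seen data for
    return np.clip(theta, -clip, clip)

def perturb(theta, scale=0.05):
    # uniform noise whose width follows each parameter's magnitude,
    # so large and small weights are explored proportionally
    width = scale * (np.abs(theta) + 1e-2)   # small floor for zero weights
    return theta + rng.uniform(-width, width)

theta = np.array([0.4, -2.0, 0.0])
theta = clip_weights(perturb(theta))
```

Without the clip, the out-of-range weight (-2.0 here) would drag the value-function queries far outside the training distribution.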
nicobohlinger.bsky.social
📈 Massive Scaling Pays Off
We see strong scaling effects when using MJX to roll out up to 4000 differently perturbed policies in parallel.
This explores the policy space effectively, and the large batches drastically reduce the variance of the resulting gradients.
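The variance-reduction effect can be demonstrated with a small numerical toy (this is not the EPVF estimator itself, just a simplified baseline-subtracted parameter-space gradient estimate on a synthetic return): the spread of the estimate shrinks roughly with the square root of the number of parallel perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

def returns_batch(thetas):
    # stand-in for N simulated rollouts (e.g. MJX environments in parallel)
    return -np.sum(thetas ** 2, axis=1)

def grad_estimate(theta, n, scale=0.1):
    # evaluate n uniformly perturbed copies of the policy parameters and
    # form a baseline-subtracted gradient estimate in parameter space
    noise = rng.uniform(-scale, scale, size=(n, theta.size))
    r = returns_batch(theta + noise)
    return (noise * (r - r.mean())[:, None]).sum(axis=0) / (n * scale**2 / 3)

theta = np.ones(8)
# empirical std of the first gradient component over repeated estimates
small = np.std([grad_estimate(theta, 40)[0] for _ in range(200)])
large = np.std([grad_estimate(theta, 4000)[0] for _ in range(200)])
```

With 100x more parallel perturbations, the estimate's standard deviation drops by roughly 10x.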
nicobohlinger.bsky.social
🧠 Simple & Powerful RL
This unlocks fully off-policy learning and exploration directly in policy parameter space using any policy data, and leads to probably the simplest DRL algorithm one can imagine:
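The loop really is short. A hedged sketch, assuming a toy scalar return in place of real rollouts and a linear fit in place of the neural V(θ) — all names and constants are illustrative: perturb the parameters, evaluate returns, fit V on the (parameters, return) pairs, then ascend the fitted V.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # stand-in for an environment rollout: peak return at theta = 2
    return -np.sum((theta - 2.0) ** 2)

theta = np.zeros(4)
alpha = 0.1
for _ in range(100):
    # 1) explore directly in policy-parameter space
    candidates = theta + rng.uniform(-0.1, 0.1, size=(64, theta.size))
    returns = np.array([episode_return(c) for c in candidates])
    # 2) fit a (here: linear) value function V(theta) ~ w . theta + b
    X = np.hstack([candidates, np.ones((64, 1))])
    w = np.linalg.lstsq(X, returns, rcond=None)[0][:-1]
    # 3) improve the policy by ascending the fitted V
    theta = theta + alpha * w
```

Since the (θ, return) pairs can come from any policy, nothing in this loop is on-policy.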
nicobohlinger.bsky.social
🔍 What are EPVFs?
Imagine a value function that understands the policy's parameters directly: V(θ).
This allows for direct, gradient-based policy updates:
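In symbols, the update is plain gradient ascent on the value function: θ ← θ + α ∇θ V(θ). A minimal sketch, using a toy quadratic V with a known optimum in place of a learned network (all names here are illustrative):

```python
import numpy as np

# Toy stand-in for a learned V(theta): highest value at theta_star.
theta_star = np.array([1.0, -2.0, 0.5])

def V(theta):
    return -np.sum((theta - theta_star) ** 2)

def grad_V(theta):
    # analytic gradient of the toy V; a learned V would supply this via autodiff
    return -2.0 * (theta - theta_star)

theta = np.zeros(3)
alpha = 0.1   # learning rate
for _ in range(200):
    theta = theta + alpha * grad_V(theta)   # theta <- theta + alpha * dV/dtheta
```

The policy parameters themselves are the optimization variable — no action-space exploration or policy-gradient estimator is needed.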
nicobohlinger.bsky.social
🚀 Check out our new work at @rldmdublin2025.bsky.social today at poster #16!
We're showing how to make Explicit Policy-conditioned Value Functions V(θ) (originating from Faccio & Schmidhuber) work for more complex control tasks. The secret? Massive scaling!
Reposted by Nico Bohlinger
ias-tudarmstadt.bsky.social
IAS is at RLDM 2025! We have many exciting works to share (see 👇), so come to our posters and talk to us!
nicobohlinger.bsky.social
Many thanks to my colleagues and collaborators: Daniel Palenicek, Łukasz Antczak, @jan-peters.bsky.social and most importantly Jonathan Kinzel (@ibims1jfk.bsky.social), who interned at MAB Robotics and did the experiments.
Also thanks to MAB Robotics for providing the hardware and constant support!
nicobohlinger.bsky.social
We build on the efficient CrossQ DRL algorithm and combine it with two control architectures — Joint Target Prediction for agile maneuvers and Central Pattern Generators for stable, natural gaits — to train locomotion policies directly on the HoneyBadger quadruped robot from MAB Robotics.
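To make the CPG side concrete, here is a hedged minimal sketch — one phase oscillator driving a trot gait, emitting joint-position targets that a learned policy could modulate. The class, frequencies, amplitudes, and leg ordering are all illustrative, not the HoneyBadger's actual controller.

```python
import numpy as np

class TrotCPG:
    """Minimal central pattern generator for a quadruped trot (illustrative)."""

    def __init__(self, freq_hz=2.0, amp_rad=0.4):
        self.freq = freq_hz
        self.amp = amp_rad
        # legs ordered FL, FR, RL, RR; diagonal pairs share phase in a trot
        self.offsets = np.array([0.0, np.pi, np.pi, 0.0])
        self.phase = 0.0

    def step(self, dt):
        # advance the shared phase and emit one hip target per leg;
        # a learned policy would typically add residuals on top
        self.phase = (self.phase + 2 * np.pi * self.freq * dt) % (2 * np.pi)
        return self.amp * np.sin(self.phase + self.offsets)

cpg = TrotCPG()
targets = [cpg.step(dt=0.01) for _ in range(50)]   # 0.5 s of joint targets
```

The oscillator guarantees a periodic, bounded gait prior, which is what makes the resulting motion stable and natural-looking even early in training.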
nicobohlinger.bsky.social
⚡️ Do you think training robot locomotion needs large scale simulation? Think again!

We train an omnidirectional locomotion policy directly on a real quadruped in just a few minutes 🚀
Top speeds of 0.85 m/s, two different control approaches, indoor and outdoor experiments, and more! 🤖🏃‍♂️
nicobohlinger.bsky.social
Great investigation! I hope we can finally bridge the gap and combine the algorithmic advances from off-policy RL research with the large-scale on-policy RL that dominates applied robot learning