@kevin-wang3290.bsky.social
8/ One negative result: in preliminary experiments using OGBench, we evaluated depth scaling in offline goal-conditioned RL and found that increasing network depth didn’t improve offline performance, hinting that depth’s benefits in the online setting may partly arise from enhanced exploration.
7/ Prior work has found success in scaling model width. In our experiments, scaling width also improves CRL’s performance, but scaling depth achieves higher performance with better parameter efficiency (i.e., similar performance from models that are 50× smaller).
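To make the parameter-efficiency point concrete: MLP parameter counts grow roughly quadratically with width but only linearly with depth, so a narrow-deep network can match a wide-shallow one with far fewer parameters. The widths and depths below are hypothetical, chosen only to illustrate a ~50× gap; they are not the paper’s actual configurations.

```python
# Back-of-the-envelope MLP parameter counts (hypothetical sizes, for illustration only).
def mlp_params(width, depth, in_dim=64, out_dim=64):
    """Count weights + biases for an MLP with `depth` hidden layers of size `width`."""
    dims = [in_dim] + [width] * depth + [out_dim]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

wide_shallow = mlp_params(width=4096, depth=4)   # ~51M parameters
narrow_deep = mlp_params(width=256, depth=16)    # ~1M parameters
print(f"wide/deep parameter ratio: {wide_shallow / narrow_deep:.0f}x")  # ~50x
```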
6/ We started this project by studying a different axis of scaling and initially found that scaling batch size had little effect on performance. However, when we revisited the experiment, we found that scaling batch size can significantly improve performance when paired with deep networks.
5/ Scaling network depth also yields improved generalization (stitching): when tested on start-goal pairs unseen during training, deeper networks succeeded on a higher fraction of tasks than shallower ones.
4/ Deeper networks learn better contrastive representations. In this navigation task, Depth-4 networks naively approximate Q-values by Euclidean distance to the goal, while Depth-64 networks capture the maze topology, with high Q-values outlining the viable path.
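For context on what these Q-values are: in CRL the critic scores a (state, action) pair against a goal via the similarity of two learned embeddings, and it is those embedding networks whose depth is being scaled. The sketch below assumes an inner-product critic trained with an InfoNCE-style objective; that is one common instantiation, not necessarily this paper’s exact loss.

```python
# Minimal sketch of a CRL-style critic (illustrative; exact loss and heads are assumptions).
import jax
import jax.numpy as jnp

def critic_logits(phi_sa, psi_g):
    """Q-value proxy: pairwise inner products of (state, action) and goal embeddings.

    phi_sa: (B, d) embeddings of (state, action) pairs
    psi_g:  (B, d) embeddings of goals
    """
    return jnp.einsum("id,jd->ij", phi_sa, psi_g)  # (B, B), positives on the diagonal

def infonce_loss(phi_sa, psi_g):
    """Contrastive objective: each (s, a) should score its own future goal highest."""
    log_probs = jax.nn.log_softmax(critic_logits(phi_sa, psi_g), axis=-1)
    return -jnp.mean(jnp.diagonal(log_probs))
```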
3/ Scaling benefits are larger in complex tasks with high-dimensional inputs. In the Humanoid U-Maze environment, where scaling effects were most prominent, we tested the limits of scaling and observed continued performance gains up to 1024 layers!
2/ As we scale network depth, novel behaviors emerge: at depth 4, the Humanoid simply falls toward the goal, while at depth 16 it walks upright. At depth 256 in the Humanoid U-Maze environment, a distinctive policy emerges: the agent learns to propel itself over the maze wall.
1/ While most RL methods use shallow MLPs (~2–5 layers), we show that scaling contrastive RL (CRL) networks up to 1000 layers can significantly boost performance, with gains ranging from 2× to 50× across a diverse suite of robotic tasks.

Webpage+Paper+Code: wang-kevin3290.github.io/scaling-crl/
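For readers curious what a “deep network” means here concretely, below is a minimal Flax sketch of a residual MLP encoder whose depth can be scaled. The residual blocks, layer norm, activation, and sizes are illustrative assumptions (standard ingredients for keeping very deep MLPs trainable), not necessarily the exact architecture from the paper.

```python
# Illustrative deep residual MLP encoder in Flax (assumed architecture, not the paper's code).
import jax
import jax.numpy as jnp
import flax.linen as nn

class ResidualBlock(nn.Module):
    width: int

    @nn.compact
    def __call__(self, x):
        h = nn.LayerNorm()(x)
        h = nn.Dense(self.width)(h)
        h = nn.swish(h)
        return x + h  # skip connection keeps gradients usable at large depth

class DeepEncoder(nn.Module):
    width: int = 256
    depth: int = 64     # number of residual blocks -- the axis being scaled
    out_dim: int = 64   # embedding dimension fed to the contrastive critic

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(self.width)(x)
        for _ in range(self.depth):
            x = ResidualBlock(self.width)(x)
        return nn.Dense(self.out_dim)(x)

# Example: embed a batch of 32 observations with 17 features each.
model = DeepEncoder()
params = model.init(jax.random.PRNGKey(0), jnp.zeros((1, 17)))
emb = model.apply(params, jnp.zeros((32, 17)))  # shape (32, 64)
```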