@kevin-wang3290.bsky.social
8/ One negative result: in preliminary experiments using OGBench, we evaluated depth scaling in offline goal-conditioned RL and found that increasing network depth didn’t improve offline performance, hinting that depth’s benefits in the online setting may partly arise from enhanced exploration.
7/ Prior work has found success in scaling model width. In our experiments, scaling width also improves CRL’s performance, but scaling depth achieves higher performance with better parameter efficiency (i.e., similar performance from models that are 50× smaller).
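To make the parameter-efficiency point concrete: MLP parameter counts grow roughly quadratically with width but only linearly with depth, so a narrow-deep network can match a wide-shallow one with far fewer parameters. The widths and depths below are hypothetical, chosen only to illustrate a ~50× gap; they are not the paper’s actual configurations.

```python
# Back-of-the-envelope MLP parameter counts (hypothetical sizes, for illustration only).
def mlp_params(width, depth, in_dim=64, out_dim=64):
    """Count weights + biases for an MLP with `depth` hidden layers of size `width`."""
    dims = [in_dim] + [width] * depth + [out_dim]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

wide_shallow = mlp_params(width=4096, depth=4)   # ~51M parameters
narrow_deep = mlp_params(width=256, depth=16)    # ~1M parameters
print(f"wide/deep parameter ratio: {wide_shallow / narrow_deep:.0f}x")  # ~50x
```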
6/ We started this project by studying a different axis of scaling and initially found that scaling batch size had little effect on performance. However, when we revisited the experiment, we found that scaling batch size can significantly improve performance when paired with deep networks.
5/ Scaling network depth also yields improved generalization (stitching): when tested on start-goal pairs unseen during training, deeper networks succeeded on a higher fraction of tasks than shallower ones.
4/ Deeper networks learn better contrastive representations. In this navigation task, Depth-4 networks naively approximate Q-values by Euclidean distance to the goal, while Depth-64 networks capture the maze topology, with high Q-values outlining the viable path.
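For context on what these Q-values are: in CRL the critic scores a (state, action) pair against a goal via the similarity of two learned embeddings, and it is those embedding networks whose depth is being scaled. The sketch below assumes an inner-product critic trained with an InfoNCE-style objective; that is one common instantiation, not necessarily this paper’s exact loss.

```python
# Minimal sketch of a CRL-style critic (illustrative; exact loss and heads are assumptions).
import jax
import jax.numpy as jnp

def critic_logits(phi_sa, psi_g):
    """Q-value proxy: pairwise inner products of (state, action) and goal embeddings.

    phi_sa: (B, d) embeddings of (state, action) pairs
    psi_g:  (B, d) embeddings of goals
    """
    return jnp.einsum("id,jd->ij", phi_sa, psi_g)  # (B, B), positives on the diagonal

def infonce_loss(phi_sa, psi_g):
    """Contrastive objective: each (s, a) should score its own future goal highest."""
    log_probs = jax.nn.log_softmax(critic_logits(phi_sa, psi_g), axis=-1)
    return -jnp.mean(jnp.diagonal(log_probs))
```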
3/ Scaling benefits are larger in complex tasks with high-dimensional inputs. In the Humanoid U-Maze environment, where scaling effects were most prominent, we tested the limits of scaling and observed continued performance gains up to 1024 layers!
2/ As we scale network depth, novel behaviors emerge: at depth 4, the Humanoid simply falls toward the goal, while at depth 16 it walks upright. At depth 256 in the Humanoid U-Maze environment, a distinctive policy emerges: the agent learns to propel itself over the maze wall.
1/ While most RL methods use shallow MLPs (~2–5 layers), we show that scaling contrastive RL (CRL) networks up to 1000 layers can significantly boost performance, with gains ranging from 2× to 50× across a diverse suite of robotic tasks.

Webpage+Paper+Code: wang-kevin3290.github.io/scaling-crl/
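For readers curious what a “deep network” means here concretely, below is a minimal Flax sketch of a residual MLP encoder whose depth can be scaled. The residual blocks, layer norm, activation, and sizes are illustrative assumptions (standard ingredients for keeping very deep MLPs trainable), not necessarily the exact architecture from the paper.

```python
# Illustrative deep residual MLP encoder in Flax (assumed architecture, not the paper's code).
import jax
import jax.numpy as jnp
import flax.linen as nn

class ResidualBlock(nn.Module):
    width: int

    @nn.compact
    def __call__(self, x):
        h = nn.LayerNorm()(x)
        h = nn.Dense(self.width)(h)
        h = nn.swish(h)
        return x + h  # skip connection keeps gradients usable at large depth

class DeepEncoder(nn.Module):
    width: int = 256
    depth: int = 64     # number of residual blocks -- the axis being scaled
    out_dim: int = 64   # embedding dimension fed to the contrastive critic

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(self.width)(x)
        for _ in range(self.depth):
            x = ResidualBlock(self.width)(x)
        return nn.Dense(self.out_dim)(x)

# Example: embed a batch of 32 observations with 17 features each.
model = DeepEncoder()
params = model.init(jax.random.PRNGKey(0), jnp.zeros((1, 17)))
emb = model.apply(params, jnp.zeros((32, 17)))  # shape (32, 64)
```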