Oren Neumann
@orenneumann.bsky.social
Doing RL on autonomous driving, supply chains and board games. Physics PhD from Goethe Uni Frankfurt.
There are quite a few papers on supply chain management with RL, although only on toy problems. I'm currently writing a paper on doing it with real supply chains.
January 14, 2025 at 7:30 AM
Is it all related to dormant neurons or is there other literature on why RL struggles with plasticity?
arxiv.org/abs/2302.12902
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network express...
arxiv.org
January 10, 2025 at 1:19 PM
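For readers unfamiliar with the metric in the linked paper: a neuron is typically counted as dormant when its average absolute activation, normalized by the layer-wide mean, falls below a small threshold. Below is a minimal sketch of that measurement, assuming you can log a layer's activations on a batch of states; the function name and threshold value are illustrative, not the paper's reference code.

```python
import numpy as np

def dormant_fraction(activations: np.ndarray, tau: float = 0.025) -> float:
    """Fraction of dormant neurons in one layer.

    activations: array of shape (batch, num_neurons) holding the layer's
    post-activation outputs on a batch of states.
    A neuron is counted as tau-dormant when its mean |activation|, divided
    by the layer-wide mean of that quantity, is at most tau.
    """
    per_neuron = np.abs(activations).mean(axis=0)      # mean |h_i| per neuron
    score = per_neuron / (per_neuron.mean() + 1e-8)    # normalize by layer mean
    return float((score <= tau).mean())

# Example: a layer where half the units are effectively silent.
acts = np.concatenate([np.random.randn(256, 32), np.zeros((256, 32))], axis=1)
print(dormant_fraction(acts))  # ~0.5
```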
Read the full paper for more details and results: 'AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws'. ⚔️
arxiv.org/abs/2412.11979
Big thanks to @ericjmichaud.bsky.social for sharing his wisdom! This all started with our hallway chat at ICLR 😄
X: x.com/neumann_oren...
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Neural scaling laws are observed in a range of domains, to date with no clear understanding of why they occur. Recent theories suggest that loss power laws arise from Zipf's law, a power law observed ...
arxiv.org
December 19, 2024 at 2:17 PM
There is: in those games, larger models improve overall accuracy by focusing on late-game positions, forgetting what they learned about opening positions. This directly harms playing strength, since mastering openings is crucial, while wrapping up a game can be done with blind MCTS.
December 19, 2024 at 2:17 PM
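"Blind MCTS" here presumably means plain UCT search with random rollouts, i.e. no learned policy or value network guiding the tree. A minimal sketch under that assumption; the game-state interface (legal_actions, apply, is_terminal, result_for, player_just_moved) is hypothetical:

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                      # action -> child Node
        self.visits, self.wins = 0, 0.0         # wins from the view of the player who just moved
        self.untried = list(state.legal_actions())

def uct_select(node, c=1.4):
    # UCT: balance average win rate against exploration of rarely visited children.
    return max(node.children.values(),
               key=lambda n: n.wins / n.visits
                             + c * math.sqrt(math.log(node.visits) / n.visits))

def blind_mcts(root_state, n_sims=1000):
    """Plain UCT with random rollouts -- no policy or value network."""
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        # 1. Selection: descend through fully expanded nodes.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one child for an untried action.
        if node.untried:
            action = node.untried.pop()
            node.children[action] = Node(node.state.apply(action), parent=node)
            node = node.children[action]
        # 3. Rollout: play random moves until the game ends.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_actions()))
        # 4. Backpropagation: the sign flips each ply, since parent and
        #    child nodes belong to different players.
        reward = state.result_for(node.state.player_just_moved())
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
            reward = -reward
    # Play the most-visited root action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```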
AlphaZero doesn't always scale nicely. In some games, Elo goes up, then sharply degrades with model size. We noticed this happens in games whose rules bend the Zipf curve by giving end-game board positions unusually high frequencies. Is there a connection?
December 19, 2024 at 2:17 PM
In line with the quantization model, we see that AlphaZero agents fit board states in decreasing order of frequency. This is very surprising: high-frequency opening moves are exponentially harder to model, since they depend on downstream positions.
December 19, 2024 at 2:17 PM
There is! Chess/Go tournament games famously follow Zipf's law: the frequency of each board position scales as a power of its rank.
We find that Zipf's law also emerges in RL self-play games. It's a direct result of universal board-game rules.
December 19, 2024 at 2:17 PM
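Concretely, Zipf's law here means the k-th most frequent board position appears with frequency roughly proportional to k^(-alpha), so log-frequency is linear in log-rank. A minimal sketch of how one could check that on logged self-play positions (the synthetic data and the helper function are illustrative, not the paper's analysis code):

```python
import numpy as np

def zipf_exponent(position_counts, max_rank=None):
    """Estimate alpha in frequency ~ rank**(-alpha) from position counts.

    position_counts: occurrence counts, one entry per distinct board position.
    Under Zipf's law, log(frequency) is linear in log(rank) with slope -alpha,
    so a least-squares fit on the top-ranked positions recovers alpha.
    """
    freqs = np.sort(np.asarray(position_counts, dtype=float))[::-1]
    if max_rank is not None:
        freqs = freqs[:max_rank]
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), deg=1)
    return -slope

# Synthetic check: numpy's Zipf sampler with a=2.0 produces counts ~ rank**(-2).
rng = np.random.default_rng(0)
values, counts = np.unique(rng.zipf(a=2.0, size=200_000), return_counts=True)
print(zipf_exponent(counts, max_rank=100))  # roughly 2
```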
The quantization model suggests that LLM power-law scaling results from Zipf's law in natural language:
arxiv.org/abs/2303.13506
In RL, AlphaZero provides one of the few known examples of power-law scaling:
arxiv.org/abs/2210.00849
But is there a Zipf's law in board games?
The Quantization Model of Neural Scaling
We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale....
arxiv.org
December 19, 2024 at 2:17 PM
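To unpack the quantization-model argument, here is a toy numerical sketch as I understand the paper, not its code: assume discrete skills ("quanta") whose use frequencies follow Zipf's law with exponent alpha + 1, and assume a model of capacity n has mastered exactly the n most frequent quanta. The residual loss from the unlearned tail then falls off as a power law n^(-alpha).

```python
import numpy as np

# Toy setup: quanta ranked by use frequency, Zipfian with exponent alpha + 1.
alpha = 0.5
n_quanta = 1_000_000
ranks = np.arange(1, n_quanta + 1)
freqs = ranks ** -(alpha + 1.0)
freqs /= freqs.sum()                  # p_k: probability that quantum k is needed

def expected_loss(capacity):
    """Loss of a model that has mastered the `capacity` most frequent quanta.

    Each unlearned quantum contributes one unit of loss whenever it is needed,
    so the expected loss is the probability mass of the unlearned tail.
    """
    return freqs[capacity:].sum()

capacities = np.array([10, 100, 1_000, 10_000])
losses = np.array([expected_loss(c) for c in capacities])

# The tail sum of k**-(alpha+1) scales as capacity**-alpha, so the log-log
# slope of loss vs capacity should sit near -alpha (= -0.5 here).
slope, _ = np.polyfit(np.log(capacities), np.log(losses), deg=1)
print(losses, slope)
```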