Lucas Prieto
lucas-prieto.bsky.social
PhD Student at Imperial
I am excited to share our new paper: “Grokking at the Edge of Numerical Stability”!
We show that floating-point errors in the Softmax play a surprising role in grokking, explaining, among other things, why weight decay seems necessary for grokking in most cases!
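To make the floating-point point concrete, here is a minimal sketch of how rounding inside a softmax can zero out the loss. It is an illustration only, not code from the paper: the function names and logit values are assumptions, and it uses Python's built-in float64, where the effect needs larger logits than it would in lower-precision formats.

```python
import math

def softmax(logits):
    # Subtract the max logit first, the usual trick to avoid overflow in exp.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability of the target class.
    return -math.log(softmax(logits)[target])

# Hypothetical logits for a 3-class problem; class 0 is the correct one.
moderate = [5.0, 0.0, 0.0]
extreme = [40.0, 0.0, 0.0]

p_moderate = softmax(moderate)
p_extreme = softmax(extreme)

# With moderate logits the correct-class probability stays below 1,
# so the loss is positive and still carries a learning signal.
print(p_moderate[0], cross_entropy(moderate, 0))

# With extreme logits, exp(-40) is too small to change the sum 1 + 2*exp(-40)
# in float64, so the correct-class probability rounds to exactly 1.0.
print(p_extreme[0], cross_entropy(extreme, 0))

# The gradient of the loss w.r.t. the correct-class logit is p - 1,
# which is exactly zero once p has rounded to 1.0.
grad_target = p_extreme[0] - 1.0
print(grad_target)
```

Once the correct-class probability rounds to exactly 1.0, the loss and the gradient on the correct-class logit are both exactly zero, even though the true values are tiny but nonzero.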
🧵
January 10, 2025 at 3:52 PM