https://vincentherrmann.github.io
📄 Paper: arxiv.org/abs/2503.13431
💻 Code: github.com/vincentherrm...
🍁 ICML: icml.cc/virtual/2025...
We introduce the PHi (Prediction of Hidden states) layer and PHi Loss. High PHi loss means the model's hidden state is complex and unpredictable—a sign of interesting computation.
A model predicting random static (the noisy TV problem 📺) has high loss but isn't doing any interesting work.
Other kinds of data might be accurately predictable, but only after difficult computation 🤔