It's often said that hippocampal replay, which helps build up a model of the world, is biased by reward. But canonical temporal-difference learning requires updates proportional to reward-prediction error (RPE), not reward magnitude.
1/4
rdcu.be/eRxNz
#philsci #cogsky #CognitiveNeuroscience
@phaueis.bsky.social
aktuell.uni-bielefeld.de/2025/11/24/t...
That is, in Go, your reward is 0 for most time steps and only +1/-1 at the end. That sounds sparse, but not from an algorithmic perspective: TD bootstrapping converts that single terminal reward into nonzero prediction errors at every preceding step.
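A minimal TD(0) sketch of this point, assuming a toy deterministic chain task (the chain length, learning rate, and discount factor below are illustrative assumptions, not taken from the paper):

# TD(0) on a toy chain: states 0..4, episode ends after state 4.
N_STATES = 5
ALPHA, GAMMA = 0.1, 0.9
V = [0.0] * (N_STATES + 1)  # V[N_STATES] is the terminal state, fixed at 0

for episode in range(500):
    for s in range(N_STATES):
        s_next = s + 1
        # Reward is sparse: 0 everywhere except the final transition.
        r = 1.0 if s_next == N_STATES else 0.0
        # The update is driven by the reward-prediction error (RPE),
        # delta = r + gamma * V(s') - V(s), not by the raw reward r.
        delta = r + GAMMA * V[s_next] - V[s]
        V[s] += ALPHA * delta

# Even early states, where r is always 0, end up with nonzero values,
# because bootstrapped predictions propagate the terminal reward backward.
print([round(v, 3) for v in V[:N_STATES]])

After enough episodes the values approach gamma^(steps-to-goal), so the "sparse" terminal reward generates learning signals at every state along the way.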
Congratulations to Matt Jones & Nathan Lepora for seeing this through to the end!
www.nature.com/articles/s41...
Check out our latest preprint, where we tracked the activity of the same neurons throughout early postnatal development: www.biorxiv.org/content/10.1...
see 🧵 (1/?)
www.sciencedirect.com/science/arti...