Testing Stimulus Equivalence in Transformer-Based Agents.
www.mdpi.com/1999-5903/16...
It's a special form of generalization in which a stimulus can control other contingencies without training.
New talk! Forecasting the Alpaca moment for reasoning models and why the new style of RL training is a far bigger deal than the emergence of RLHF.
YouTube: https://buff.ly/41bVRPp
These last years have been extremely fun, and I am very lucky to have collaborated with and met so many great people😄
We introduce IPRO (Iterated Pareto Referent Optimisation)—a principled approach to solving multi-objective problems.
🔗 Paper: arxiv.org/abs/2402.07182
💻 Code: github.com/wilrop/ipro