PhD in Artificial Intelligence, University of Manchester (UK)
Founder of elsci.org
New talk! Forecasting the Alpaca moment for reasoning models and why the new style of RL training is a far bigger deal than the emergence of RLHF.
YouTube: https://buff.ly/41bVRPp
Policy gradient chapter is coming together. Plugging away at the book every day now.
rlhfbook.com/c/11-policy-...