Philip Osborne
philiposborne.bsky.social
Improving Reinforcement Learning with language interaction and humans in the loop.
PhD in Artificial Intelligence, University of Manchester (UK)
Founder of elsci.org
Reposted by Philip Osborne
Trying to tell the story behind this explosion of research we are in. An unexpected RL Renaissance.
New talk! Forecasting the Alpaca moment for reasoning models and why the new style of RL training is a far bigger deal than the emergence of RLHF.
YouTube: https://buff.ly/41bVRPp
An unexpected RL Renaissance
www.interconnects.ai
February 13, 2025 at 3:42 PM
Reposted by Philip Osborne
Replace "github" with "gitingest" in the URL and you get the whole repo as a single string that you can then paste into your LLM.
February 14, 2025 at 3:12 AM
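The tip amounts to a host swap in the URL. Below is a minimal sketch, assuming a standard `https://github.com/<owner>/<repo>` URL and that only the host needs to change. The helper name `to_gitingest_url` and the example repo path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Hypothetical helper: swap the github.com host for gitingest.com."""
    parts = urlsplit(github_url)
    if parts.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url}")
    # Only the host changes; the owner/repo path, query and fragment are kept.
    return urlunsplit(parts._replace(netloc="gitingest.com"))


print(to_gitingest_url("https://github.com/owner/repo"))
# → https://gitingest.com/owner/repo
```

Parsing the URL rather than doing a plain string replace avoids accidentally rewriting "github" where it appears in the repo or owner name.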
Reposted by Philip Osborne
Since everyone wants to learn RL for language models now, post-DeepSeek, a reminder that I've been working on this book quietly in the background for months.

Policy gradient chapter is coming together. Plugging away at the book every day now.

rlhfbook.com/c/11-policy-...
February 1, 2025 at 10:05 PM