Lightnews — Scholar-powered news

Philip Osborne

@philiposborne.bsky.social

Improving Reinforcement Learning with language interaction and humans in the loop.
PhD in Artificial Intelligence, University of Manchester (UK)
Founder of elsci.org


Reposted by Philip Osborne

Nathan Lambert

@natolambert.bsky.social

Trying to tell the story behind this explosion of research we are in. An unexpected RL Renaissance.
New talk! Forecasting the Alpaca moment for reasoning models and why the new style of RL training is a far bigger deal than the emergence of RLHF.
YouTube: https://buff.ly/41bVRPp

An unexpected RL Renaissance

www.interconnects.ai

February 13, 2025 at 3:42 PM

Reposted by Philip Osborne

Sung Kim

@sungkim.bsky.social

Replace "github" with "gitingest" in the URL, and you get the whole repo as a single string that you can then paste into your LLM.

February 14, 2025 at 3:12 AM
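
The URL swap described in the post above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: it assumes the target host is gitingest.com (the literal result of replacing "github" in github.com), the function name to_gitingest_url is hypothetical, and "owner/repo" is a placeholder path.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Return the same URL with the github.com host swapped for gitingest.com.

    Hypothetical helper: it only rewrites the host and leaves the path,
    query, and fragment untouched.
    """
    parts = urlsplit(github_url)
    host = parts.netloc.lower()
    # Refuse anything that is not a plain github.com URL, so that a naive
    # string replace cannot rewrite unrelated hosts or path segments.
    if host not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url!r}")
    return urlunsplit(
        (parts.scheme, "gitingest.com", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # "owner/repo" is a placeholder, not a real repository.
    print(to_gitingest_url("https://github.com/owner/repo"))
```

Parsing the URL instead of calling str.replace keeps the swap confined to the host, so a repository whose own name contains "github" is left unchanged.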

Reposted by Philip Osborne

Nathan Lambert

@natolambert.bsky.social

Since everyone wants to learn RL for language models now, post-DeepSeek, a reminder that I've been quietly working on this book in the background for months.

Policy gradient chapter is coming together. Plugging away at the book every day now.

rlhfbook.com/c/11-policy-...

February 1, 2025 at 10:05 PM
