Tom Schaul
@schaul.bsky.social
3.2K followers · 290 following · 28 posts
RL researcher at DeepMind https://schaul.site44.com/ 🇱🇺
schaul.bsky.social
Where do some of Reinforcement Learning's great thinkers stand today?

Find out! Keynotes of the RL Conference are online:
www.youtube.com/playlist?lis...

Wanting vs liking, Agent factories, Theoretical limit of LLMs, Pluralist value, RL teachers, Knowledge flywheels
(guess who talked about which!)
Reposted by Tom Schaul
aditimavalankar.bsky.social
On my way to #ICML2025 to present our algorithm that strongly scales with inference compute, in both performance and sample diversity! 🚀

Reach out if you’d like to chat more!
schaul.bsky.social
Deadline to apply is this Wednesday!
schaul.bsky.social
Ever thought of joining DeepMind's RL team? We're recruiting for a research engineering role in London:
job-boards.greenhouse.io/deepmind/job...
Please spread the word!
Research Engineer, Reinforcement Learning
London, UK
job-boards.greenhouse.io
schaul.bsky.social
The RL team is a small team led by David Silver. We build RL algorithms and solve ambitious research challenges. As one of DeepMind's oldest teams, it has been instrumental in building DQN, AlphaGo, Rainbow, AlphaZero, MuZero, AlphaStar, AlphaProof, Gemini, etc. Help us build the next big thing!
schaul.bsky.social
When faced with a challenge (like debugging), it helps to think back to examples of how you've overcome challenges in the past. Same for LLMs!

The method we introduce in this paper is efficient because examples are chosen for their complementarity, leading to much steeper inference-time scaling! 🧪
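The post doesn't spell out the mechanism, but here is a minimal sketch of what complementarity-based example selection could look like, assuming L2-normalised embeddings. The function name and the greedy scoring rule are illustrative, not the paper's actual algorithm:

```python
import numpy as np

def select_complementary(query_emb: np.ndarray,
                         example_embs: np.ndarray,
                         k: int) -> list[int]:
    """Greedily pick k in-context examples that are relevant to the
    query but mutually diverse, instead of the k nearest neighbours.
    Assumes all embeddings are L2-normalised, so dot products are
    cosine similarities. Hypothetical sketch, not the paper's method."""
    relevance = example_embs @ query_emb      # similarity to the query
    chosen: list[int] = []
    for _ in range(k):
        scores = relevance.copy()
        if chosen:
            # Penalise redundancy: subtract each candidate's maximum
            # similarity to the examples already selected.
            scores -= (example_embs @ example_embs[chosen].T).max(axis=1)
        scores[chosen] = -np.inf              # never pick the same one twice
        chosen.append(int(scores.argmax()))
    return chosen
```

This is essentially maximal-marginal-relevance-style selection; the intuition matches the post: a set of mutually complementary examples covers more failure modes per token of context, which is one way to get steeper returns from extra inference compute.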
schaul.bsky.social
Some extra motivation for those of you in RLC deadline mode: our line-up of keynote speakers. Since all accepted papers get a talk, they may well attend yours!

@rl-conference.bsky.social
RLC Keynote speakers: Leslie Kaelbling, Peter Dayan, Rich Sutton, Dale Schuurmans, Joelle Pineau, Michael Littman
schaul.bsky.social
200 great visualisations: 200 facets and nuances of 1 planetary story.
nathanielbullard.com
My annual decarbonization presentation is here.

200 slides, covering everything from water levels in Lake Gatún to sulfur dioxide emissions to ESG fund flows to Chinese auto exports to artificial intelligence. www.nathanielbullard.com/presentations
schaul.bsky.social
The sound of two users joining per second: "tik", "tok"...
Reposted by Tom Schaul
rl-conference.bsky.social
Excited to announce the first RLC 2025 keynote speaker, a researcher who needs little introduction, whose textbook we've all read, and who keeps pushing the frontier on RL with human-level sample efficiency
Announcement of Richard S. Sutton as RLC 2025 keynote speaker
schaul.bsky.social
Could language games (and playing many of them) be the renewable energy that Ilya was hinting at yesterday? They do address two core challenges of self-improvement -- let's discuss!

My talk is today at 11:40am, West Meeting Room 220-222, #NeurIPS2024
language-gamification.github.io/schedule/
schaul.bsky.social
Are there limits to what you can learn in a closed system? Do we need human feedback in training? Is scale all we need? Should we play language games? What even is "recursive self-improvement"?

Thoughts about this and more here:
arxiv.org/abs/2411.16905
Boundless Socratic Learning with Language Games
An agent trained within a closed system can master any desired capability, as long as the following three conditions hold: (a) it receives sufficiently informative and aligned feedback, (b) its covera...
arxiv.org
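For the concrete-minded, here is a hypothetical sketch (my gloss, not the paper's pseudocode) of the closed-loop setup the abstract describes: condition (a) is made explicit in that the feedback signal is produced by a scorer inside the system, and broad coverage comes from sampling many different games. All interface names are placeholders:

```python
import random
from typing import Callable, Protocol

class Agent(Protocol):
    # Placeholder interface; any policy with generate/update would fit.
    def generate(self, task: str) -> str: ...
    def update(self, task: str, response: str, reward: float) -> None: ...

# A "language game" pairs a task sampler with an in-system scorer
# (no human in the loop).
Game = tuple[Callable[[], str], Callable[[str, str], float]]

def socratic_loop(agent: Agent, games: list[Game], rounds: int) -> None:
    for _ in range(rounds):
        sample_task, score = random.choice(games)  # coverage via many games
        task = sample_task()
        candidates = [agent.generate(task) for _ in range(8)]
        rewards = [score(task, c) for c in candidates]
        best = max(range(len(candidates)), key=rewards.__getitem__)
        # Condition (a): informative feedback generated inside the system.
        agent.update(task, candidates[best], rewards[best])
```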
schaul.bsky.social
Don't get to talk enough about RL during #neurips2024? Then join us for more, tomorrow night at The Pearl!
rl-conference.bsky.social
If you're at NeurIPS, RLC is hosting an RL event from 8 till late at The Pearl on Dec. 11th. Join us, meet all the RL researchers, and spread the word!
schaul.bsky.social
Dynamic programming has a fun origin story. In 1950, Bellman wanted to coin a term that "was something not even a Congressman could object to".
See here:
pubsonline.informs.org/doi/pdf/10.1...
pubsonline.informs.org
schaul.bsky.social
This year's (first-ever) RL conference was a breath of fresh air! And now that it's established, the next edition is likely to be even better: Consider sending your best and most original RL work there, and then join us in Edmonton next summer!
rl-conference.bsky.social
The call for papers for RLC is now up! Abstract deadline of 2/14, submission deadline of 2/21!
Please help us spread the word.
rl-conference.cc/callforpaper...
RLJ | RLC Call for Papers
rl-conference.cc
schaul.bsky.social
Ohh... good morning to you too!

Clearly this got off on the wrong foot: do you want to try again, maybe more constructively (in the spirit of bluesky not being the other place)? This is a preprint, so I'd be happy to hear your suggestions for making it less "ignorant"...
schaul.bsky.social
Either one or many players. For "improvement" to be well-defined, one agent must be special (see footnote 6), but the multi-agent setting has many benefits.
schaul.bsky.social
1: open-ended means that it will keep producing novel and learnable artifacts on the timescale of interest to the observer (see the definition here: arxiv.org/abs/2406.04268).

2: I think as a thought experiment it is valid, as it could work in principle, but of course it hasn't been built?
Open-Endedness is Essential for Artificial Superhuman Intelligence
In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended...
arxiv.org
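As a rough operationalisation of that definition (my reading of the cited paper, with a placeholder loss function): novelty means the latest artifacts are still hard for the observer to predict, and learnability means that a longer history makes them easier to predict.

```python
from typing import Callable, Sequence

def looks_open_ended(artifacts: Sequence[object],
                     observer_loss: Callable[[Sequence[object], object], float],
                     horizon: int) -> bool:
    """observer_loss(history, artifact) -> the observer's prediction loss
    on `artifact` given `history`. Checks the last `horizon` artifacts.
    Illustrative sketch of the definition, not code from the paper."""
    assert len(artifacts) >= 2 * horizon
    recent = range(len(artifacts) - horizon, len(artifacts))
    # Novelty: the newest artifacts still carry prediction cost.
    novel = all(observer_loss(artifacts[:t], artifacts[t]) > 0
                for t in recent)
    # Learnability: a longer history strictly lowers that cost.
    learnable = all(
        observer_loss(artifacts[:t], artifacts[t])
        < observer_loss(artifacts[:t - horizon], artifacts[t])
        for t in recent
    )
    return novel and learnable
```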
schaul.bsky.social
In section 5 (second paragraph), there are about a dozen references to language games people are already using (one per paper), some with ingenious ways to provide feedback.

Also, I suspect the workshop will ultimately have the poster abstracts online with plenty of additional material!
schaul.bsky.social
@colah.bsky.social: with a few years' hindsight, how do you see the Distill space now? Is there a chance for a reboot or a rebirth in another form?
schaul.bsky.social
I think the Distill journal was really valuable in this space, but unfortunately is no longer around to help...

distill.pub
Distill — Latest articles about machine learning
Articles about Machine Learning
distill.pub