Tom Everitt
@tom4everitt.bsky.social
1.1K followers 350 following 96 posts
AGI safety researcher at Google DeepMind, leading causalincentives.com. Personal website: tomeveritt.se
Pinned
tom4everitt.bsky.social
What if LLMs are sometimes capable of doing a task but don't try hard enough to do it?

In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵

arxiv.org/abs/2504.118...
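One way to picture the idea (a toy framing of my own, with made-up numbers; the paper's actual metric may differ): compare how often a model solves a composite task with what its measured success on the constituent subtasks would predict.

```python
# Toy framing, hypothetical numbers -- not necessarily the paper's metric:
# if the model solves each subtask reliably but the full task far less often
# than those subtask rates would predict, the gap suggests it is not fully
# deploying its capabilities, i.e. limited goal-directedness.
subtask_success = {"parse_input": 0.95, "do_arithmetic": 0.90, "format_answer": 0.98}
observed_full_task = 0.55

predicted_full_task = 1.0
for rate in subtask_success.values():
    predicted_full_task *= rate  # assumes independent failures, an optimistic bound

print(f"predicted from subtasks: {predicted_full_task:.2f}")   # ~0.84
print(f"observed on full task:   {observed_full_task:.2f}")
print(f"goal-directedness gap:   {predicted_full_task - observed_full_task:.2f}")
```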
tom4everitt.bsky.social
Interesting. Could the measure also be applied to the human, assessing changes to their empowerment over time?
tom4everitt.bsky.social
Interesting, does the method rely on being able to set different goals for the LLM?
Reposted by Tom Everitt
tobyord.bsky.social
Evaluating the Infinite
🧵
My latest paper tries to solve a longstanding problem afflicting fields such as decision theory, economics, and ethics — the problem of infinities.
Let me explain a bit about what causes the problem and how my solution avoids it.
1/N
arxiv.org/abs/2509.19389
Evaluating the Infinite
I present a novel mathematical technique for dealing with the infinities arising from divergent sums and integrals. It assigns them fine-grained infinite values from the set of hyperreal numbers in a ...
arxiv.org
tom4everitt.bsky.social
Interesting. I recall Rich Sutton making a similar suggestion in the 2nd edition of his RL book, arguing we should optimize average reward rather than discounted reward
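For context, the two objectives being contrasted are (standard textbook definitions, not taken from either paper):

```latex
% Discounted return vs. average reward (standard definitions)
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
\qquad \text{vs.} \qquad
r(\pi) = \lim_{h \to \infty} \frac{1}{h} \sum_{t=1}^{h} \mathbb{E}[R_t \mid \pi]
```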
Reposted by Tom Everitt
egrefen.bsky.social
Do you have a PhD (or equivalent) or will have one in the coming months (i.e. 2-3 months away from graduating)? Do you want to help build open-ended agents that help humans do human things better, rather than replace them? We're hiring 1-2 Research Scientists! Check the 🧵👇
Reposted by Tom Everitt
yoshuabengio.bsky.social
digital-strategy.ec.europa.eu/en/policies/... The Code also has two other, separate Chapters (Copyright, Transparency). The Chapter I co-chaired (Safety & Security) is a compliance tool for the small number of frontier AI companies to whom the “Systemic Risk” obligations of the AI Act apply.
2/3
The General-Purpose AI Code of Practice
The Code of Practice helps industry comply with the AI Act legal obligations on safety, transparency and copyright of general-purpose AI models.
digital-strategy.ec.europa.eu
Reposted by Tom Everitt
vkrakovna.bsky.social
As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420
Reposted by Tom Everitt
skiandsolve.bsky.social
First position paper I ever wrote. "Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence" arxiv.org/abs/2506.23908 Background: I'd like LLMs to help me do math, but statistical learning seems inadequate to make this happen. What do you all think?
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
Sound deductive reasoning -- the ability to derive new knowledge from existing facts and rules -- is an indisputably desirable aspect of general intelligence. Despite the major advances of AI systems ...
arxiv.org
Reposted by Tom Everitt
davidlindner.bsky.social
Can frontier models hide secret information and reasoning in their outputs?

We find early signs of steganographic capabilities in current frontier models, including Claude, GPT, and Gemini. 🧵
tom4everitt.bsky.social
This is an interesting explanation. But surely boys falling behind is nevertheless an important and underrated problem?
tom4everitt.bsky.social
Interesting. But is case 2 *real* introspection? It infers its internal temperature from its external output, which feels more like exospection than proper introspection. (I know human "intro"spection often works like this too, but still)
tom4everitt.bsky.social
Thought provoking
anilseth.bsky.social
1/ Can AI be conscious? My Behavioral & Brain Sciences target article on ‘Conscious AI and Biological Naturalism’ is now open for commentary proposals. Deadline is June 12. Take-home: real artificial consciousness is very unlikely along current trajectories. www.cambridge.org/core/journal...
Call for Commentary Proposals - Conscious artificial intelligence and biological naturalism
www.cambridge.org
tom4everitt.bsky.social
… and many more! Check out our paper arxiv.org/pdf/2506.01622, or come chat to @jonrichens.bsky.social, @dabelcs.bsky.social or Alexis Bellot at #ICML2025
arxiv.org
tom4everitt.bsky.social
Causality. In previous work we showed a causal world model is needed for robustness. It turns out you don't need as much causal knowledge of the environment for task generalization as you do for robustness. There is a causal hierarchy, but for agency and agent capabilities, rather than inference!
tom4everitt.bsky.social
Emergent capabilities. To minimize training loss across many goals, agents must learn a world model, which can solve tasks the agent was not explicitly trained on. Simple goal-directedness gives rise to many capabilities (social cognition, reasoning about uncertainty, intent…).
tom4everitt.bsky.social
Safety. Several approaches to AI safety require accurate world models, but agent capabilities could outpace our ability to build them. Our work gives a theoretical guarantee: we can extract world models from agents, and the model fidelity increases with the agent's capabilities.
tom4everitt.bsky.social
Extracting world knowledge from agents. We derive algorithms that recover a world model given the agent’s policy and goal (policy + goal -> world model). These algorithms complete the triptych of planning (world model + goal -> policy) and IRL (world model + policy -> goal).
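A schematic of that triptych, with illustrative signatures only (my own sketch, not the paper's actual algorithms; see the linked paper for those):

```python
def plan(world_model, goal):
    """Planning: world model + goal -> policy."""
    raise NotImplementedError

def inverse_rl(world_model, policy):
    """Inverse RL: world model + policy -> goal."""
    raise NotImplementedError

def extract_world_model(policy, goal):
    """The new, third direction: policy + goal -> (approximate) world model."""
    raise NotImplementedError
```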
tom4everitt.bsky.social
Fundamental limitations on agency. In environments where the dynamics are provably hard to learn, or where long-horizon prediction is infeasible, the capabilities of agents are fundamentally bounded.
tom4everitt.bsky.social
No model-free path. If you want to train an agent capable of a wide range of goal-directed tasks, you can’t avoid the challenge of learning a world model. And to improve performance or generality, agents need to learn increasingly accurate and detailed world models.
tom4everitt.bsky.social
These results have several interesting consequences, from emergent capabilities to AI safety… 👇
tom4everitt.bsky.social
And to achieve lower regret, or more complex goals, agents must learn increasingly accurate world models. Goal-conditioned policies are informationally equivalent to world models! But this holds only for goals over multi-step horizons; myopic agents do not need to learn world models.
tom4everitt.bsky.social
Specifically, we show it's possible to recover a bounded-error approximation of the environment transition function from any goal-conditioned policy that satisfies a regret bound across a wide enough set of simple goals, like steering the environment into a desired state.
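To build intuition for why this is possible, here is a toy sketch of my own (not the paper's algorithm): a policy that is optimal for one-step "reach state g" goals must pick, in each state, an action maximising the probability of reaching g, so querying it over all state-goal pairs already reveals part of the transition structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3

# Ground-truth transition tensor P[s, a, s'], hidden inside the policy below.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def goal_conditioned_policy(s, g):
    """Optimal policy for the goal 'reach state g in one step'."""
    return int(np.argmax(P[s, :, g]))

# Query the policy for every (state, goal) pair: this recovers, for each pair,
# which action makes the goal most likely -- world-model information extracted
# purely from the policy's behaviour.
recovered = np.array([[goal_conditioned_policy(s, g) for g in range(n_states)]
                      for s in range(n_states)])
assert np.array_equal(recovered, P.argmax(axis=1))
```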
tom4everitt.bsky.social
Turns out there’s a neat answer to this question. We prove that any agent capable of generalizing to a broad range of simple goal-directed tasks must have learned a predictive model capable of simulating its environment. And this model can always be recovered from the agent.
tom4everitt.bsky.social
World models are foundational to goal-directedness in humans, but are hard to learn in messy open worlds. We're now seeing generalist, model-free agents (Gato, PaLM-E, Pi-0…). Do these agents learn implicit world models, or have they found another way to generalize to new tasks?
tom4everitt.bsky.social
Are world models necessary to achieve human-level agents, or is there a model-free short-cut?
Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
arxiv.org/abs/2506.01622