Mattie Fellows
@mattieml.bsky.social
1.7K followers 110 following 19 posts
Reinforcement Learning Postdoc at FLAIR, University of Oxford @universityofoxford.bsky.social All opinions are my own.
Pinned
mattieml.bsky.social
1/2 Offline RL has always bothered me. It promises that, by exploiting offline data, an agent can learn to behave near-optimally once deployed. In real life, it breaks this promise: it requires large amounts of online samples for tuning and gives no guarantees of behaving safely to achieve desired goals.
mattieml.bsky.social
FLAIR WINTER/SPRING INTERNSHIP!
We're looking for two exceptional students to join us on research projects in Oxford from January! Please share with anyone who would be interested. Details below :)
Internship - Winter/Spring 2026
We are looking for two talented students to join us for an internship working in FLAIR for 6 months. Students will get the chance to work on current FLAIR projects at the University of Oxford, gaining...
foersterlab.com
Reposted by Mattie Fellows
pcastr.bsky.social
PQN, a recently introduced value-based method (bsky.app/profile/matt...), has a data-collection scheme similar to PPO's. We see a similar trend as with PPO, but much less pronounced. It is possible our findings are more correlated with policy-based methods.
9/
mattieml.bsky.social
2/2 🚀 Our new paper below tackles two major issues of offline RL, high online sample complexity and the lack of online performance guarantees, obtaining accurate regret estimates and achieving performance competitive with the best online hyperparameter tuning methods, both using only offline data! 👇
arxiv.org
mattieml.bsky.social
If you're struggling with the bs Overleaf outage, you can try going to www.overleaf.com/project/[PROJECTID]/download/zip to download the zip. It seems to sometimes work after a few minutes.
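A hedged sketch of the same workaround as a script, in case the browser route keeps timing out. The endpoint is the one from the post above; the cookie name is my assumption and may differ, so copy whatever session cookie your logged-in browser actually holds:

```python
# Sketch only: fetch the project zip via the endpoint from the post. Assumes
# you are logged in; the cookie name "overleaf_session2" is an assumption —
# copy the real session cookie from your browser's dev tools, and replace
# PROJECT_ID with the id from your project's URL.
import requests

PROJECT_ID = "<your project id>"
url = f"https://www.overleaf.com/project/{PROJECT_ID}/download/zip"
resp = requests.get(url, cookies={"overleaf_session2": "<session cookie>"})
resp.raise_for_status()  # may 50x during the outage; retry after a few minutes
with open("project.zip", "wb") as f:
    f.write(resp.content)
```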
mattieml.bsky.social
Excited to be presenting our spotlight ICLR paper, Simplifying Deep Temporal Difference Learning, today! Join us in Hall 3 + Hall 2B, Poster #123, from 3pm :)
arxiv.org
mattieml.bsky.social
The techniques used in our work and in Bhandari et al. are standard in the analysis of stochastic approximation algorithms and have been around for a long time. Moreover, the point of the blog was to be an expositional tool that acts as a complete analysis of TD. But sure, I'll add even more references...
mattieml.bsky.social
In our paper we quite clearly state at several points, including: 'convergence of TD methods has been studied extensively (Watkins & Dayan, 1992; Tsitsiklis & Van Roy, 1997; Dalal et al., 2017; Bhandari et al., 2018; Srikant & Ying, 2019)' and 'our proof is similar to Bhandari et al. (2018).'
mattieml.bsky.social
Crucially, techniques that study linear function approximation could not be used to understand things like LayerNorm.
mattieml.bsky.social
As far as I'm aware, and please correct me if I'm wrong, I've never seen the derivation of the path mean Jacobian, which really is a key contribution of our analysis: it allows us to study nonlinear systems (i.e. ACTUAL neural nets used in practice) that many papers like Bhandari et al.'s can't.
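The post only names the concept, so here is a hedged numerical sketch of one natural reading (not the paper's derivation): by the mean value theorem in integral form, averaging the Jacobian along the straight path between two parameter vectors linearises a nonlinear update difference exactly. The toy map f below is my own stand-in for an expected update direction:

```python
# Hedged sketch: f(b) - f(a) = ( ∫₀¹ J_f(a + s(b - a)) ds ) (b - a), so the
# path-averaged Jacobian gives an exact linearisation even for nonlinear f.
import jax
import jax.numpy as jnp

def f(theta):
    # illustrative nonlinear map, standing in for an expected update direction
    return jnp.tanh(theta) + 0.5 * theta ** 2

def path_mean_jacobian(f, a, b, n=200):
    ss = (jnp.arange(n) + 0.5) / n  # midpoint quadrature nodes on [0, 1]
    jacs = jax.vmap(lambda s: jax.jacobian(f)(a + s * (b - a)))(ss)
    return jacs.mean(axis=0)

a, b = jnp.array([0.3, -1.2]), jnp.array([1.0, 0.4])
Jbar = path_mean_jacobian(f, a, b)
print(jnp.allclose(Jbar @ (b - a), f(b) - f(a), atol=1e-4))  # True
```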
mattieml.bsky.social
We cite said papers several times in our work and the blogs...
Reposted by Mattie Fellows
jfoerst.bsky.social
PQN puts Q-learning back on the map and now comes with a blog post + Colab demo! Also, congrats to the team for the spotlight at #ICLR2025
mattieml.bsky.social
PQN blog 3/3 👉 Take a look at Matteo's 5-minute blog covering PQN's key features, plus a Colab demo with JAX & PyTorch implementations: mttga.github.io/posts/pqn/

🔎 For a deeper dive into the theory:
blog.foersterlab.com/fixing-td-pa...
blog.foersterlab.com/fixing-td-pa...

See you in Singapore! 🇸🇬
Simplifying Deep Temporal Difference Learning
A modern implementation of Deep Q-Network without target networks and replay buffers.
mttga.github.io
mattieml.bsky.social
There are so many great places in the world; if anything, it would be a positive to regularly see more conferences in countries other than the US/Austria/Canada.
mattieml.bsky.social
PQN Blog 2/3: In this blog we show how to overcome the 'deadly triad' and stabilise TD using regularisation techniques such as LayerNorm and/or l_2 regularisation, deriving a provably stable deep Q-learning update WITHOUT ANY REPLAY BUFFER OR TARGET NETWORKS (toy sketch below the link) @jfoerst.bsky.social @flair-ox.bsky.social
Fixing TD Pt II: Overcoming the Deadly Triad
blog.foersterlab.com
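A minimal JAX sketch of the recipe the post describes, under my own illustrative choices (network sizes, regularisation coefficient): a Q-network with LayerNorm, trained with an l_2-regularised semi-gradient TD(0) loss, no target network, no replay buffer. See the blog and Colab for the actual PQN implementation.

```python
import jax
import jax.numpy as jnp

def init_params(key, obs_dim=4, hidden=64, n_actions=2):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (obs_dim, hidden)) / jnp.sqrt(obs_dim),
        "b1": jnp.zeros(hidden),
        "W2": jax.random.normal(k2, (hidden, n_actions)) / jnp.sqrt(hidden),
        "b2": jnp.zeros(n_actions),
    }

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / jnp.sqrt(x.var() + eps)

def q_values(params, obs):
    h = layer_norm(obs @ params["W1"] + params["b1"])  # LayerNorm pre-activation
    return jax.nn.relu(h) @ params["W2"] + params["b2"]

def td_loss(params, obs, action, reward, next_obs, done,
            gamma=0.99, reg=1e-3):  # reg coefficient is an illustrative choice
    # semi-gradient TD(0): no gradient through the bootstrapped target,
    # and no separate target network
    target = reward + gamma * (1.0 - done) * jax.lax.stop_gradient(
        q_values(params, next_obs).max())
    td_error = q_values(params, obs)[action] - target
    l2 = sum(jnp.sum(p ** 2) for p in jax.tree_util.tree_leaves(params))
    return 0.5 * td_error ** 2 + reg * l2

@jax.jit
def td_step(params, transition, lr=1e-3):
    # one online update from a single fresh transition — no replay buffer
    grads = jax.grad(td_loss)(params, *transition)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```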
Reposted by Mattie Fellows
numcog.bsky.social
Are academic conferences in the US a thing of the past?
mattieml.bsky.social
PQN Blog 1/3: TD methods are the bread and butter of RL, yet they can have convergence issues when used in practice. This has always annoyed me. Find out below why TD is so unstable and how we can understand this instability better using the TD Jacobian (toy sketch below the link). @flair-ox.bsky.social @jfoerst.bsky.social
Fixing TD Pt I: Why is Temporal Difference Learning so Unstable?
blog.foersterlab.com
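To make the Jacobian idea concrete, here is a toy of my own (the classic two-state 'θ, 2θ' setup of Tsitsiklis & Van Roy, not necessarily the blog's example). For linear TD(0) the expected update is θ ← θ + α(b − Aθ) with A = Φᵀ D (I − γP) Φ, so the update's Jacobian is −αA and stability requires the eigenvalues of A to have positive real part; an off-policy weighting D can break this:

```python
import jax.numpy as jnp

Phi = jnp.array([[1.0], [2.0]])           # features: value θ in state 1, 2θ in state 2
P = jnp.array([[0.0, 1.0], [0.0, 1.0]])   # both states transition to state 2
D = jnp.diag(jnp.array([0.5, 0.5]))       # off-policy (uniform) state weighting
gamma = 0.95

A = Phi.T @ D @ (jnp.eye(2) - gamma * P) @ Phi
print(jnp.linalg.eigvals(A))              # negative for gamma > 5/6 ⇒ TD unstable

theta, alpha = jnp.array([1.0]), 0.1
for _ in range(100):
    theta = theta + alpha * (-A @ theta)  # expected update with zero rewards (b = 0)
print(theta)                              # grows without bound: TD diverges
```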
mattieml.bsky.social
Super excited to share that our paper, Simplifying Deep Temporal Difference Learning, has been accepted as a spotlight at ICLR! My fab collaborator Matteo Gallici and I have written a three-part blog on the work, so stay tuned for that! :)
@flair-ox.bsky.social
arxiv.org/pdf/2407.04811
arxiv.org
Reposted by Mattie Fellows
eugenevinitsky.bsky.social
If you're an RL researcher or RL adjacent, pipe up to make sure I've added you here!
go.bsky.app/3WPHcHg