Glen Berseth
@glenberseth.bsky.social
1.9K followers 81 following 180 posts
Assistant Prof at @UMontreal @mila-quebec.bsky.social @MontrealRobots . CIFAR AI Chair, RL_Conference chair. Creating generalist problem-solving agents for the real world. He/him/il.
Pinned
glenberseth.bsky.social
Creating generalist robot policies (GRPs) is tricky. In this video (and code) I share how to create a GRP from scratch, starting from some basic transformer code. This is the first step in my plan to create a course on large models and scaling for RL and robotics.
glenberseth.bsky.social
The same could be said for science.
eugenevinitsky.bsky.social
The single biggest epistemic challenge in the internet era is remaining calibrated about what "normal" people think while the internet throws up an infinite wall of crazy. Thousands of people sharing an absurd opinion on the internet tells you very little!
glenberseth.bsky.social
There are many ways to learn or compute a critic that can help score the performance of different actions. This is not the full story. If you want more details, go read rlhfbook.com/c/11-policy-...
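One concrete instance of such a critic is a TD(0) value estimate, where the temporal-difference error scores the action just taken. A minimal tabular sketch (the function name and toy setup are my own, not from the linked chapter):

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) step on a tabular value function V.

    The TD error r + gamma * V[s_next] - V[s] is a one-sample
    advantage estimate: it scores how much better the taken action
    did than the critic expected from state s.
    """
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error  # move V(s) toward the bootstrapped target
    return td_error

# Toy 3-state example: a reward of 1.0 from state 0 is "better than
# expected" under a zero-initialized critic, so the TD error is positive.
V = np.zeros(3)
delta = td0_update(V, s=0, r=1.0, s_next=1)
```

Here a positive `delta` would increase the probability of the action taken in state 0 under an actor-critic update.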
glenberseth.bsky.social
GRPO is more like REINFORCE than PPO.
1) It trains no critic (there is little need, since averaging over the group keeps the variance small)
2) The SCORE FUNCTION (difficult to call this an advantage) is computed over a batch of completions from the same initial prompt (similar to the vine sampling method from TRPO)
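The group-relative score above fits in a few lines. A minimal sketch (the function name and toy reward values are my own):

```python
import numpy as np

def grpo_scores(rewards):
    """Group-relative score for GRPO.

    `rewards` are the scalar rewards of several completions sampled
    from the SAME prompt. Each completion is scored against the
    group mean (and std) instead of a learned critic's baseline.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon avoids divide-by-zero

# Four completions of one prompt, scored by a reward model (toy values).
# The best completion gets a positive score, the worst a negative one,
# and the scores sum to (approximately) zero.
scores = grpo_scores([1.0, 0.0, 0.5, 0.5])
```

Because the baseline is just the group mean, this is REINFORCE with a per-prompt baseline rather than PPO with a value network.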
glenberseth.bsky.social
On my way to South Korea for a week packed with robotics at the Conference on Robot Learning, Humanoids 2025, and the global forum on mechanical engineering.
glenberseth.bsky.social
Very exciting! Congratulations.
glenberseth.bsky.social
Make a plan for the next 2-3 months
1. Have clear goals/claims
2. Have a clear way to measure progress
3. Share the plan and get #consensus with your collaborators.

Without a plan, how does one know they are making progress 🤔
glenberseth.bsky.social
One of the most common logical fallacies I see is "GPUs are cooking, therefore progress." I see people with 1/10th the compute get 10x more progress because... they have a more thorough plan. #Moretimethinkinglesstimeburning
glenberseth.bsky.social
Because you do so much awesome stuff!
glenberseth.bsky.social
I suggest going out and talking to real people. They provide a much richer signal.
glenberseth.bsky.social
Maybe one of the best use cases.
glenberseth.bsky.social
We compare different checkpoints during the training process.
Vision-Language-Action Planning and Search (VLAPS) significantly outperforms VLA-only baselines on simulated, language-specified robotic tasks, improving success rates by up to 67 percentage points.
glenberseth.bsky.social
VLAs offer an avenue for generalist robot policies; however, naively following the action predictions leads to brittle or unsafe behaviours. We introduce VLAPS, which integrates model-based search with pre-trained VLA policies to improve performance without additional training.
glenberseth.bsky.social
Efficiency may be the most important. If we can't make these tools economical, they will not last.
jeffdean.bsky.social
AI efficiency is important. The median Gemini Apps text prompt in May 2025 used 0.24 Wh of energy (<9 seconds of TV watching) & 0.26 mL (~5 drops) of water. Over 12 months, we reduced the energy footprint of a median text prompt 33x, while improving quality:
cloud.google.com/blog/product...
glenberseth.bsky.social
My lab at @montrealrobotics.bsky.social was honoured to present our recent work to @mark-carney.bsky.social and Evan Solomon, explaining how AI enables new robotics that will drive innovation in Canada. It was a pleasure getting into the details with a quick dive into deterministic policy gradients!
glenberseth.bsky.social
Another fantastic Montreal Robotics Summer School! Thanks to our sponsors, organizers, and @mila-quebec.bsky.social, we doubled in size this year. Congratulations again to all the students who made this school happen, and on your progress in machine learning and robotics.
glenberseth.bsky.social
The team is already growing
glenberseth.bsky.social
Last, rliable includes a measure of the optimality gap between the expert and the learned policy. But a poor gap conflates exploration and exploitation issues. Our new measure better isolates the exploitation issues and indicates that PPO is the better algorithm compared to DQN.
glenberseth.bsky.social
Scaling issues could be the result of narrow exploration over complex distributions, or of optimization issues. This method estimates that the difference is large, indicating larger exploitation issues with larger models.
glenberseth.bsky.social
Intrinsic rewards, which are designed to help RL algorithms explore, actually increase this difference, aggravating exploitation issues. This is troublesome because as we develop new exploration methods, they may be generating better experience, but the optimization may ignore it.