vmoens
@vmoens.bsky.social
2.1K followers 640 following 92 posts
Member of technical staff @periodiclabs · Open-source / open-science advocate · Maintainer of torchrl / tensordict / leanrl · Former MD, Neuroscience PhD · https://github.com/vmoens
Pinned
vmoens.bsky.social
One of my fav projects: LeanRL, a simple RL library that provides recipes for fast RL training using torch.compile and cudagraphs.
Using these, we got >6x speed-ups compared to the original CleanRL implementations.
github.com/pytorch-labs...
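A minimal sketch of the core trick, assuming a CUDA machine (illustrative, not LeanRL's actual code): with mode="reduce-overhead", torch.compile captures CUDA graphs, which removes the per-call CPU launch overhead that dominates small RL models.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2)).cuda()

# "reduce-overhead" asks torch.compile to capture CUDA graphs, replaying the
# whole forward pass as a single GPU launch instead of one launch per op.
@torch.compile(mode="reduce-overhead")
def act(obs):
    return policy(obs)

obs = torch.randn(256, 8, device="cuda")
for _ in range(5):   # the first few calls warm up and capture the graph
    action = act(obs)
```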
vmoens.bsky.social
Happy to announce that I've joined Periodic Labs as a member of technical staff. We're a mission-driven startup aimed at accelerating scientific discovery using AI, with a strong focus on materials science (discovery of new materials such as superconductors). We're hiring: periodic.com
Periodic Labs
From bits to atoms.
periodic.com
vmoens.bsky.social
Follow me for more CS cuisine!
vmoens.bsky.social
The fact that no two LLM libraries share the same data format is about as surprising as the fact that there is more than one sign language dialect
vmoens.bsky.social
Ray is an excellent way of testing if all your `__repr__` are coded properly (but it shouldn't be)
vmoens.bsky.social
Just stumbled upon RouteRL: a multi-agent RL framework to facilitate the testing and development of efficient route-choice strategies

coexistence-project.github.io/RouteRL/
Looks pretty cool!
RouteRL 1.0.0 documentation
coexistence-project.github.io
Reposted by vmoens
ngxson.hf.co
What is GGUF, Safetensors, PyTorch, ONNX?

In this blog post, let's discover common formats for storing an AI model.

huggingface.co/blog/ngxson/...
Common AI Model Formats
A Blog post by Xuan-Son Nguyen on Hugging Face
huggingface.co
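For a taste of two of the formats covered there, here's a quick sketch saving the same tensors with PyTorch's pickle-based format and with safetensors:

```python
import torch
from safetensors.torch import save_file, load_file

tensors = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}

# PyTorch's native format is pickle-based: flexible, but loading untrusted
# files can execute arbitrary code (hence weights_only=True).
torch.save(tensors, "model.pt")
restored = torch.load("model.pt", weights_only=True)

# Safetensors stores a flat map of tensors with safe, zero-copy loading.
save_file(tensors, "model.safetensors")
restored = load_file("model.safetensors")
```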
vmoens.bsky.social
MLGym makes it super easy to set up complex tasks to be solved by LLMs. Honestly one of the most intuitive APIs I have ever seen in that space!
vmoens.bsky.social
After that, your LLM reads these instructions and outputs commands along with some thoughts. The commands are executed in the Docker container's bash, and the result is returned to the agent.
vmoens.bsky.social
Today we're open-sourcing MLGym, an API for AI research agents.

MLGym relies on a gym environment that wraps a Docker image. Each env has a task specified as a YAML file, describing in plain English what you want your LLM to achieve
👇
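A hypothetical sketch of that loop (a stand-in, NOT MLGym's actual API): the observation is the task text plus command output, and each step runs the agent's command in a shell, the way MLGym does inside the container.

```python
import subprocess

# Illustrative stand-in for the gym-style contract described above;
# class and method names are invented for this sketch.
class BashEnv:
    def __init__(self, task: str):
        self.task = task

    def reset(self) -> str:
        # The initial observation is the plain-English task spec.
        return self.task

    def step(self, command: str) -> str:
        # MLGym runs the command inside the Docker image; here we run it
        # locally and hand stdout/stderr back to the agent as observation.
        out = subprocess.run(command, shell=True, capture_output=True, text=True)
        return out.stdout + out.stderr

env = BashEnv("Train a classifier on the dataset in ./data and report accuracy.")
obs = env.reset()
obs = env.step("ls")   # in the real loop, an LLM would emit this command
```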
vmoens.bsky.social
Good old cProfile with snakeviz is pretty cool too jiffyclub.github.io/snakeviz/
Again, not for CUDA ops, and not as fine-grained as line-profiler, but quite useful for macro-tracking of compute time
SnakeViz
SnakeViz is a browser based graphical viewer for the output of Python's cProfile module.
jiffyclub.github.io
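For instance, dump the stats to a file and open them in the browser:

```python
import cProfile

def main():
    return sum(i * i for i in range(10**6))

# Write profile stats to out.prof, then visualize with: snakeviz out.prof
cProfile.run("main()", "out.prof")
```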
vmoens.bsky.social
torch.utils.benchmark.Timer is amazing for assessing the runtime of a whole isolated piece of code, but be mindful that the way it plays with global variables isn't always obvious, and its results may differ from time.time() on occasion
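A minimal example; note that everything the statement needs must be passed through `globals` explicitly, which is the gotcha above:

```python
import torch
from torch.utils.benchmark import Timer

x = torch.randn(1024, 1024)

# The stmt runs in its own namespace: it only sees what you pass in `globals`.
t = Timer(stmt="x @ x", globals={"x": x})
print(t.blocked_autorange())  # adaptive run count with median / IQR stats
```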
vmoens.bsky.social
I use line_profiler to check the code line by line (careful: CUDA ops are async, do not trust it for these!) - very useful to check CPU overhead pypi.org/project/line...
line-profiler
Line-by-line profiler
pypi.org
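Typical usage of the explicit API (the kernprof CLI with an @profile decorator works too):

```python
from line_profiler import LineProfiler

def step(xs):
    ys = [v * 2 for v in xs]   # each line gets its own hit count and timing
    return sum(ys)

lp = LineProfiler()
wrapped = lp(step)             # wrap the function to trace it line by line
wrapped(list(range(100_000)))
lp.print_stats()
```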
vmoens.bsky.social
The profilers I use: PyTorch profiler to view the time spent doing the various ops of my code. It can reliably show you what's going on for a single iteration of your function. pytorch.org/tutorials/re...
PyTorch Profiler — PyTorch Tutorials 2.6.0+cu124 documentation
pytorch.org
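A minimal example profiling one iteration, with a labeled region (CPU-only here so it runs anywhere; add ProfilerActivity.CUDA on a GPU box):

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Linear(512, 512)
x = torch.randn(64, 512)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    with record_function("forward"):       # labels this region in the trace
        model(x)

# Per-op breakdown for the iteration, sorted by CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```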
vmoens.bsky.social
In general, in-place operations are not preferable to regular ones (you won't gain much in memory or speed). Don't load your code with ReLU(inplace=True), mul_, or add_ if not absolutely necessary.
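For illustration, these two forms compute the same thing; the in-place one rarely buys you anything and overwrites values autograd may still need:

```python
import torch

x = torch.randn(1024)

# Out-of-place: allocates a fresh tensor, plays nicely with autograd.
y = torch.relu(x) * 2 + 1

# In-place equivalent: rarely faster, and it clobbers x itself.
x.relu_().mul_(2).add_(1)
```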
vmoens.bsky.social
Using hydra or similar fancy config objects: avoid calling cfg.attribute often in the code. Instead, cache the config values in your script as global workspace variables.
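A sketch of the caching pattern with an OmegaConf config (what Hydra hands you under the hood):

```python
from omegaconf import OmegaConf

cfg = OmegaConf.create({"loss": {"gamma": 0.99}})

# DictConfig attribute access goes through Python-level resolution machinery,
# so hoist hot values out of the loop once.
gamma = cfg.loss.gamma        # now a plain Python float
total = 0.0
for step in range(1000):
    total += gamma ** step    # instead of cfg.loss.gamma ** step every time
```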
vmoens.bsky.social
If you have a tiny, CPU-overhead-bound model (robotics, RL), avoid frequent calls to eval() or train() in eager mode, or model.parameters(), or anything that traverses your model. Prefer cached versions of these calls.
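For example, model.parameters() walks the whole module tree on every call; cache the result once if you need it every step:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 2))

params = list(model.parameters())   # traverse the module tree once, reuse after

for _ in range(100):
    # e.g. gradient clipping every step: reuse the cached list instead of
    # calling model.parameters() again on each iteration.
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
```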
vmoens.bsky.social
Avoid calling tensor.item() in between CUDA operations. This triggers a CUDA synchronization and blocks your code. Do the logging after all the work (forward / backward / optim) has completed. See how to find sync points here.
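A sketch of the pattern, assuming a CUDA device: keep per-step losses on-device and sync once at the end.

```python
import torch

device = "cuda"
losses = []
for step in range(100):
    loss = torch.randn((), device=device)  # stand-in for a real training loss
    losses.append(loss.detach())           # stays on-device: no sync here

# A single synchronization at logging time, instead of one per step.
print(torch.stack(losses).mean().item())
```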
vmoens.bsky.social
Avoid pinning memory in your code unless you have thoroughly tested that it accelerates runtime (see this tutorial for more info). As an aside, pin_memory is also less safe! pytorch.org/tutorials/in...
A guide on good usage of non_blocking and pin_memory() in PyTorch — PyTorch Tutorials 2.6.0+cu124 documentation
pytorch.org
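One way to run that test, assuming a CUDA machine: time the host-to-device copy with and without pinning before committing to it.

```python
import torch
from torch.utils.benchmark import Timer

pageable = torch.randn(1024, 1024)
pinned = torch.randn(1024, 1024).pin_memory()

# Whether pinning pays off depends on tensor size, the host, and whether the
# copy actually overlaps with compute; measure, don't assume.
for name, t in [("pageable", pageable), ("pinned", pinned)]:
    timer = Timer(
        stmt="t.to('cuda', non_blocking=True); torch.cuda.synchronize()",
        globals={"t": t, "torch": torch},
    )
    print(name, timer.blocked_autorange())
```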
vmoens.bsky.social
Don't send tensors to device using to(device) if you can instantiate them directly there. For instance, prefer randn((), device=device) to randn(()).to(device)
vmoens.bsky.social
A few tips I share when I talk about perf with PyTorch in eager mode (with a focus on small models): 🪢
vmoens.bsky.social
I guess my point was that a proper name + definition is necessary to write good code. When I see “policy”, “critic”, “replay buffer”, “env”, I know exactly what does and doesn't belong to them. With “agent”, it's systematically a “hm, yeah, why not”, and then you end up with ill-defined monster classes
vmoens.bsky.social
If your agent is a policy, call it a policy; if it's a trainer, call it a trainer! If it's just a big undefined collection of methods, consider refactoring it...
vmoens.bsky.social
Every time I meet with people and someone talks about agents, there's at least one person who asks "what do you mean by agent?" or says "you should not call that an agent".