Karim Farid
@kifarid.bsky.social
2.2K followers 790 following 22 posts
PhD student @ELLIS.eu @UniFreiburg with Thomas Brox and Cordelia Schmid. Understanding intelligence and cultivating its societal benefits. https://kifarid.github.io
Reposted by Karim Farid
phillipisola.bsky.social
Over the past year, my lab has been working on fleshing out theory + applications of the Platonic Representation Hypothesis.

Today I want to share two new works on this topic:

Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired learning of unified reps: arxiv.org/abs/2510.08492

1/9
kifarid.bsky.social
Orbis shows that the objective matters.
Continuous modeling yields more stable and generalizable world models, yet true probabilistic coverage remains a challenge.

Immensely grateful to my co-authors @arianmousakhan.bsky.social, Sudhanshu Mittal, and Silvio Galesso, and to @thomasbrox.bsky.social
kifarid.bsky.social
Under the hood 🧠

Orbis uses a hybrid tokenizer with semantic + detail tokens that work in both continuous and discrete spaces.
The world model then predicts the next frame by gradually denoising or unmasking it, using past frames as context.
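A minimal sketch of what next-frame prediction by iterative denoising with past-frame context could look like (hypothetical tokenizer/denoiser interfaces and step count, not the actual Orbis implementation):

```python
import torch

@torch.no_grad()
def rollout_next_frame(tokenizer, denoiser, past_frames, n_steps=50):
    """Predict the next frame by iteratively denoising a noise latent,
    conditioned on tokens of the past frames (hypothetical interfaces)."""
    context = tokenizer.encode(past_frames)      # (B, T, N, D) token context
    x = torch.randn_like(context[:, -1])         # start the new frame from noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        v = denoiser(x, t, context)              # predicted velocity, given context
        x = x + v * dt                           # Euler integration step
    return tokenizer.decode(x)                   # decode tokens back to pixels
```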
kifarid.bsky.social
Realistic and Diverse Rollouts 4/4
kifarid.bsky.social
Realistic and Diverse Rollouts 3/4
kifarid.bsky.social
Realistic and Diverse Rollouts 2/4
kifarid.bsky.social
Realistic and Diverse Rollouts 1/4
kifarid.bsky.social
While other models drift or blur on turns, Orbis stays on track — generating realistic, stable futures beyond the training horizon.

On our curated nuPlan-turns dataset, Orbis achieves better FVD, precision, and recall, capturing both visual and dynamics realism.
kifarid.bsky.social
We ask how continuous vs. discrete models and their tokenizers shape long-horizon behavior.

Findings:
Continuous models (Flow Matching):
• Are far less brittle to design choices
• Produce realistic, stable rollouts up to 20s
• Generalize better to unseen driving conditions

Continuous > Discrete
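For reference, a generic conditional flow-matching training step looks roughly like this (a sketch of the standard linear-interpolation formulation; model, latents, and optimizer are placeholders, not the Orbis training code):

```python
import torch
import torch.nn.functional as F

def flow_matching_step(model, latents, optimizer):
    """One generic conditional flow-matching step on a batch of frame latents."""
    x1 = latents                                   # data sample in latent space
    x0 = torch.randn_like(x1)                      # noise sample
    t = torch.rand(x1.shape[0], device=x1.device)  # random time per sample
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # broadcastable time
    xt = (1 - t_) * x0 + t_ * x1                   # point on the interpolation path
    target_v = x1 - x0                             # constant target velocity
    pred_v = model(xt, t)                          # network predicts velocity
    loss = F.mse_loss(pred_v, target_v)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```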
kifarid.bsky.social
Driving world models look good for a few frames, then they drift, blur, or freeze, especially when a turn or complex scene appears. These failures reveal a deeper issue: models aren’t capturing real dynamics. We introduce new metrics to measure such breakdowns.
kifarid.bsky.social
Our work Orbis goes to #NeurIPS2025!

A continuous autoregressive driving world model that outperforms Cosmos, Vista, and GEM with far less compute.

469M parameters
Trained on ~280h of driving videos

📄 arxiv.org/pdf/2507.13162
🎬 lmb-freiburg.github.io/orbis.github...
💻 github.com/lmb-freiburg...
kifarid.bsky.social
The question raised here is whether this approach is a generalist or a specialist that cannot scale up to a general foundation model.
kifarid.bsky.social
I think HRM is quite great too. I would say they contributed the main idea (deep supervision) behind TRM.
kifarid.bsky.social
Transformers do not need to have something like "gradient descent" as an emergent property when it is kind of baked into them.
kifarid.bsky.social
TRM works because it has an optimization algorithm as an inductive bias to find the answer. Can't call this work anything but brilliant.
kifarid.bsky.social
We should normalize having the ‘Ideas That Failed’ section. It would save enormous amounts of compute and time otherwise spent rediscovering stuff that doesn’t work.
Reposted by Karim Farid
atabb.bsky.social
I stumbled on @eugenevinitsky.bsky.social 's blog and his "Personal Rules of Productive Research" is very good. I now do a lot of the things in the post, & wish I had done them when I was younger.

I share my "mini-paper" w ppl I hope will be co-authors.

www.eugenevinitsky.com/posts/person...
Reposted by Karim Farid
eugenevinitsky.bsky.social
My major realization of the past year of teaching is that a lot is forgiven if students believe you genuinely care about them and the topic
Reposted by Karim Farid
phillipisola.bsky.social
Possible challenge: getting a model of {X,Y,Z,...} that is much better than independent models of each individual modality {X}, {Y}, {Z}, ... i.e. where the whole is greater than the sum of the parts.
kifarid.bsky.social
I also really hope that the LAM from V1 is still there!
kifarid.bsky.social
Inspiring! Genie incentivizes generative models to learn actionable latent states by enforcing a latent action model. Action spaces and actionable states are entangled, so we get more causal WMs. However, I was wondering why you would call the "counterfactuals" counterfactual? It sounds more like interventional.
kifarid.bsky.social
Nice! There was some skepticism around diffusion models' representation-learning capacity, as they do not optimize for an explicit abstraction loss like other SSL models do.

I guess the work would benefit a lot from a comparison with SODA, what do you think?

arxiv.org/abs/2311.17901
SODA: Bottleneck Diffusion Models for Representation Learning
Reposted by Karim Farid
abhishekunique7.bsky.social
I'm excited about scaling up robot learning! We’ve been scaling up data gen with RL in realistic sims generated from crowdsourced videos. Enables data collection far more cheaply than real world teleop. Importantly, data becomes *cheaper* with more environments and transfers to real robots! 🧵 (1/N)