Benno Krojer
@bennokrojer.bsky.social
2.6K followers 960 following 1.8K posts
AI PhDing at Mila/McGill (prev FAIR intern). Happily residing in Montreal 🥯❄️ Academic: language grounding, vision+language, interp, rigorous & creative evals, cogsci Other: many sports, urban explorations, puzzles/quizzes bennokrojer.com
Pinned
bennokrojer.bsky.social
Restarting an old routine "Daily Dose of Good Papers" together w @vaibhavadlakha.bsky.social

Sharing my notes and thoughts here 🧵
bennokrojer.bsky.social
I'll be at COLM!

Excited to chat about anything vision+language, interpretability, cogsci/psych, embedding spaces, visual reasoning, video/world models
bennokrojer.bsky.social
I used it recently for the first time and was blown away by the speed. Should switch!
bennokrojer.bsky.social
Devoured this book in 18 hours, and I'm usually not a big fan of audiobooks!

It covered a lot, from crowdworker rights, the ideologies (doomers, EA, ...) and the Silicon Valley startup world to the many big egos and company-internal battles

Great work by @karenhao.bsky.social
Reposted by Benno Krojer
elinorpd.bsky.social
Congratulations @bennokrojer.bsky.social on passing your PhD proposal exam! A great presentation and exciting work!
bennokrojer.bsky.social
very happy to see the trend of a Behind the Scenes section catching on! transparent & honest science 👌

love the detailed montreal spots mentioned

consider including such a section in your next appendix!

(paper by @a-krishnan.bsky.social arxiv.org/pdf/2504.050...)
bennokrojer.bsky.social
Super cool work on quantifying with NLP how language evolves through generations

In linguistics, the "apparent time hypothesis" famously discusses this, but it had never been empirically tested
grvkamath.bsky.social
Our new paper in #PNAS (bit.ly/4fcWfma) presents a surprising finding—when words change meaning, older speakers rapidly adopt the new usage; inter-generational differences are often minor.

w/ Michelle Yang, @sivareddyg.bsky.social, @msonderegger.bsky.social and @dallascard.bsky.social 👇 (1/12)
bennokrojer.bsky.social
Maybe he'd otherwise miss the Alps and skiing too much
bennokrojer.bsky.social
Done the same in the past when I felt little motivation for the PhD! It's been a while since I've read one... Maybe I'll pick it up again
Reposted by Benno Krojer
vernadankers.bsky.social
I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen.bsky.social and @edoardo-ponti.bsky.social for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join @sivareddyg.bsky.social's wonderful lab @mila-quebec.bsky.social 🤩
agostinacal.bsky.social
Huge congratulations to Dr. @vernadankers.bsky.social for passing her viva today! 🥳🎓

It's truly been an honour sharing the PhD journey with you. I wasn’t ready for the void your sudden departure left (in the office and in my life!).
Your new colleagues are lucky to have you! 🥺🥰
bennokrojer.bsky.social
Also check out our previous two episodes! They didn't have a single guest; instead:

1) we introduce the podcast and how Tom and I got into research in Ep 00
2) we interview several people at Mila just before the Neurips deadline about their submissions in Ep 01
bennokrojer.bsky.social
Started a new podcast with @tomvergara.bsky.social !

Behind the Research of AI:
We look behind the scenes, beyond the polished papers 🧐🧪

If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel
from @mila-quebec.bsky.social :

open.spotify.com/episode/7oTc...
02 | Gauthier Gidel: Bridging Theory and Deep Learning, Vibes at Mila, and the Effects of AI on Art
bennokrojer.bsky.social
Turns out condensing your research into 3min is very hard but also teaches you a lot

Finally the video from Mila's speed science competition is on YouTube!

From a soup of raw pixels to abstract meaning

t.co/RDpu1kR7jM
bennokrojer.bsky.social
This is part of a larger effort at Meta to significantly improve physical world modeling, so check out the other works in this blog post!

ai.meta.com/blog/v-jepa-...
bennokrojer.bsky.social
Some reflections at the end:
There's a lot of talk about math reasoning these days, but this project made me appreciate the simple reasoning we humans take for granted, arising in our first months and years of life

As usual, I also included a "Behind The Scenes" section in the Appendix:
bennokrojer.bsky.social
I am super grateful to my smart+kind collaborators at Meta who made this a very enjoyable project :)

(Mido Assran Nicolas Ballas @koustuvsinha.com @candaceross.bsky.social @quentin-garrido.bsky.social Mojtaba Komeili)

The Montreal office in general is a very fun place 👇
bennokrojer.bsky.social
The hardest tasks for current models are still intuitive physics tasks, where performance is often below random (in line with the previous literature)

We encourage the community to use MVPBench to check if the latest VideoLLMs possess a *real* understanding of the physical world!
bennokrojer.bsky.social
On the other hand, even the strongest SOTA models perform around random chance, with only 2-3 models significantly above random
bennokrojer.bsky.social
The questions in MVPBench are conceptually simple: relatively short videos with little linguistic or cultural knowledge needed. As a result, humans have no problem with these questions; e.g. it is known that even babies do well on various intuitive physics tasks
bennokrojer.bsky.social
By automating the pairing of highly similar video pairs and unifying different datasets, as well as filtering out examples that models can solve with a single frame, we end up with (probably) the largest and most diverse dataset of its kind:
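The curation described above (pair highly similar videos with different answers, then drop examples solvable from a single frame) can be sketched roughly like this. This is a minimal illustrative sketch, not the paper's actual pipeline: the function names, similarity stand-in, and data fields are all assumptions.

```python
def visual_similarity(a, b):
    # Placeholder: a real pipeline would compare video embeddings.
    return 1.0 if a["scene"] == b["scene"] else 0.0

def solved_from_single_frame(model, ex):
    # Placeholder: a real pipeline would query a VideoLLM on one frame.
    return model(ex["frames"][:1]) == ex["answer"]

def curate_minimal_pairs(examples, model, sim_threshold=0.9):
    """Pair highly similar videos whose correct answers differ,
    then keep only pairs NOT solvable from a single frame."""
    pairs = []
    for i, ex in enumerate(examples):
        for other in examples[i + 1:]:
            if (ex["answer"] != other["answer"]
                    and visual_similarity(ex, other) >= sim_threshold):
                pairs.append((ex, other))
    return [(a, b) for a, b in pairs
            if not (solved_from_single_frame(model, a)
                    and solved_from_single_frame(model, b))]
```

The point of requiring different answers on near-identical videos is that a model cannot score well on both halves of a pair without attending to the visual difference.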
bennokrojer.bsky.social
As a solution, we propose a 3-step curation framework that results in the Minimal Video Pairs benchmark (MVPBench)
bennokrojer.bsky.social
We show that seemingly “high-performing” VideoLLMs take various shortcuts on video tasks meant to test physical understanding, such as models falling back to single-frame biases.

In total we analyze 4 such shortcuts and find that model scores often don't change much:
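One way to probe the single-frame shortcut mentioned above is to compare a model's accuracy on full videos against its accuracy when shown only the first frame; a small gap suggests the score doesn't reflect temporal understanding. A minimal sketch, with hypothetical names and data shapes:

```python
def accuracy(model, dataset, frames_per_video):
    # Fraction of questions answered correctly when the model sees
    # at most `frames_per_video` frames of each video.
    correct = 0
    for ex in dataset:
        pred = model(ex["frames"][:frames_per_video], ex["question"])
        correct += pred == ex["answer"]
    return correct / len(dataset)

def single_frame_gap(model, dataset):
    """Accuracy gained by seeing the full video instead of one frame.
    A gap near zero is a red flag for single-frame shortcuts."""
    full = accuracy(model, dataset, frames_per_video=10**9)
    one = accuracy(model, dataset, frames_per_video=1)
    return full - one
```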
bennokrojer.bsky.social
What was the motivation behind MVPBench?

Our starting point is skepticism about recent “successes” of multi-modal LLMs on video understanding benchmarks