Paul Gavrikov
@paulgavrikov.bsky.social
PostDoc Tübingen AI Center | Machine Learning & Computer Vision paulgavrikov.github.io
paulgavrikov.bsky.social
Agree 100%! I think this paper does a great job of outlining issues in the original paper.
paulgavrikov.bsky.social
If you think of texture as the material/surface property (which I think is the original perspective), then the ablation in this paper is insufficient to suppress the cue.
paulgavrikov.bsky.social
I really liked the thoroughness of this paper, but I'm afraid the results are built on a shaky definition of "texture". If you replace "texture" in the original paper with "local details", it's virtually the same finding.
paulgavrikov.bsky.social
lol, a #WACV AC just poorly rephrased the weaknesses I raised in my review as their justification and ignored all other reviews ... I feel bad for the authors ...
paulgavrikov.bsky.social
4) Models answer consistently for easy questions ("Is it day?": yes, "Is it night?": no) but fall back to guessing for hard tasks such as reasoning. Concerningly, some models even fall below random chance, hinting at shortcuts.
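The consistency probe above pairs complementary yes/no questions about the same image and checks that exactly one of the two is answered "yes". A minimal sketch of that check (the `model_answer` function and the canned answers are hypothetical stand-ins, not the paper's actual harness):

```python
# Hedged sketch of a yes/no consistency check over complementary
# question pairs. `model_answer` is a placeholder; a real setup would
# query a VLM here.

def model_answer(image_id: str, question: str) -> str:
    canned = {
        ("img1", "Is it day?"): "yes",
        ("img1", "Is it night?"): "no",   # consistent pair
        ("img2", "Is it day?"): "yes",
        ("img2", "Is it night?"): "yes",  # inconsistent pair
    }
    return canned[(image_id, question)]

def consistency_rate(pairs) -> float:
    """Fraction of complementary pairs answered consistently,
    i.e., exactly one 'yes' between the two questions."""
    consistent = 0
    for image_id, q_pos, q_neg in pairs:
        a_pos = model_answer(image_id, q_pos)
        a_neg = model_answer(image_id, q_neg)
        consistent += (a_pos == "yes") != (a_neg == "yes")
    return consistent / len(pairs)

pairs = [
    ("img1", "Is it day?", "Is it night?"),
    ("img2", "Is it day?", "Is it night?"),
]
print(consistency_rate(pairs))  # 0.5
```

A rate near 1.0 means the model's answers agree across the pair; answering "yes" (or "no") to both questions suggests guessing rather than grounded perception.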
paulgavrikov.bsky.social
3) Similar trends for OCR. Our OCR questions contain constraints (e.g., the fifth word) that models often fail to consider. Minor errors include a strong tendency to autocorrect typos or to hallucinate more common spellings, especially for non-Latin scripts and non-English text.
paulgavrikov.bsky.social
2) Models cannot count in dense scenes, and the performance gets worse the larger the number of objects; they typically "undercount" and errors are massive. Here is the distribution over all models:
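The "undercount" tendency above can be quantified with the signed count error (prediction minus ground truth), where negative values mean undercounting. A small sketch with made-up numbers for illustration (not the paper's data):

```python
# Hedged sketch: signed count errors across images. Negative errors
# indicate undercounting; the values below are invented examples.

def signed_errors(preds, truths):
    """Per-image signed error: predicted count minus true count."""
    return [p - t for p, t in zip(preds, truths)]

preds  = [8, 40, 95]    # hypothetical model counts
truths = [10, 55, 140]  # hypothetical ground-truth object counts

errors = signed_errors(preds, truths)
undercount_rate = sum(e < 0 for e in errors) / len(errors)
print(errors, undercount_rate)  # [-2, -15, -45] 1.0
```

Note how the error magnitude grows with the true object count, matching the trend that denser scenes produce larger counting mistakes.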
paulgavrikov.bsky.social
1) Our benchmark is hard: the best model (o3) achieves an accuracy of 69.5% in total, but only 19.6% on the hardest split. We observe significant performance drops on some tasks.
paulgavrikov.bsky.social
Our questions are built on top of a fresh dataset of 150 high-resolution and detailed scenes probing core vision skills in 6 categories: counting, OCR, reasoning, activity/attribute/global scene recognition. The ground truth is private, and our eval server is live!
paulgavrikov.bsky.social
🚨 New paper out!
"VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes"
👉 arxiv.org/abs/2509.25339
We test 37 VLMs on 2,700+ VQA questions about dense scenes.
Findings: even top models fumble badly—<20% on the hardest split and key failure modes in counting, OCR & consistency.
paulgavrikov.bsky.social
Joint work with Wei Lin, Jehanzeb Mirza, Soumya Jahagirdar, Muhammad Huzaifa, Sivan Doveh, James Glass, and Hilde Kuehne.
paulgavrikov.bsky.social
🤖 We tested 37 models. Results?
Even top VLMs break down on “easy” tasks in overloaded scenes.

Best model (o3):
• 19.8% accuracy (hardest split)
• 69.5% overall
paulgavrikov.bsky.social
📊 VisualOverload =
• 2,720 Q–A pairs
• 6 vision tasks
• 150 fresh, high-res, royalty-free artworks
• Privately held ground-truth responses
paulgavrikov.bsky.social
Is basic image understanding solved in today’s SOTA VLMs? Not quite.

We present VisualOverload, a VQA benchmark testing simple vision skills (like counting & OCR) in dense scenes. Even the best model (o3) only scores 19.8% on our hardest split.
Reposted by Paul Gavrikov
keuper-labs.bsky.social
Congratulations to @paulgavrikov.bsky.social for an excellent PhD defense today!
paulgavrikov.bsky.social
Yesterday, I had the great honor of delivering a talk on feature biases in vision models at the VAL Lab at the Indian Institute of Science (IISc). I covered our ICLR 2025 paper and a few older works in the same realm.
youtu.be/9efpCs1ltcM
Paul Gavrikov - Feature Biases in Vision Models (Research Talk @ IISc, Bengaluru)
paulgavrikov.bsky.social
It was truly special reconnecting with old friends and making so many new ones. Beyond the conference halls, we had some unforgettable adventures — exploring the city, visiting the woodlands, and singing our hearts out at karaoke nights. 🎤🦁🌳
paulgavrikov.bsky.social
What an incredible week at #ICLR 2025! 🌟
I had an amazing time presenting our poster "Can We Talk Models Into Seeing the World Differently?" with @jovitalukasik.bsky.social. Huge thanks to everyone who stopped by — your questions, insights, and conversations made it such a rewarding experience.
paulgavrikov.bsky.social
Looking forward to meeting you!
paulgavrikov.bsky.social
Today at 3pm - poster #328. See you there!
paulgavrikov.bsky.social
On Thursday, I'll be presenting our paper "Can We Talk Models Into Seeing the World Differently?" (#328) at ICLR 2025 in Singapore! If you're attending or just around Singapore, I'd love to connect—feel free to reach out! Also, I'm exploring postdoc or industry opportunities—happy to chat!