Come chat! 🎤 I'll be presenting this work at #CogSci2025:
📍 Poster: P1-B-8
🗓️ Session: Poster Session 1
🧠 Title: “Spot the Ball: Evaluating Visual Causal Inference in VLMs under Occlusion”
We also built:
✅ An inpainting-based image-generation pipeline (sketch below)
✅ A public demo where you can test your own visual inference skills
✅ A dataset of 3,000+ labeled soccer images for future work
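The thread doesn't spell out the pipeline, so here is a minimal sketch of what an inpainting-based ball-removal step could look like, assuming Hugging Face diffusers; the checkpoint, mask convention, and prompt are illustrative assumptions, not the paper's actual setup:

```python
# Minimal sketch of an inpainting step that removes the ball from a frame.
# Assumptions (not from the paper): diffusers' StableDiffusionInpaintPipeline,
# a binary mask that is white over the ball, and a generic background prompt.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

frame = Image.open("frame.png").convert("RGB").resize((512, 512))
mask = Image.open("ball_mask.png").convert("L").resize((512, 512))  # white = region to fill

result = pipe(
    prompt="empty grass pitch, no ball",  # hypothetical prompt
    image=frame,
    mask_image=mask,
).images[0]
result.save("frame_ball_removed.png")
```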
Results: Humans outperform all models, even with chain-of-thought scaffolding. GPT-4o gets closer when given explicit pose and gaze cues, but still falls short in many cases.
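For a sense of what chain-of-thought scaffolding with pose/gaze cues might look like in practice, here is a hedged sketch against the OpenAI chat completions API; only the model name (gpt-4o) comes from the thread, and the prompt wording and grid labels are hypothetical:

```python
# Hedged sketch of a chain-of-thought query with pose/gaze cues; the actual
# prompts used in the paper are not in this thread and are assumed here.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("masked_frame.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "The ball has been removed from this image. "
                "Step 1: describe each player's body pose. "
                "Step 2: describe where the players are looking. "
                "Step 3: using those cues, name the grid cell (rows A-F, "
                "columns 1-10) that most likely contains the hidden ball."
            )},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```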
Each image is mapped onto a 6×10 grid, turning localization into a 60-class classification problem. We benchmark humans and models (GPT-4o, Gemini, LLaMA, Qwen) on soccer, basketball, and volleyball.
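As a concrete illustration, the pixel-to-grid-cell mapping could be as simple as the sketch below; the frame size and row-major indexing are assumptions, not the paper's stated convention:

```python
# Minimal sketch: map a ground-truth ball location (in pixels) to one of
# 60 classes on a 6x10 grid. Row-major indexing is an assumption.
GRID_ROWS, GRID_COLS = 6, 10

def ball_to_class(x: float, y: float, width: int, height: int) -> int:
    """Return the grid-cell class index (0..59) for pixel coordinates (x, y)."""
    col = min(int(x / width * GRID_COLS), GRID_COLS - 1)
    row = min(int(y / height * GRID_ROWS), GRID_ROWS - 1)
    return row * GRID_COLS + col

# Example: a ball at (960, 300) in a 1920x1080 frame.
print(ball_to_class(960, 300, 1920, 1080))  # row 1, col 5 -> class 15
```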
In high-stakes, real-world scenes, humans infer what's missing, a skill crucial in driving, robotics, and sports. We isolate this ability in a simple but rich task: spot the masked ball from a single frame.
The Spot the Ball game has been around for decades.
🗓️ It began in the UK in the 1970s as a popular newspaper contest.
👥 At its peak, over 3 million people played weekly.
Players had to guess where the ball had been removed from a photo, just as our benchmark asks today.
🧠⚽ Spot the ball! A new benchmark for visual scene understanding.
We ask: can people and models locate a hidden ball in sports images using only visual context and reasoning?
🕹️ Try the task: v0-new-project-9b5vt6k9ugb.vercel.app #CogSci2025