@nbalamur.bsky.social
Come chat! 🎤
I'll be presenting this work at #CogSci2025:
📍 Poster Number P1-B-8
🗓️ Session: Poster Session 1
🧠 Poster title: “Spot the Ball: Evaluating Visual Causal Inference in VLMs under Occlusion”
We also built:
✅ An inpainting-based image generation pipeline
✅ A public demo where you can test your visual inference skills
✅ A dataset of 3000+ labeled soccer images for future work
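The masking step of such a pipeline can be sketched as follows. This is a minimal illustration, not the paper's actual code: the function name and blank-fill choice are assumptions, and the real pipeline presumably hands the mask to an inpainting model to synthesize plausible background.

```python
import numpy as np

def mask_ball(image: np.ndarray, bbox: tuple) -> tuple:
    """Remove the ball region from an (H, W, 3) image.

    bbox is (x0, y0, x1, y1) in pixel coordinates. Returns the
    masked image and a binary mask for an inpainting model
    (1 where content must be synthesized).
    """
    x0, y0, x1, y1 = bbox
    masked = image.copy()
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1
    # Blank the ball region; an inpainting model would fill it in.
    masked[y0:y1, x0:x1] = 0
    return masked, mask
```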
Results:
Humans outperform all models—even with chain-of-thought scaffolding.
GPT-4o gets closer with explicit pose/gaze cues, but still falls short in many cases.
Three prompt types, increasing in reasoning complexity:
🔹 Basic: “Which grid cell contains the ball?”
🔹 Implicit: Encourages attention to pose/gaze
🔹 Chain-of-thought: Step-by-step inference
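The three conditions could look roughly like this. Only the Basic prompt is quoted above; the Implicit and Chain-of-thought wordings here are illustrative assumptions, not the benchmark's exact text.

```python
# Illustrative prompt templates for the three reasoning conditions.
# Only "basic" is quoted from the thread; the other two are assumed wordings.
PROMPTS = {
    "basic": "Which grid cell contains the ball?",
    "implicit": (
        "Look at the players' body poses and gaze directions. "
        "Which grid cell contains the ball?"
    ),
    "chain_of_thought": (
        "Step 1: Describe each player's pose and gaze. "
        "Step 2: Infer where their attention is directed. "
        "Step 3: Name the grid cell that contains the ball."
    ),
}
```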
The task is mapped to a 6×10 grid → a 60-class classification problem.
We benchmark humans and models (GPT-4o, Gemini, LLaMA, Qwen) on soccer, basketball, and volleyball.
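The grid mapping can be sketched as below. The orientation (6 rows × 10 columns) and row-major indexing are assumptions; any consistent convention gives the same 60-class setup.

```python
def cell_index(x: float, y: float, width: int, height: int,
               rows: int = 6, cols: int = 10) -> int:
    """Map a pixel coordinate to one of rows*cols grid-cell classes.

    Assumes row-major indexing: cell 0 is top-left, cell 59 bottom-right.
    """
    col = min(int(x / width * cols), cols - 1)   # clamp edge pixels
    row = min(int(y / height * rows), rows - 1)
    return row * cols + col
```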
In high-stakes, real-world scenes, humans infer what's missing, a crucial skill in driving, robotics, and sports.
We isolate this in a simple but rich task: spot the masked ball from a single frame.
The Spot the Ball game has been around for decades.
🗓️ It began in the UK in the 1970s as a popular newspaper contest
👥 At its peak, over 3 million people played weekly
Players had to guess where the ball had been removed from a photo, just as our benchmark asks today.
🧠⚽ Spot the ball! New benchmark for visual scene understanding!
We ask: Can people and models locate a hidden ball in sports images using only visual context and reasoning?
🕹️ Try the task: v0-new-project-9b5vt6k9ugb.vercel.app
#CogSci2025