Mark Ibrahim
markibrahim.bsky.social
Researching the dark arts of deep learning at Meta's FAIR (Fundamental AI Research) Lab https://markibrahim.me/
Want to teach AI agents to use apps like humans? Get started with digital agents research using OpenApps, our new Python-based environment.
December 10, 2025 at 3:44 PM
Although leading models saturate single-image perception, Common-O establishes a challenging new multimodal benchmark. The best-performing model achieves only 35% on Common-O, and only 1% on Common-O Complex, which consists of more complex scenes.

🧵2/3
November 7, 2025 at 8:55 PM
We introduce Common-O, a new multimodal benchmark for hallucination when reasoning across scenes.

We find leading multimodal LLMs can reliably identify objects, yet hallucinate when reasoning across scenes.

🧵1/3
November 7, 2025 at 8:55 PM
One can manipulate LLM rankings to put any model in the lead, merely by modifying the single character separating demonstration examples. Learn more in our new paper arxiv.org/abs/2510.05152
w/ Jingtong Su, Jianyu Zhang, @karen-ullrich.bsky.social , and Léon Bottou.
🧵
October 9, 2025 at 2:32 PM
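To make the claim above concrete, here is a minimal sketch of what "the single character separating demonstration examples" means in few-shot prompting. The `build_prompt` helper and the example demos are hypothetical illustrations, not code from the paper: two prompts built this way are identical except for the separator character, yet the paper reports that evaluations run on such prompts can reorder model rankings.

```python
def build_prompt(demos, query, sep="\n"):
    """Join few-shot demonstration examples with `sep`, then append the query.

    The separator is the only degree of freedom varied here; everything
    else about the prompt is held fixed.
    """
    return sep.join(demos) + sep + query


# Hypothetical few-shot demonstrations and query.
demos = ["Q: 2+2? A: 4", "Q: 3+3? A: 6"]
query = "Q: 5+5? A:"

# Two prompts that differ only in the separator character.
prompt_newline = build_prompt(demos, query, sep="\n")
prompt_space = build_prompt(demos, query, sep=" ")

# Mapping one separator onto the other makes the prompts identical,
# confirming the separator is the sole difference between them.
assert prompt_newline.replace("\n", " ") == prompt_space
```

An evaluation harness that scores many models on prompts like `prompt_newline` versus `prompt_space` is the setting where, per the paper, leaderboard order can flip.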
A good language model should say “I don’t know” by reasoning about the limits of its knowledge. Our new work AbstentionBench carefully measures this overlooked skill in an open codebase others can build on!

We find that frontier reasoning degrades models’ ability to know when NOT to answer.

🧵1/2
June 17, 2025 at 6:32 PM
Recently, we also applied the same MLM-U objective to maze navigation. We find that, when training parameter-matched transformers on identical data, MLM-U without any tweaks outperforms standard next-token training across all maze grid sizes (up to 30×30).
December 11, 2024 at 6:42 PM
Can we boost transformers’ ability to retrieve knowledge and plan in maze navigation by only tweaking the learning objective?

We emphatically say YES in our #NeurIPS 2024 study! 🧵

w/ Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, and Mike Rabbat

Paper arxiv.org/abs/2406.05183
December 11, 2024 at 6:32 PM