Mark Ibrahim
@markibrahim.bsky.social
58 followers 120 following 13 posts
Researching the dark arts of deep learning at Meta's FAIR (Fundamental AI Research) Lab https://markibrahim.me/
markibrahim.bsky.social
We explain how good delimiters steer attention heads to key input tokens and offer practical recommendations for prompt and delimiter choices to get the best performance from your LLM. tl;dr: use “!” or “\n”.
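A minimal sketch of what “choice of delimiter” means in practice (the demonstrations and delimiters below are illustrative, not the paper's exact evaluation harness):

```python
# Illustrative few-shot prompt assembly: the SAME demonstrations joined
# by different single-character delimiters can yield very different
# accuracy from the same model.
demos = [
    "Q: What is 2 + 2? A: 4",
    "Q: What is the capital of France? A: Paris",
]
query = "Q: What is 3 + 5? A:"

for delim in ["\n", "!", " ", ";"]:
    prompt = delim.join(demos + [query])
    print(f"{delim!r}: {prompt[:48]!r}")
```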
markibrahim.bsky.social
- MMLU performance can vary by +/- 23% depending on the choice of delimiter across leading open model families (Llama, Qwen, and Gemma).
- Closed models such as GPT-4o are also brittle to the choice of delimiter.

🧵
markibrahim.bsky.social
One can manipulate LLM rankings to put any model in the lead merely by changing the single character separating demonstration examples. Learn more in our new paper: arxiv.org/abs/2510.05152
w/ Jingtong Su, Jianyu Zhang, @karen-ullrich.bsky.social , and Léon Bottou.
🧵
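A toy illustration of the ranking flip with invented scores (not numbers from the paper): if per-delimiter accuracy varies enough, either model can be put in the lead just by fixing a different demonstration delimiter.

```python
# Invented scores for illustration only (NOT results from the paper).
scores = {
    "model_a": {"\n": 0.71, "!": 0.58},
    "model_b": {"\n": 0.63, "!": 0.66},
}

for delim in ["\n", "!"]:
    winner = max(scores, key=lambda m: scores[m][delim])
    print(f"delimiter {delim!r}: {winner} leads")
# delimiter '\n': model_a leads
# delimiter '!':  model_b leads
```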
markibrahim.bsky.social
Open weights for our Llip multimodal vision-language model, led by @lavoiems.bsky.social, are public!

Llip proposes a new pre-training objective to capture the many ways to describe an image, leading to strong performance across a suite of 22 zero-shot benchmarks.

bsky.app/profile/lavo...
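A heavily simplified sketch of the idea: condition the pooled visual representation on the caption, then apply a standard contrastive loss. All shapes, the attention-pooling details, and the temperature below are illustrative assumptions, not the released Llip code.

```python
import torch
import torch.nn.functional as F

B, K, D = 8, 16, 256                  # batch, visual mixture tokens, embed dim
visual_tokens = torch.randn(B, K, D)  # K candidate "views" of each image
text_embed = torch.randn(B, D)        # one caption embedding per image

# Caption-conditioned pooling: each caption attends over the K visual
# tokens, so different captions can pick out different aspects of the
# same image (the "many ways to describe an image").
attn = torch.softmax(
    torch.einsum("bd,bkd->bk", text_embed, visual_tokens) / D ** 0.5, dim=-1
)
visual_embed = torch.einsum("bk,bkd->bd", attn, visual_tokens)

# Standard CLIP-style symmetric contrastive loss on the pooled features.
v = F.normalize(visual_embed, dim=-1)
t = F.normalize(text_embed, dim=-1)
logits = v @ t.T / 0.07               # temperature is illustrative
labels = torch.arange(B)
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
```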
markibrahim.bsky.social
We also find better models are not necessarily better at abstention, suggesting abstention is an open research question.

w/ @polkirichenko.bsky.social, Sam Bell, and Kamalika Chaudhuri

Paper: arxiv.org/abs/2506.09038
Code: github.com/facebookrese...

bsky.app/profile/polk...

🧵2/2
markibrahim.bsky.social
A good language model should say “I don’t know” by reasoning about the limits of its knowledge. Our new work AbstentionBench carefully measures this overlooked skill in an open codebase others can build on!

We find reasoning fine-tuning in frontier models degrades their ability to know when NOT to answer.

🧵1/2
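A toy sketch of how one might score abstention behavior (an illustrative heuristic with made-up responses, not AbstentionBench's actual metric, prompts, or data):

```python
# Mark a response as an abstention if it signals "I don't know".
ABSTAIN_MARKERS = ("i don't know", "i do not know", "unanswerable",
                   "cannot be determined")

def abstained(response: str) -> bool:
    r = response.lower()
    return any(marker in r for marker in ABSTAIN_MARKERS)

# (label, model response): the model should abstain exactly when the
# question is unanswerable.
eval_set = [
    ("unanswerable", "I don't know; the premise is underspecified."),
    ("unanswerable", "The answer is 42."),   # failure: should have abstained
    ("answerable",   "Paris."),
]

correct = sum((label == "unanswerable") == abstained(resp)
              for label, resp in eval_set)
print(f"abstention alignment: {correct}/{len(eval_set)}")
```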
markibrahim.bsky.social
Join us as a PhD research intern at FAIR w/ @polkirichenko.bsky.social and Kamalika Chaudhuri, starting this summer or fall, with a focus on open science on multimodal models, agents, and beyond! Email [email protected] with the subject line [Prospective Intern 2025] and attach your CV if interested!
markibrahim.bsky.social
We find MLM-U-trained transformers can even outperform transformers trained with additional supervision from A* search traces, showing the promise of alternative learning objectives.

Learn more on our site and code at facebookresearch.github.io/maze_navigat...
markibrahim.bsky.social
Recently, we also applied the same MLM-U objective to maze navigation. We find that, when training parameter-matched transformers on identical data, MLM-U without any tweaks outperforms standard next-token training across all maze grid sizes (up to 30x30).
markibrahim.bsky.social
We find MLM-U training improves knowledge retrieval on Wikipedia-based questions and even outperforms a pretrained 7B Mistral model with a much smaller 100M-parameter transformer trained from scratch!

Come by our NeurIPS poster in Exhibit Halls A-C, #3204, at 11am PST on Thursday to learn more.
markibrahim.bsky.social
We show that training with a factorization-agnostic objective, MLM-U (a variable-ratio BERT-style loss with links to discrete diffusion), which predicts multiple tokens ahead and back, can significantly mitigate the reversal curse!
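A minimal sketch of a variable-ratio masked-prediction step in the spirit of MLM-U (the uniform mask-rate sampling and the `mlmu_step` interface are assumptions for illustration, not the paper's exact implementation):

```python
import torch
import torch.nn.functional as F

def mlmu_step(model, tokens, mask_id):
    """One training step: mask a uniformly sampled fraction of tokens and
    predict them from the remaining (bidirectional) context."""
    # tokens: (batch, seq_len) int ids; model: any non-causal predictor
    # returning (batch, seq_len, vocab) logits. Both are stand-ins here.
    ratio = torch.rand(())                                  # mask rate ~ U(0, 1)
    mask = torch.rand_like(tokens, dtype=torch.float) < ratio
    inputs = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(inputs)
    # Loss only on masked positions: the model must reconstruct tokens
    # both ahead of and behind the visible context, so no single
    # left-to-right factorization is baked in.
    return F.cross_entropy(logits[mask], tokens[mask])
```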
markibrahim.bsky.social
Problem: language models struggle with the “reversal curse”: an inability to answer reformulations of a question. We show this stems from the standard next-token learning objective, in what we call the “factorization curse.”
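To spell out the factorization at issue: next-token training maximizes only the left-to-right factorization of the joint distribution,

```latex
\log p_\theta(x_1, \dots, x_T) \;=\; \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

so a model trained on “A is B” receives gradient signal for p(B | A) but none directly for the reverse query p(A | B).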
markibrahim.bsky.social
Can we boost transformers’ ability to retrieve knowledge and plan in maze navigation by tweaking only the learning objective?

We emphatically say YES in our #NeurIPS 2024 study! 🧵

w/ Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, and Mike Rabbat

Paper arxiv.org/abs/2406.05183