Gabriele Sarti
gsarti.com
Gabriele Sarti
@gsarti.com
Postdoc @ Northeastern, @ndif-team.bsky.social with @davidbau.bsky.social. Interpretability ∩ HCI ∩ #NLProc. Creator of @inseq.org. Prev: PhD @gronlp.bsky.social, ML @awscloud.bsky.social & Aindo

Definitely impressive and a sign that we have crossed the superhuman threshold in data-rich, bounded domains that AI developers care about, such as math! Hard to tell, though, whether progress will be achievable at the same speed & scale in fuzzier, less-resourced areas.
January 31, 2026 at 3:59 AM
Limited high-quality expert data in narrow domains, I guess. I do believe that in this context self-play is the bet that has the potential to go beyond the data curse, but it's hard to tell whether that will scale to progressively less bounded domains.
January 31, 2026 at 3:50 AM
I think from current trends it's not unreasonable to think we'll soon cap out at the sharp edges of the jagged capability frontier. Still, there's a long tail of sub-human-level tasks that will see steady progress for years to come.
January 31, 2026 at 3:45 AM
No notes, matches expectations!
January 30, 2026 at 3:30 AM
The gap between abstracts and actual submissions was roughly 9k papers for ICML, with ICLR authors being incentivized to submit an abstract because of unreleased decisions. It's probably possible to get to a rough estimate from there!
January 30, 2026 at 12:29 AM
Great writeup, I thoroughly enjoyed it!
January 23, 2026 at 1:03 PM
This work aims to contribute to AI safety by developing detection methods, characterizing which architectures are prone to these behaviors, and creating resources for the broader research community. See more in the proposal: sparai.org/projects/sp2...
Monitoring and Attributing Implicit Personalization in Conversational Agents - SPAR Project
This project investigates implicit personalization, i.e. how conversational models form implicit beliefs about their users, focusing in particular on how these bel...
sparai.org
January 9, 2026 at 2:09 PM
You can find relevant references in the project description, including excellent work by Transluce, @veraneplenbroek.bsky.social, @arianna-bis.bsky.social, @wattenberg.bsky.social, etc., building on recent advances in extracting latent user representations for understanding personalization behaviors.
January 9, 2026 at 2:09 PM
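As a rough illustration of what extracting latent user representations can involve, here is a minimal sketch with placeholder data: a linear probe trained on hidden-state activations to test whether an inferred user attribute is linearly decodable. The activation shapes, labels, and attribute below are illustrative assumptions, not the project's actual setup.

```python
# Minimal sketch: probing hidden states for an implicitly inferred user attribute.
# Activations are stood in by random vectors here; in practice they would come
# from a model's residual stream at a chosen layer, one vector per conversation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: one activation vector per prompt, plus a binary label for
# the user attribute the model is suspected to infer (hypothetical: novice vs. expert).
activations = rng.normal(size=(200, 768))   # stand-in for last-token hidden states
labels = rng.integers(0, 2, size=200)       # 0 = novice, 1 = expert (hypothetical)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0
)

# A linear probe: accuracy well above chance on held-out data suggests the
# representation linearly encodes that belief about the user.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```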
Mentees will take a leading role in defining research questions, reviewing the literature, and conducting technical work such as adapting codebases, training decoders, or building evaluation pipelines. We'll prioritize 1-2 directions based on your background and interests.
January 9, 2026 at 2:09 PM
This project aims to expand our understanding of implicit personalization in LLMs: how models form user beliefs, which elements in prompts/training drive these behaviors, and how we can leverage interpretability methods for control beyond simple detection.
January 9, 2026 at 2:09 PM
When language models interact with users, they implicitly infer user attributes (expertise, demographics, beliefs) that influence responses in ways users neither expect nor endorse. This hidden personalization can lead to sycophancy, deception, and demographic bias.
January 9, 2026 at 2:09 PM
For more info, see ndif.us, and check out the amazing NNSight toolkit for extracting and analyzing the internals of any Torch-compatible model! nnsight.net
NSF National Deep Inference Fabric
NDIF is a research computing project that enables researchers and students to crack open the mysteries inside large-scale AI systems.
ndif.us
January 4, 2026 at 6:44 PM
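To make the NNSight pointer above concrete, here is a minimal sketch of its tracing workflow, assuming the standard API described in the nnsight docs; the model name and layer choice are illustrative, not prescriptive.

```python
# Minimal NNSight sketch: save the hidden states of one transformer block
# during a forward pass (illustrative model and layer choice).
from nnsight import LanguageModel

# Wrap a Torch-compatible HuggingFace model for tracing.
model = LanguageModel("openai-community/gpt2", device_map="auto")

with model.trace("The Eiffel Tower is in the city of"):
    # GPT-2 blocks return a tuple, so index [0] for the hidden states.
    hidden = model.transformer.h[-1].output[0].save()

# After the trace exits, the saved proxy holds the actual tensor
# (on older nnsight versions you may need `hidden.value`).
print(hidden.shape)  # e.g. (1, num_tokens, 768)
```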