Gabriele Sarti
@gsarti.com
PhD Student at @gronlp.bsky.social 🐮, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc.
gsarti.com
Reposted by Gabriele Sarti
Yoav Goldberg
@yoavgo.bsky.social
· Aug 27
Humans Perceive Wrong Narratives from AI Reasoning Texts
A new generation of AI models generates step-by-step reasoning text before producing an answer. This text appears to offer a human-readable window into their computation process, and is increasingly r...
arxiv.org
Gabriele Sarti
@gsarti.com
· Aug 19
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a "teacher" model with some trait T (su...
arxiv.org