Lightnews — Scholar-powered news

Daniel Scalena @danielsc4.it · Aug 20

I’ll be attending the NEMI 2025 workshop this Friday and presenting a poster👇.

Happy to chat about cool interpretability stuff there!

David Bau @davidbau.bsky.social · Aug 18

This Friday NEMI 2025 is at Northeastern in Boston, 8 talks, 24 roundtables, 90 posters; 200+ attendees. Thanks to
goodfire.ai/ for sponsoring! nemiconf.github.io/summer25/

If you can't make it in person, the livestream will be here:
www.youtube.com/live/4BJBis...

New England Mechanistic Interpretability Workshop

About:The New England Mechanistic Interpretability (NEMI) workshop aims to bring together academic and industry researchers from the New England and surround...

www.youtube.com

1

Daniel Scalena @danielsc4.it · May 23

📝 Paper: arxiv.org/abs/2505.16612
🔗 Code: github.com/DanielSc4/st...

Thanks to my amazing co-authors:
@gsarti.com , @arianna-bis.bsky.social , Elisabetta Fersini, @malvinanissim.bsky.social
7/7

Steering Large Language Models for Machine Translation Personalization

High-quality machine translation systems based on large language models (LLMs) have simplified the production of personalized translations reflecting specific stylistic constraints. However, these sys...

arxiv.org

3

Daniel Scalena @danielsc4.it · May 23

🔍 What’s happening in the model?
We find that SAE steering and multi-shot prompting impact internal representations similarly, suggesting insight from user examples are summarized with extra interpretability potential (look at latents) and better efficiency (no long context) 6/

1 1

Daniel Scalena @danielsc4.it · May 23

🌍 Across 7 languages, our SAE-based method matches or outperforms traditional prompting methods! Our method obtains better human-like translations (H) personalization accuracy (P), and maintains translation quality (Comet ☄️ @nunonmg.bsky.social) especially for smaller LLMs. 5/

1 1

Daniel Scalena @danielsc4.it · May 23

💡 We compare prompting (zero and multi-shot + explanations) and inference-time interventions (ActAdd, REFT and SAEs).

Following SpARE (@yuzhaouoe.bsky.social @alessiodevoto.bsky.social), we propose ✨ contrastive SAE steering ✨ with mutual info to personalize literary MT by tuning latent features 4/

1 2 4

Daniel Scalena @danielsc4.it · May 23

📈 But can models recognize and replicate individual translator styles?:
✓ Classifiers can find styles with high acc. (humans kinda don’t)
✓ Multi-shot prompting boosts style a lot
✓ We can detect strong style traces in activations (esp. mid layers) 3/

1 2

Daniel Scalena @danielsc4.it · May 23

📘 Literary translation isn't just about accuracy, but also creatively conveying meaning across languages. But LLMs prompted for MT are very literal. Prompting & steering to the rescue!

Can we personalize LLM’s MT when few examples are available, without further tuning? 🔍 2/

1 2

Daniel Scalena @danielsc4.it · May 23

📢 New paper: Applied interpretability 🤝 MT personalization!

We steer LLM generations to mimic human translator styles on literary novels in 7 languages. 📚

SAE steering can beat few-shot prompting, leading to better personalization while maintaining quality.

🧵1/

1 5 17

Daniel Scalena @danielsc4.it · Dec 4

Hellooo 👀

1 1

Daniel Scalena @danielsc4.it · Nov 28

Hey hello! 👋

1

Daniel Scalena @danielsc4.it · Nov 21

Now on 🦋!

GroNLP @gronlp.bsky.social · Nov 21

Hallo wereld 🐮! We are the Computational Linguistics group at the University of Groningen, follow us for updates about our research in natural language processing, machine learning, speech technology, digital humanities and more!

go.bsky.app/UDf92a2

3

Daniel Scalena @danielsc4.it · Nov 19

Hello!

1

Daniel Scalena @danielsc4.it · Nov 19

👋

1

Daniel Scalena @danielsc4.it · Nov 17

It was great, I'm starting to get tickets for next year!

Gabriele Sarti @gsarti.com · Nov 17

Inaugurating my bsky account by calling #EMNLP2024 a wrap! Had lots of fun presenting our work with @danielsc4.bsky.social and Jirui, and partied hard at the RiTA 🇮🇹 meetup (60+ people joined!). See you next year in Suzhou! 🇨🇳

Model Internals-based Answer Attribution for Trustworthy Retrieval Augemented Generation, https://aclanthology.org/2024.emnlp-main.347/

Multi-property Steering of Large Language Models with Dynamic Activation Composition, https://aclanthology.org/2024.blackboxnlp-1.34/

1

Daniel Scalena @danielsc4.it · Nov 16

👀🙋‍♂️

1