Daniel Scalena
@danielsc4.it
420 followers 190 following 15 posts
PhDing @unimib 🇮🇹 & @gronlp.bsky.social 🇳🇱, interpretability et similia danielsc4.it
danielsc4.it
I’ll be attending the NEMI 2025 workshop this Friday and presenting a poster👇.

Happy to chat about cool interpretability stuff there!
danielsc4.it
🔍 What’s happening in the model?
We find that SAE steering and multi-shot prompting impact internal representations similarly, suggesting that the insights from user examples can be summarized, with extra interpretability potential (look at the latents) and better efficiency (no long context) 6/
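A minimal sketch of how such a representation comparison could look (my own illustration, not the paper's code): cache per-layer hidden states for a zero-shot and a multi-shot prompt and compare them layer by layer. The model name and prompts are placeholders; the same comparison could be run against a steered forward pass.

```python
# Rough sketch (not the paper's code): compare how two inference setups shift
# a model's internal representations, layer by layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for a runnable example
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_token_states(prompt: str) -> list[torch.Tensor]:
    """Hidden state of the final prompt token at every layer."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return [h[0, -1] for h in out.hidden_states]

zero_shot = "Translate into Italian: 'The sea was calm that night.'"
multi_shot = (
    "Example translations by translator T:\n"
    "EN: ... -> IT: ...\n"  # few-shot examples would go here
    "Translate into Italian: 'The sea was calm that night.'"
)

h_zero = last_token_states(zero_shot)
h_multi = last_token_states(multi_shot)

# Per-layer cosine similarity: where do the few-shot examples move the
# representation the most? Running the same check on a steered forward pass
# shows whether steering shifts activations in a similar direction.
for layer, (a, b) in enumerate(zip(h_zero, h_multi)):
    sim = torch.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer:02d}  cos(zero-shot, multi-shot) = {sim:.3f}")
```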
danielsc4.it
🌍 Across 7 languages, our SAE-based method matches or outperforms traditional prompting methods! It yields more human-like translations (H) and better personalization accuracy (P) while maintaining translation quality (Comet ☄️ @nunonmg.bsky.social), especially for smaller LLMs. 5/
danielsc4.it
💡 We compare prompting (zero and multi-shot + explanations) and inference-time interventions (ActAdd, REFT and SAEs).

Following SpARE (@yuzhaouoe.bsky.social @alessiodevoto.bsky.social), we propose ✨ contrastive SAE steering ✨ with mutual info to personalize literary MT by tuning latent features 4/
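A rough sketch of the contrastive-steering idea, under my own assumptions rather than the released code: encode residual-stream activations from two contrastive example sets with an SAE, rank latents by how differently they fire (a simple mean-difference proxy stands in here for the mutual-information criterion), and add the selected decoder directions back into the residual stream. The SAE weights, dimensions, layer index, and strength are all hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the paper's implementation): pick SAE latents
# that separate "personalized" vs "literal" translations, then steer with them.
import torch

torch.manual_seed(0)
d_model, d_sae = 512, 4096           # hypothetical dimensions
W_enc = torch.randn(d_model, d_sae)  # stand-ins for a trained SAE's weights
W_dec = torch.randn(d_sae, d_model)
b_enc = torch.zeros(d_sae)

def sae_encode(acts: torch.Tensor) -> torch.Tensor:
    return torch.relu(acts @ W_enc + b_enc)

# Residual-stream activations collected on contrastive example sets
# (random stand-ins here; in practice, cached from the model).
acts_personal = torch.randn(64, d_model)  # translator-style outputs
acts_literal = torch.randn(64, d_model)   # literal MT outputs

z_pos = sae_encode(acts_personal)
z_neg = sae_encode(acts_literal)

# Score each latent by how differently it fires across the two sets
# (mean-difference proxy; the paper's criterion is mutual information).
score = (z_pos.mean(0) - z_neg.mean(0)).abs()
top_latents = score.topk(k=16).indices

# Steering vector: sum of the selected latents' decoder directions,
# scaled by their mean activation gap.
gaps = z_pos.mean(0)[top_latents] - z_neg.mean(0)[top_latents]
steer = (gaps[:, None] * W_dec[top_latents]).sum(0)

alpha = 4.0  # steering strength, a hyperparameter to tune

def steering_hook(module, inputs, output):
    # Assumes the hooked module returns the residual-stream tensor directly.
    return output + alpha * steer

# Hypothetical registration on a mid layer of an HF model:
# model.model.layers[12].register_forward_hook(steering_hook)
```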
danielsc4.it
📈 But can models recognize and replicate individual translator styles?
✓ Classifiers can find styles with high acc. (humans kinda don’t)
✓ Multi-shot prompting boosts style a lot
✓ We can detect strong style traces in activations (esp. mid layers; see the probe sketch below 👇) 3/
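A toy version of that last check, under my own assumptions (random stand-in features instead of real cached activations): fit a linear probe on mid-layer activations labelled by translator and report cross-validated accuracy against chance.

```python
# Toy sketch of the "style traces in activations" probe (my assumptions,
# not the released code).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in data: in practice, mid-layer residual-stream activations of
# translated sentences, labelled by which human translator produced them.
n_per_translator, d_model, n_translators = 200, 512, 3
X = np.concatenate([rng.normal(loc=i * 0.1, size=(n_per_translator, d_model))
                    for i in range(n_translators)])
y = np.repeat(np.arange(n_translators), n_per_translator)

probe = LogisticRegression(max_iter=2000)
acc = cross_val_score(probe, X, y, cv=5).mean()
print(f"5-fold probe accuracy: {acc:.2f}  (chance = {1 / n_translators:.2f})")
```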
danielsc4.it
📘 Literary translation isn't just about accuracy; it's also about creatively conveying meaning across languages. But LLMs prompted for MT are very literal. Prompting & steering to the rescue!

Can we personalize an LLM's MT when only a few examples are available, without further tuning? 🔍 2/
danielsc4.it
📢 New paper: Applied interpretability 🤝 MT personalization!

We steer LLM generations to mimic human translator styles on literary novels in 7 languages. 📚

SAE steering can beat few-shot prompting, leading to better personalization while maintaining quality.

🧵1/
danielsc4.it
Hey hello! 👋
danielsc4.it
Now on 🦋!
gronlp.bsky.social
Hello world 🐮! We are the Computational Linguistics group at the University of Groningen. Follow us for updates about our research in natural language processing, machine learning, speech technology, digital humanities and more!

go.bsky.app/UDf92a2
danielsc4.it
It was great! I'm starting to get tickets for next year!
gsarti.com
Inaugurating my bsky account by calling #EMNLP2024 a wrap! Had lots of fun presenting our work with @danielsc4.bsky.social and Jirui, and partied hard at the RiTA 🇮🇹 meetup (60+ people joined!). See you next year in Suzhou! 🇨🇳
Welcome reception @ Hyatt Regency Hotel
Model Internals-based Answer Attribution for Trustworthy Retrieval Augmented Generation, https://aclanthology.org/2024.emnlp-main.347/
Multi-property Steering of Large Language Models with Dynamic Activation Composition, https://aclanthology.org/2024.blackboxnlp-1.34/
Social Dinner @ Frost Science Museum
danielsc4.it
👀🙋‍♂️