We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇
Huge thanks to my collaborators🙏
@hadasorgad.bsky.social
@davidbau.bsky.social
@amuuueller.bsky.social
@boknilev.bsky.social
See you in Vienna! 🇦🇹 #ACL2025 @aclmeeting.bsky.social
📆 Review period: May 24-June 7
If you're passionate about making interpretability useful and want to help shape the conversation, we'd love your input.
💡🔍 Self-nominate here:
docs.google.com/forms/d/e/1F...
Only 5 days left ⏰!
Got a paper accepted to ICML that fits our theme?
Submit it to our conference track!
👉 @actinterp.bsky.social
The First Workshop on 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 will be held at ICML 2025 in Vancouver!
📅 Submission Deadline: May 9
Follow us >> @ActInterp
🧠Topics of interest include: 👇
Interpretability research sheds light on how models work, but too often those insights don't translate into actions that improve them.
Our workshop aims to challenge the interpretability community to go further.
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io
@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social
Paper submission deadline: May 9th!
Ever wondered whether verbalized CoTs correspond to the model's internal reasoning process?
We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness.
arxiv.org/abs/2502.14829