Tal Haklay
@talhaklay.bsky.social
53 followers 330 following 28 posts
NLP | Interpretability | PhD student at the Technion
Pinned
talhaklay.bsky.social
1/13 LLM circuits tell us where the computation happens inside the model—but the computation varies by token position, a key detail often ignored!
We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇
talhaklay.bsky.social
Our paper "Position-Aware Automatic Circuit Discovery" got accepted to ACL! 🎉

Huge thanks to my collaborators🙏
@hadasorgad.bsky.social
@davidbau.bsky.social
@amuuueller.bsky.social
@boknilev.bsky.social

See you in Vienna! 🇦🇹 #ACL2025 @aclmeeting.bsky.social
Reposted by Tal Haklay
actinterp.bsky.social
🚨 We're looking for more reviewers for the workshop!
📆 Review period: May 24-June 7

If you're passionate about making interpretability useful and want to help shape the conversation, we'd love your input.

💡🔍 Self-nominate here:
docs.google.com/forms/d/e/1F...
[Image: the Vancouver skyline with the words "sign up to review"; at the top, the logos of the Actionable Interpretability workshop (a magnifying glass) and the ICML conference (a brain).]
talhaklay.bsky.social
We knew many of you wanted to submit to our Actionable Interpretability workshop, but we didn’t expect to crash Overleaf! 😏🍃

Only 5 days left ⏰!
Got a paper accepted to ICML that fits our theme?
Submit it to our conference track!
👉 @actinterp.bsky.social
Reposted by Tal Haklay
amuuueller.bsky.social
This was a huge collaboration with many great folks! If you get a chance, be sure to talk to Atticus Geiger, @sarah-nlp.bsky.social, @danaarad.bsky.social, Iván Arcuschin, @adambelfki.bsky.social, @yiksiu.bsky.social, Jaden Fiotto-Kaufmann, @talhaklay.bsky.social, @michaelwhanna.bsky.social, ...
talhaklay.bsky.social
6. Position papers: Critical discussions on the feasibility, limitations, and future directions of actionable interpretability research. We also invite perspectives that question whether actionability should be a goal of interpretability research.
talhaklay.bsky.social
5. Developing realistic benchmarking and assessment methods to measure the real-world impact of interpretability insights, particularly in production environments and large-scale models.
talhaklay.bsky.social
4. Incorporating interpretability, which often focuses on micro-level decision analysis, into more complex scenarios like reasoning processes or multi-turn interactions.
talhaklay.bsky.social
3. New model architectures, training paradigms, or design choices informed by interpretability findings.
talhaklay.bsky.social
2. Comparative analyses of interpretability-based approaches versus alternative techniques like fine-tuning, prompting, and more.
talhaklay.bsky.social
1. Practical applications of interpretability insights to address key challenges in AI such as hallucinations, biases, and adversarial robustness, as well as applications in high-stakes, less-explored domains like healthcare, finance, and cybersecurity.
talhaklay.bsky.social
🚨 Call for Papers is Out!

The First Workshop on 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 will be held at ICML 2025 in Vancouver!

📅 Submission Deadline: May 9
Follow us >> @ActInterp

🧠Topics of interest include: 👇
talhaklay.bsky.social
Amazing news: our workshop was accepted to ICML 2025!

Interpretability research sheds light on how models work—but too often, those insights don’t translate into actions that improve them.
Our workshop aims to challenge the interpretability community to go further.
megamor2.bsky.social
🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!
talhaklay.bsky.social
12/13 We evaluate our automatic pipeline across three datasets and two models, demonstrating that:

1️⃣ Our pipeline discovers circuits with a better tradeoff between size and faithfulness compared to EAP.
2️⃣ Our pipeline produces results comparable to those obtained when human experts define a schema.
talhaklay.bsky.social
11/13 But where does this schema come from? And how do we determine the boundaries of each span within each example? Sounds like we just added more work for researchers! 😅
Actually, we show that an LLM (Claude) can do a pretty decent job at defining a schema and tagging all examples accordingly.
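Something along these lines (an illustrative prompt sketch, not the exact prompt from the paper; the model alias may need updating):

```python
# Illustrative sketch of schema tagging with Claude (not the paper's prompt).
import anthropic

PROMPT = """Here are examples from a dataset, one per line.
1. Propose a short list of named token spans that play the same semantic
   role in every example.
2. For each example, return the start/end token indices of each span.

{examples}"""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed model alias
    max_tokens=1024,
    messages=[{"role": "user", "content": PROMPT.format(examples="...")}],
)
print(response.content[0].text)  # proposed schema + per-example span tags
```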
talhaklay.bsky.social
10/13 After defining a schema, we construct an abstract computation graph where each span type corresponds to a single token position. We then map attribution scores from example-specific computation graphs to the abstract graph and identify circuits within it.
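Schematically, the mapping groups positional scores by span type (a simplified sketch; the exact aggregation rule is a design choice, averaging here):

```python
# Sketch: map example-specific positional edge scores onto the abstract graph.
from collections import defaultdict

def build_abstract_scores(examples):
    grouped = defaultdict(list)
    for ex in examples:
        # positional_edge_scores: {(edge, token_position): score}
        for (edge, pos), score in ex["positional_edge_scores"].items():
            span_type = ex["position_to_span"][pos]  # e.g. "S2", "END"
            # Key the abstract graph by span type, not raw position, so
            # variable-length examples line up on the same nodes.
            grouped[(edge, span_type)].append(score)
    # One aggregated score per abstract edge.
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}
```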
talhaklay.bsky.social
9/13 To address this problem, we introduce the concept of a 𝙙𝙖𝙩𝙖𝙨𝙚𝙩 𝙨𝙘𝙝𝙚𝙢𝙖, which defines token spans with similar semantics across examples in the dataset.
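For example, a schema for an IOI-style dataset might look like this (span names and token indices are illustrative, and depend on the tokenizer):

```python
# Illustrative dataset schema: named spans with the same semantic role
# across examples, even when examples differ in length.
schema = ["first_clause", "S2", "END"]

tagged_example = {
    "text": "When Mary and John went to the store, John gave a drink to",
    "spans": {                    # token index ranges, [start, end)
        "first_clause": (0, 8),   # "When Mary and John went to the store"
        "S2": (9, 10),            # the repeated name "John"
        "END": (14, 15),          # final token "to"
    },
}
```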
talhaklay.bsky.social
8/13 But you may notice an issue...
What if the examples in a dataset vary in length and structure?
Discovering a circuit in such cases is not straightforward, leading many researchers to focus only on datasets with uniform length and structure.
talhaklay.bsky.social
7/13 First improvement:
We introduce 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗘𝗱𝗴𝗲 𝗔𝘁𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 𝗣𝗮𝘁𝗰𝗵𝗶𝗻𝗴 (𝗣𝗘𝗔𝗣), an extension of EAP that discovers circuits which differentiate between token positions. The key advancement? Our approach uncovers "attention edges", revealing cross-position dependencies missed by previous methods.
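Roughly, PEAP keeps the per-position scores that EAP sums away (a simplified sketch, not our exact implementation; tensor names are illustrative):

```python
# Simplified sketch of the PEAP idea: the same first-order score as EAP,
# but kept per token position instead of collapsed.
import torch

def peap_edge_scores(act_clean_u: torch.Tensor,    # [seq_len, d_model]
                     act_corrupt_u: torch.Tensor,  # [seq_len, d_model]
                     grad_v_in: torch.Tensor       # [seq_len, d_model]
                     ) -> torch.Tensor:
    # One attribution score per token position for the edge u -> v.
    return ((act_corrupt_u - act_clean_u) * grad_v_in).sum(dim=-1)  # [seq_len]

# For attention heads, the edge from a key/value at position j into a query
# at position i gets its own score, giving a [query_pos, key_pos] grid of
# "attention edges": cross-position dependencies invisible to vanilla EAP.
```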
talhaklay.bsky.social
6/13 The Problem:
Automatic circuit discovery methods like Edge Attribution Patching (EAP) and EAP-IG implicitly assume that circuits are position-invariant: they do not differentiate between components at different token positions.

As a result, the circuit may include irrelevant components.
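For intuition, here's a minimal sketch of the EAP first-order approximation (my paraphrase, not the original implementation); note the sum over token positions that makes the score position-invariant:

```python
# Sketch of EAP: for an edge u -> v, the effect of patching u's output with
# its corrupted-run value is approximated by a dot product with the loss
# gradient at v's input, taken on the clean run.
import torch

def eap_edge_score(act_clean_u: torch.Tensor,    # [seq_len, d_model], clean run
                   act_corrupt_u: torch.Tensor,  # [seq_len, d_model], corrupted run
                   grad_v_in: torch.Tensor       # [seq_len, d_model], dLoss/d(v input)
                   ) -> torch.Tensor:
    per_position = ((act_corrupt_u - act_clean_u) * grad_v_in).sum(dim=-1)  # [seq_len]
    # Summing over the position axis is exactly where positional information
    # is lost: one score per edge, no matter which token it acted at.
    return per_position.sum()
```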