💡 Thoughts? Questions? Let’s discuss!
Website >> peap-circuits.github.io
Arxiv >> arxiv.org/abs/2502.04577
1️⃣ Our pipeline discovers circuits with a better tradeoff between size and faithfulness compared to EAP.
2️⃣ Our pipeline produces results comparable to those obtained when human experts define a schema.
Actually, we show that an LLM (Claude) can do a pretty decent job at defining a schema and tagging all examples accordingly.
What if the examples in a dataset vary in length and structure?
Discovering a circuit in such cases is not straightforward, leading many researchers to focus only on datasets with uniform length and structure.
We introduce 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗘𝗱𝗴𝗲 𝗔𝘁𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 𝗣𝗮𝘁𝗰𝗵𝗶𝗻𝗴 (𝗣𝗘𝗔𝗣), an extension of EAP that discovers circuits that differentiate between token positions. The key advancement? Our approach uncovers "attention edges", revealing dependencies missed by previous methods.
Automatic circuit discovery methods like Edge Attribution Patching (EAP) and EAP-IG implicitly assume that circuits are position-invariant: they do not differentiate between components at different token positions.
As a result, the circuit may include irrelevant components.
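To make the position-invariance point concrete, here is a toy sketch (not the paper's implementation). It assumes an EAP-style edge score is the dot product of an upstream activation difference with a downstream gradient, and uses random arrays as hypothetical stand-ins for values that would come from a real model. It only illustrates per-position versus summed scores for a single edge; cross-position "attention edges" are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos, d_model = 5, 8  # hypothetical sequence length and hidden size

# Hypothetical stand-ins for quantities collected from a model:
# act_diff[p] = corrupted minus clean activation of the upstream node at position p
# grad[p]     = gradient of the task metric w.r.t. the downstream node's input at p
act_diff = rng.normal(size=(n_pos, d_model))
grad = rng.normal(size=(n_pos, d_model))


def positional_edge_scores(act_diff, grad):
    """One attribution score per token position for a single edge."""
    return np.einsum("pd,pd->p", act_diff, grad)


def position_invariant_edge_score(act_diff, grad):
    """A single score for the edge, obtained by summing over positions."""
    return positional_edge_scores(act_diff, grad).sum()


per_pos = positional_edge_scores(act_diff, grad)
total = position_invariant_edge_score(act_diff, grad)
print("per-position scores:", per_pos)
print("summed score:", total)
```

With only the summed score, an edge is either in the circuit at every position or at none; the per-position scores make it possible to keep it only where it matters.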