claudia shi
@claudiashi.bsky.social
490 followers 69 following 18 posts
machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean https://www.claudiashi.com/
claudiashi.bsky.social
Our tests reveal gaps between the idealized version of the circuit representation and what we find in practice. By formalizing desirable properties, we hope to refine the circuit hypothesis, addressing questions such as what is the "optimal" level of granularity
claudiashi.bsky.social
Findings: Synthetic circuits align with all the ideal criteria. Semi-synthetic circuits pass some of the idealized tests. Circuits in the wild pass none of the idealized tests.
claudiashi.bsky.social

We apply our tests to six benchmark circuits from the literature: two synthetic circuits, two semi-synthetic circuits (circuits discovered on toy transformer models), and two circuits in the wild (circuits discovered on transformer models such as GPT-2).
claudiashi.bsky.social
We compare the candidate circuit against random circuits drawn from a reference distribution. We vary the reference distribution to change the hardness of the test.
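
One way to picture this comparison is as a rank statistic: score the candidate circuit, score many random circuits, and report how often the candidate wins. The sketch below is only an illustration under assumptions, not the procedure from the paper. The function name `fraction_outperformed`, the faithfulness scores, and the uniform reference distribution are all hypothetical placeholders.

```python
import random


def fraction_outperformed(candidate_score, reference_scores):
    """Return the fraction of reference circuits that score strictly below the candidate."""
    if not reference_scores:
        raise ValueError("need at least one reference circuit")
    beaten = sum(1 for score in reference_scores if score < candidate_score)
    return beaten / len(reference_scores)


# Made-up faithfulness scores (higher = closer to the full model's behavior).
# In a real test these would come from evaluating sampled circuits on the task.
rng = random.Random(0)
reference_scores = [rng.uniform(0.0, 0.6) for _ in range(1000)]
candidate_score = 0.9

frac = fraction_outperformed(candidate_score, reference_scores)
print(f"candidate beats {frac:.0%} of random circuits")
```

Drawing the random circuits from a stronger reference distribution (for example, circuits that share more structure with the candidate) would raise their scores and make the same candidate harder to distinguish, which is the sense in which the reference distribution sets the hardness of the test.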
claudiashi.bsky.social
The idealized tests are stringent, so we developed two flexible tests that quantify:

Sufficiency Test: How faithful is faithful enough?
Partial Necessity Test: How much knockdown effect is significant?
claudiashi.bsky.social

Independence Test: Removing the circuit renders the model output independent of that of the circuit

Minimality Test: All edges in the circuit are necessary for the task
claudiashi.bsky.social
We translate these properties into three idealized tests:

Equivalence Test: The circuit and the original model have the same chance of outperforming each other
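
"Same chance of outperforming each other" can be read as a null hypothesis that the circuit beats the full model on a given input with probability 0.5. A minimal sketch of that reading is a two-sided exact sign test, shown below. This is an assumption about how such a test could look, not the paper's implementation. The name `sign_test_p_value` is hypothetical, and ties are assumed to have been dropped before counting.

```python
from math import comb


def sign_test_p_value(wins, n):
    """Two-sided exact binomial p-value for H0: P(circuit beats model) = 0.5.

    wins: number of inputs where the circuit outperformed the full model
    n:    number of inputs compared (ties assumed already removed)
    """
    if n <= 0 or not 0 <= wins <= n:
        raise ValueError("require n > 0 and 0 <= wins <= n")
    k = min(wins, n - wins)  # size of the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Balanced outcomes give no evidence against equivalence.
print(sign_test_p_value(5, 10))
# A lopsided outcome gives a small p-value, so equivalence would be rejected.
print(sign_test_p_value(0, 10))
```

Under this reading, a small p-value means the circuit and the model do not trade wins evenly, so the circuit fails the equivalence criterion.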
claudiashi.bsky.social
We formalize three criteria of an idealized circuit and develop hypothesis tests for them:
1️⃣ Mechanism Preservation: The circuit should preserve the model's behavior
2️⃣ Localization: Removing the circuit disables the task
3️⃣ Minimality: The circuit contains no redundant parts
claudiashi.bsky.social
The circuit hypothesis proposes that LLM capabilities emerge from small subnetworks within the model. But how can we actually test this? 🤔

joint work with @velezbeltran.bsky.social @maggiemakar.bsky.social @anndvision.bsky.social @bleilab.bsky.social Adria @far.ai Achille and Caro
claudiashi.bsky.social
I'd love to be added to the starter pack! I work on causal inference.
claudiashi.bsky.social
Hi Rob, I'd love to be added to the starter pack.
claudiashi.bsky.social
I'd love to be added! i am bayesian adjacent!
claudiashi.bsky.social
i'd love to be added!
claudiashi.bsky.social
Hi! I'd love to be added to the starter pack!
claudiashi.bsky.social
Hi! could you also add me to the mech interp list? I do mech interp research.
claudiashi.bsky.social
@datatherapist.bsky.social i'd love to be added to the new one! thank you