Julian Minder
@jkminder.bsky.social
PhD at EPFL with Robert West, Master at ETHZ
Mainly interested in Language Model Interpretability and Model Diffing.
MATS 7.0 Winter 2025 Scholar w/ Neel Nanda
jkminder.ch
Julian Minder
@jkminder.bsky.social
· Sep 5
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences — AI Alignment Forum
This is a preliminary research update. We are continuing our investigation and will publish a more in-depth analysis soon. The work was done as part…
www.alignmentforum.org
Reposted by Julian Minder
Tiago Pimentel
@tpimentel.bsky.social
· Jul 14
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
The concept of causal abstraction got recently popularised to demystify the opaque decision-making processes of machine learning models; in short, a neural network can be abstracted as a higher-level ...
arxiv.org
Reposted by Julian Minder
Julian Minder
@jkminder.bsky.social
· Nov 22
Controllable Context Sensitivity and the Knob Behind It
When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, a...
arxiv.org