Laura Kopf

@lkopf.bsky.social

320 followers 380 following 17 posts

PhD student in Interpretable Machine Learning at TU Berlin & BIFOLD

Pinned

Laura Kopf @lkopf.bsky.social · Jun 19

🔍 When do neurons encode multiple concepts?

We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.

📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538

🧵 (1/7)

Reposted by Laura Kopf

Nils Feldhus @nfel.bsky.social · 6d

🔍 Are you curious about uncovering the underlying mechanisms and identifying the roles of model components (neurons, …) and abstractions (SAEs, …)?

We provide the first survey of concept description generation and evaluation methods.

Joint effort w/ @lkopf.bsky.social

📄 arxiv.org/abs/2510.01048

Overview of descriptions for model components (neurons, attention heads) and model abstractions (SAE features, circuits).

Laura Kopf @lkopf.bsky.social · 19d

Many thanks as well to the institutions that supported this research:
@tuberlin.bsky.social
@bifold.berlin
UMI Lab
@fraunhoferhhi.bsky.social
@unipotsdam.bsky.social
@leibnizatb.bsky.social

Laura Kopf @lkopf.bsky.social · 19d

I’m very grateful to my amazing collaborators @nfel.bsky.social, @kirillbykov.bsky.social, @philinelb.bsky.social, Anna Hedström, Marina M.-C. Höhne, and @eberleoliver.bsky.social 🙏

Laura Kopf @lkopf.bsky.social · 19d

Happy to share that our PRISM paper has been accepted at #NeurIPS2025 🎉

In this work, we introduce a multi-concept feature description framework that can identify and score polysemantic features.

📄 Paper: arxiv.org/abs/2506.15538

#NeurIPS #MechInterp #XAI

Laura Kopf @lkopf.bsky.social · Jun 19

Grateful to the institutions that supported this work:
@tuberlin.bsky.social
@bifold.berlin
UMI Lab
@fraunhoferhhi.bsky.social
@unipotsdam.bsky.social
@leibnizatb.bsky.social

(7/7)

Laura Kopf @lkopf.bsky.social · Jun 19

Many thanks to my amazing co-authors:
@nfel.bsky.social
@kirillbykov.bsky.social
@philinelb.bsky.social
Anna Hedström
Marina M.-C. Höhne
@eberleoliver.bsky.social

(6/7)

Laura Kopf @lkopf.bsky.social · Jun 19

Our results highlight that the PRISM framework not only provides multiple human interpretable descriptions for neurons but also aligns with the human interpretation of polysemanticity. (5/7)

Laura Kopf @lkopf.bsky.social · Jun 19

In exploring the concept space, we use PRISM to characterize more complex components, finding and interpreting patterns that specific attention heads or groups of neurons respond to. (4/7)

Laura Kopf @lkopf.bsky.social · Jun 19

We benchmark PRISM across layers and architectures, showing how polysemanticity and interpretability shift through the model. (3/7)

Laura Kopf @lkopf.bsky.social · Jun 19

PRISM samples sentences from the top percentile activation distribution, clusters them in embedding space, and uses an LLM to generate labels for each concept cluster. (2/7)
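
The three steps in this post (select top-activating sentences, cluster them in embedding space, label each cluster) can be sketched roughly as below. This is a toy illustration, not the PRISM implementation: the function names, the toy data, the simple k-means routine, and the percentile cutoff are all assumptions, and `placeholder_label` is a hypothetical stand-in for the LLM labelling step.

```python
from collections import Counter

import numpy as np


def top_percentile_indices(activations, percentile):
    """Return indices of samples at or above the given activation percentile."""
    threshold = np.percentile(activations, percentile)
    return np.flatnonzero(activations >= threshold)


def kmeans(points, k, n_iter=10):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Distance of every point to its nearest already-chosen center.
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return assign


def placeholder_label(sentences):
    """Hypothetical stand-in for the LLM call: the most frequent longer word."""
    words = [w for s in sentences for w in s.lower().split() if len(w) > 3]
    return Counter(words).most_common(1)[0][0] if words else ""


def describe_feature(sentences, activations, embeddings, k, percentile,
                     label_fn=placeholder_label):
    """Map each cluster id to (label, member sentences) for one feature."""
    top = top_percentile_indices(activations, percentile)
    assign = kmeans(embeddings[top], k)
    clusters = {}
    for idx, cluster_id in zip(top, assign):
        clusters.setdefault(int(cluster_id), []).append(sentences[idx])
    return {c: (label_fn(members), members) for c, members in clusters.items()}


# Toy data: one "neuron" that fires on two different senses of "bank".
sentences = [
    "the river bank flooded",
    "she sat on the bank of the river",
    "the bank raised interest rates",
    "my bank approved the loan",
    "cats sleep all day",
    "the weather is nice",
    "he plays guitar",
    "we ate lunch",
]
activations = np.array([0.9, 0.8, 0.95, 0.85, 0.1, 0.05, 0.2, 0.0])
embeddings = np.array(
    [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [9, 9], [9, 8], [8, 9], [8, 8]],
    dtype=float,
)

result = describe_feature(sentences, activations, embeddings, k=2, percentile=50.0)
for cluster_id, (label, members) in result.items():
    print(cluster_id, label, members)
```

In practice the embeddings would come from a sentence encoder and `label_fn` would wrap a call to a language model. Getting more than one cluster back for a single feature is what a multi-concept description looks like in this sketch.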

Laura Kopf @lkopf.bsky.social · Dec 13

Huge thanks to my incredible supervisor @kirillbykov.bsky.social, who laid the foundation for this project and provided brilliant guidance 🙏, and to @philinelb.bsky.social and Sebastian Lapuschkin, who unfortunately couldn’t be there.

Laura Kopf @lkopf.bsky.social · Dec 13

Still overwhelmed by the amazing response to our poster session at @neuripsconf.bsky.social with Anna Hedström and Marina Höhne! It was incredible to have such lively and inspiring discussions with brilliant people whose work I admire. ✨

Laura Kopf @lkopf.bsky.social · Dec 12

Thanks for putting together this amazing list Margaret! I would love to be added if you still have space :)

Laura Kopf @lkopf.bsky.social · Dec 11

Want to know more about CoSy?
📄 Paper: arxiv.org/abs/2405.20331
💻 Code: github.com/lkopf/cosy
🔗 Poster: neurips.cc/virtual/2024...

#NeurIPS2024 #MechInterp #ExplainableAI #Interpretability

Laura Kopf @lkopf.bsky.social · Dec 11

Special thanks to our supporting institutions: UMI Lab, @xtraexer.bsky.social, @tuberlin.bsky.social, Uni Potsdam, ATB Potsdam, and Fraunhofer Heinrich-Hertz-Institut.

Laura Kopf @lkopf.bsky.social · Dec 11

My co-authors Anna Hedström and Marina Höhne will also be at @neuripsconf.bsky.social. A big thank you to my other co-authors @kirillbykov.bsky.social, @philinelb.bsky.social and Sebastian Lapuschkin, who unfortunately couldn’t be there.

Laura Kopf @lkopf.bsky.social · Dec 11

I’ll be presenting our work at @neuripsconf.bsky.social in Vancouver! 🎉
Join me this Thursday, December 12th, in East Exhibit Hall A-C, Poster #3107, from 11 a.m. to 2 p.m. PST. I’ll be discussing our paper “CoSy: Evaluating Textual Explanations of Neurons.”