Neehar Kondapaneni
@therealpaneni.bsky.social
56 followers 500 following 19 posts
Researching interpretability and alignment in computer vision. PhD student @ Vision Lab Caltech
therealpaneni.bsky.social
This work was done in collaboration with @oisinmacaodha and @PietroPerona. It builds on our earlier related work RSVC (ICLR 2025). Check out our project page here nkondapa.github.io/rdx-page/ and our preprint here arxiv.org/abs/2505.23917.
Representational Difference Explanations (RDX)
Isolating and creating explanations of representational differences between two vision models.
nkondapa.github.io
therealpaneni.bsky.social
TLDR: RDX is a new method for isolating representational differences and leads to insights about subtle yet important differences between models. We test it on vision models, but the method is general and can be applied to any representational space.
therealpaneni.bsky.social
Due to these issues, we took a graph-based approach for RDX that does not use combinations of concept vectors. That means the explanation grid and the concept are equivalent -- what you see is what you get. This makes RDX outputs much simpler to interpret.
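One way to picture a graph-based grouping (a minimal sketch of my own, assuming a simple threshold and connected components; not the RDX implementation): images are nodes, edges link pairs that the difference signal marks as similar, and each group is shown directly as an explanation grid.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def explanation_groups(diff_sim, threshold=0.5):
    # diff_sim: (n, n) scores for "Model A considers similar, Model B does not"
    adj = csr_matrix(diff_sim > threshold)            # images as nodes, edges above threshold
    n_groups, labels = connected_components(adj, directed=False)
    # each group of image indices is displayed directly as one explanation grid
    return [np.where(labels == g)[0] for g in range(n_groups)]

groups = explanation_groups(np.random.rand(50, 50))   # placeholder scores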
therealpaneni.bsky.social
Even on a simple MNIST model, it is essentially impossible to anticipate that a weighted sum over these explanations results in this normal-looking five. Linear combinations of explanation grids are tricky to understand!
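For intuition, a small self-contained example (using scikit-learn's 8x8 digits as a stand-in for MNIST and NMF as the dictionary-learning method; not our experimental code): every digit is reconstructed as a weighted sum over all the component images, none of which resembles the digit on its own.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

X = load_digits().data                                # (1797, 64) non-negative pixels
nmf = NMF(n_components=16, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)                              # per-image weights over components
H = nmf.components_                                   # 16 component "explanation" images

digit = X[0]
approx = W[0] @ H                                     # weighted sum of ALL 16 component grids
print(np.linalg.norm(digit - approx) / np.linalg.norm(digit))  # small relative error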
therealpaneni.bsky.social
Notably, we ran into two challenges when applying DL methods to model comparison. DL explanations are grids of images (for vision), and these grids (1) can oversimplify the underlying concept and/or (2) must be interpreted as part of a linear combination of concepts.
therealpaneni.bsky.social
We compare RDX to several popular dictionary-learning (DL) methods (like SAEs and NMF) and find that the DL methods struggle. In the spotted wing (SW) comparison experiment, we find that NMF shows model similarities rather than differences.
therealpaneni.bsky.social
After demonstrating that RDX works when there are known differences, we compare models with unknown differences. For example, when comparing DINO and DINOv2, we find that DINOv2 has learned a color-based categorization of gibbons that is not present in DINO.
therealpaneni.bsky.social
We apply RDX to trained models with known differences and show that it isolates the core differences. For example, we compare model representations with and w/out a “spotted wing” (SW) concept and find that RDX shows that only one model groups birds according to this feature.
therealpaneni.bsky.social
Model comparison allows us to subtract away shared knowledge, revealing interesting concepts that explain model differences. Our method, RDX, isolates differences by answering the question: what does Model A consider similar that Model B does not?
nkondapa.github.io/rdx-page/
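To make that question concrete, here is a simplified Python sketch (cosine similarity and a plain matrix difference are my assumptions, not necessarily what RDX computes):

import numpy as np

def cosine_sim(feats):
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return feats @ feats.T

# feats_a, feats_b: embeddings of the SAME images from the two models (placeholders here)
feats_a = np.random.randn(200, 512)
feats_b = np.random.randn(200, 768)

diff = cosine_sim(feats_a) - cosine_sim(feats_b)      # high where A agrees but B does not
np.fill_diagonal(diff, -np.inf)                       # drop trivial self-similarity
idx = np.argsort(-diff, axis=None)[:20]
pairs = np.column_stack(np.unravel_index(idx, diff.shape))
# 'pairs' holds (i, j) image indices that are candidates for A-specific similarity structure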
therealpaneni.bsky.social
You’ve generated 10k concepts with your favorite XAI method -- now what? Many concepts you’ve found are fairly obvious and uninteresting. What if you could 𝑠𝑢𝑏𝑡𝑟𝑎𝑐𝑡 obvious concepts away and focus on the more complex ones? We tackle this in our latest preprint!
therealpaneni.bsky.social
The poster will actually be presented on Saturday at 10am (Singapore time). Please ignore the previous time.
therealpaneni.bsky.social
If you’re attending ICLR, stop by our poster April 25, 3PM (Singapore time).
I’ll also be presenting a workshop poster, pushing further in this direction, at the Bi-Align Workshop: bialign-workshop.github.io#/
therealpaneni.bsky.social
We found these unique and important concepts to be fairly complex, requiring deep analysis. We use ChatGPT-4o to analyze the concept collages and find that it gives detailed and clear explanations about the differences between models. More examples here -- nkondapa.github.io/rsvc-page/
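For reference, sending a concept collage to GPT-4o looks roughly like this (the prompt, file names, and two-collage setup are placeholders, not our exact pipeline):

import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def encode(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "These collages show concepts from two vision models. "
                     "Describe what visual property distinguishes them."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encode('collage_model_a.png')}"}},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encode('collage_model_b.png')}"}},
        ],
    }],
)
print(resp.choices[0].message.content)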
therealpaneni.bsky.social
We then look at “in-the-wild” models. We compare ResNets and ViTs trained on ImageNet. We measure concept importance and concept similarity. Do models learn unique and important concepts? Yes, sometimes they do!
therealpaneni.bsky.social
We first show this approach can recover known differences. We train Model 1 to use a pink square to make classification decisions and Model 2 to ignore it. Our method, RSVC, isolates this difference.
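A sketch of how such a known difference can be constructed (assumed image format, square placement, and pasting logic; not the exact training setup):

import numpy as np

def add_pink_square(img, size=6):
    # img: (H, W, 3) uint8; paint a pink patch in the top-left corner
    img = img.copy()
    img[:size, :size] = np.array([255, 105, 180], dtype=np.uint8)
    return img

def make_datasets(images, labels, target_class=0, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Model 1's data: the square appears exactly when the label is target_class (predictive)
    ds1 = np.stack([add_pink_square(x) if y == target_class else x
                    for x, y in zip(images, labels)])
    # Model 2's data: the square appears at random, uncorrelated with the label (ignorable)
    ds2 = np.stack([add_pink_square(x) if rng.random() < p else x for x in images])
    return ds1, ds2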
therealpaneni.bsky.social
We tackle this question by (i) extracting concepts for each model, (ii) using one model to predict the other’s concepts, and (iii) measuring the quality of the prediction.
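A minimal sketch of these three steps as I would approximate them (NMF for concept extraction and ridge regression for the cross-model prediction are my assumptions, not necessarily RSVC's choices):

import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# placeholder non-negative activations for the same inputs from two models
acts_a = np.abs(np.random.randn(500, 256))
acts_b = np.abs(np.random.randn(500, 384))

# (i) concept extraction: factor Model A's activations into concept scores
concepts_a = NMF(n_components=10, max_iter=500, random_state=0).fit_transform(acts_a)

# (ii) + (iii): predict each Model-A concept from Model B's activations and score it
for k in range(concepts_a.shape[1]):
    r2 = cross_val_score(Ridge(alpha=1.0), acts_b, concepts_a[:, k], cv=5).mean()
    print(f"concept {k}: R^2 from Model B = {r2:.2f}")  # low R^2 -> likely unique to Model A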
therealpaneni.bsky.social
Have you ever wondered what makes two models different?
We all know that ViT-Large performs better than ResNet-50, but what visual concepts drive this difference? Our new ICLR 2025 paper addresses this question! nkondapa.github.io/rsvc-page/
therealpaneni.bsky.social
Great work! I am curious what the reconstruction error is. Does the model behavior change significantly when using the reconstructed activations?
Reposted by Neehar Kondapaneni
angelinawang.bsky.social
Our new piece in Nature Machine Intelligence: LLMs are replacing human participants, but can they simulate diverse respondents? Surveys use representative sampling for a reason, and our work shows how LLM training prevents accurate simulation of different human identities.
therealpaneni.bsky.social
I had 1/5 reviewers respond; does that put me in the “has discussion” bucket? Are you checking the number of reviewers who respond as well?