Sushrut Thorat
@sushrutthorat.bsky.social
1.1K followers 190 following 240 posts
Recurrent computations and lifelong learning. Postdoc at IKW-UOS@DE with @timkietzmann.bsky.social Prev. Donders@NL‬, ‪CIMeC@IT‬, IIT-B@IN
sushrutthorat.bsky.social
Also, regarding your "not present in the training distribution" point: neither the Geirhos stimuli nor your diagnostic stimuli are part of the training set either. Extreme generalization is what we usually resort to for interpretability, and that is fine, no?
sushrutthorat.bsky.social
Regardless, the claim that it's not shape (outline, bulk, etc.) but something related to color, texture, etc. that ANNs rely on for classification is a hard one to refute, wouldn't you say? (Because people reading your paper's title would think you're speaking directly against this claim.)
sushrutthorat.bsky.social
careful about declaring that ANNs are not biased toward textures. Also, it is worth mentioning that texture, in the Geirhos setting, is much more loosely defined (it includes color, etc.) than how you or psychophysics expts refer to it. This is exactly where your controlled expts are a great addition.
sushrutthorat.bsky.social
I think your controlled analysis is great and important — sorry if that is not reflected in the directness of my arguments. What I'm trying to say is: given the stark demonstration of what seems to be an inability to rely on global shape when it is embedded in a conflicting texture, I'd be (cont..)
sushrutthorat.bsky.social
Preference vs reliability, yes. But let's say shape was more salient: OK, humans got it but ANNs did not. Let's say texture was more salient: humans still latch onto shape, and ANNs do not (or latch onto texture). Either way, "humans lean more towards shape OR ANNs cannot / do not rely on shape" is the conclusion, no?
sushrutthorat.bsky.social
CNNs relying on the low-freq, non-contour, etc. components can't use them anymore?
sushrutthorat.bsky.social
This could be because of reliance on the non-contour, low-freq components I mentioned earlier, no? Yes, the relative comparison signals a difference, but it could be signaling a different difference: humans biased towards shape might switch to local-texture-based decisions, whereas (cont..)
sushrutthorat.bsky.social
But what remains are shape parts AND texture, and humans/models could rely on either or both of them - hard to disentangle, no?
sushrutthorat.bsky.social
Thanks for engaging :)
In Geirhos's cue-conflict images, the texture doesn't have to be only high-freq. The Gram matrices are aligned across all layers - in the later layers the RF sizes are huge, so the correlations needn't only reflect small-scale variation, as seen in my post.
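For reference, a minimal sketch of the Gram-matrix statistic matched in that style-transfer setup (the feature-map shape here is illustrative, standing in for a VGG-like layer, not the exact setup used in the paper):

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a conv feature map.

    features: array of shape (C, H, W) from some layer of a CNN.
    Spatial positions are summed out, so the statistic is orderless over
    space; but at deep layers each 'position' already has a large
    receptive field, so matching these correlations also constrains
    fairly large-scale (low-frequency) structure, not just fine texture.
    """
    C, H, W = features.shape
    F = features.reshape(C, H * W)   # flatten spatial dimensions
    return F @ F.T / (H * W)         # (C, C) Gram matrix

# Toy usage with a random tensor standing in for a layer's activations.
feats = np.random.randn(64, 28, 28)
G = gram_matrix(feats)
print(G.shape)  # (64, 64)
```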
sushrutthorat.bsky.social
Would be cool to discuss this with the authors of the paper. The only author I could find was @paolorota.bsky.social though. I'm curious what we come out with as a conclusion when all these observations are taken into account.
sushrutthorat.bsky.social
5. another example highlighting a texture/background bias comes from a forest-vs-trees-like benchmark - bsky.app/profile/timk...
... idk, given the above concerns and these observations, I'm not convinced that "ImageNet-trained CNNs are not biased towards texture"
timkietzmann.bsky.social
Result 2: DVD-training enabled abstract shape recognition in cases where AI frontier models, despite being explicitly prompted, fail spectacularly.

t-SNE nicely visualises the fundamentally different approach of DVD-trained models. 6/
sushrutthorat.bsky.social
4. the famous cat-elephant cue-conflict image DOES tell us about the ability and preference of ANNs (another example w/ then-SOTA VLMs - bsky.app/profile/sush...). There's clearly a reliance on texture. Now, if the controlled analyses employed are not in line with this, perhaps we need better controls?
sushrutthorat.bsky.social
... (Claude/Gemini do the same)
sushrutthorat.bsky.social
2. the global shape manipulation also preserves textures so humans and ANNs might be relying on different cues to solve the task.
3. no one says humans cannot/do not rely on texture (as seen w/ the local shape condition), but the Geirhos stimuli are about gauging a shape "bias" - preference vs ability
sushrutthorat.bsky.social
hmm,
1. the way they quantify "texture" is based solely on high-freq components. But there are low-freq components which do not signal meaningful information about shape either and could influence classification (suppl. fig. from an upcoming rev. of arxiv.org/abs/2507.03168)
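To make the low-freq/high-freq distinction concrete, here's a toy decomposition of an image via Gaussian blurring (a generic illustration of the split being discussed, not the quantification used in the paper); the low-pass residue still carries non-contour information that a classifier could use:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def frequency_split(image, sigma=5.0):
    """Split a grayscale image into low- and high-frequency parts.

    The low-pass part (Gaussian blur) removes fine texture and sharp
    contours but keeps coarse, blob-like structure; equating 'texture'
    with only the high-pass part ignores this low-freq component.
    """
    low = gaussian_filter(image, sigma=sigma)
    high = image - low
    return low, high

# Toy usage on a random 'image'.
img = np.random.rand(224, 224)
low, high = frequency_split(img)
print(np.allclose(img, low + high))  # True: the split is exact
```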
Reposted by Sushrut Thorat
brialong.bsky.social
We’re recruiting a postdoctoral fellow to join our team! 🎉

I’m happy to share that I’ve opened back up the search for this position (it was temporarily closed due to funding uncertainty).

See lab page and doc below for details!
sushrutthorat.bsky.social
Thanks! Will check it out 😇
sushrutthorat.bsky.social
Also, I was referring to what LoRA et al. end up doing - modifying the transformation with a low-rank adapter - which is similar to the work I linked, wherein a top-down attention signal, which is low-rank from the perspective of the weights, modifies the transformation.
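For context, a minimal sketch of that kind of low-rank modification (a generic LoRA-style adapter on a linear map; the dimensions and names are illustrative, not taken from either paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8              # r << d: the low-rank bottleneck

W = rng.standard_normal((d_out, d_in))    # frozen base transformation
A = rng.standard_normal((r, d_in)) * 0.01 # adapter down-projection
B = np.zeros((d_out, r))                  # adapter up-projection

def adapted_forward(x):
    """y = W x + B A x: the base map plus a low-rank modulation.

    Only A and B (O(r * d) parameters) are learned or contextually set,
    rather than the full O(d^2) weight matrix W.
    """
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
print(adapted_forward(x).shape)  # (512,)
```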
sushrutthorat.bsky.social
counterpoint #1: bsky.app/profile/nico...
nicolecrust.bsky.social
I get it. Similar thoughts inspired me to write a book. I began pessimistic, like this author, but I came out on the other side with renewed optimism.

As a counterpoint to the blog below, this podcast could be titled, "Why Nicole Rust stayed"

www.fchampalimaud.org/news/episode...
sushrutthorat.bsky.social
Oh? Which one is that? I'm unaware of it. Also, note that the aim of the work I linked above wasn't explicitly to do what LoRA/ICL does, but I think the spirit is similar - contextual modulation one way or the other.
sushrutthorat.bsky.social
"Also: It is my opinion that neuroscience has stagnated over the past one or two decades."
"My field is spinning wheels with new data and new publications without new findings or conclusions."
relatable and guilty
Reposted by Sushrut Thorat
shahabbakht.bsky.social
Interesting paper suggesting a mechanism for why in-context learning happens in LLMs.

They show that LLMs implicitly apply an internal low-rank weight update adjusted by the context. It's cheap (due to the low rank) but effective for adapting the model's behavior.

#MLSky

arxiv.org/abs/2507.16003
Learning without training: The implicit dynamics of in-context learning
One of the most striking features of Large Language Models (LLM) is their ability to learn in context. Namely at inference time an LLM is able to learn new patterns without any additional weight updat...
arxiv.org
sushrutthorat.bsky.social
and the low-D part has been on the horizon for a bit now - proceedings.neurips.cc/paper/2019/h... - given complex numbers you can go loooowwww haha (O(1)). Also, this is linked to top-down attention: arxiv.org/abs/1907.12309 , arxiv.org/abs/2502.15634 - which is a low-D modulation (O(N) vs O(N^2)).
Superposition of many models into one
proceedings.neurips.cc
sushrutthorat.bsky.social
In a way, I'm wondering how else, functionally, this would work. There's, of course, an equivalence b/w ICL and finetuning from the perspective of the feedforward processing of the current token. The crazy/hard bit is showing HOW exactly this "contextual modulation" manifests.
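A toy numerical check of the flavour of that equivalence (my own illustration of the gist, not the linked paper's derivation): the effect of the context on the post-attention input to a linear layer can always be absorbed into a rank-1 update of that layer's weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = rng.standard_normal((d, d))    # a (frozen) linear layer after attention

a_no_ctx = rng.standard_normal(d)  # attention output for the query alone
a_ctx = rng.standard_normal(d)     # attention output when context is present

# Rank-1 'implicit weight update' that reproduces the context's effect
# when the layer is fed only the context-free activation.
delta = np.outer(W @ (a_ctx - a_no_ctx), a_no_ctx) / (a_no_ctx @ a_no_ctx)

lhs = W @ a_ctx               # frozen weights, context included
rhs = (W + delta) @ a_no_ctx  # rank-1-updated weights, context removed
print(np.allclose(lhs, rhs))  # True
```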