Sam Gershman
@gershbrain.bsky.social
10K followers 66 following 690 posts
Professor, Department of Psychology and Center for Brain Science, Harvard University https://gershmanlab.com/
gershbrain.bsky.social
My lab started doing molecular biology experiments precisely to try to answer these questions. This is actually why I started getting interested in these phenomena again (we have evidence for anti-contiguity effects in single cells). Stay tuned!
gershbrain.bsky.social
Gallistel has developed multiple models to explain the effects of timing on conditioning, although these are not really mechanistic. Here's a recent one:
brill.com/view/journal...
And here is my more mechanistic take:
gershmanlab.com/pubs/Gershma...
gershbrain.bsky.social
I agree with the general logic of this point, but then you would have to explain why the minimum processing time varies by orders of magnitude across preparations. In the flavor aversion example, it's extremely unlikely that animals need to taste the flavor for an hour.
gershbrain.bsky.social
Hm, I think you are using a different definition of contiguity from the one that appears (for example) in Rescorla's paper, which discusses the interval between CS onset and US onset. I think that is the most relevant definition for associative theory.
gershbrain.bsky.social
I don't think so, because this implies that if you hold T fixed to a small value and reduce I sufficiently, then you should get faster learning, but in fact this slows learning (i.e., the effect of I is typically monotonic). This figure is also from Rescorla's 1988 paper, using Gibbon's data.
gershbrain.bsky.social
There's this paper on the definition of learning:
link.springer.com/article/10.3...
but that doesn't really provide a taxonomy. You can find taxonomies in some specific domains, but nothing super broad as far as I know.
gershbrain.bsky.social
I think for that account to have legs, you'd have to be more specific about what you mean by attentional mechanism.
gershbrain.bsky.social
Great question. I don't know! I don't think it's reducible to some kind of simple biophysical constraint, because it seems to operate over multiple timescales. It could be a kind of causal prior, perhaps. To get at that, you'd want to see if you could manipulate it through training.
gershbrain.bsky.social
I'm teaching a class on theories of learning, and we had a great discussion about what makes associative learning truly associative, and I pointed to these data as evidence that what we call associative learning isn't actually associative. Most students had never seen the data before.
gershbrain.bsky.social
The anti-contiguity effect violates Gallistel & Gibbon's rate estimation theory, which is notable given that it's evident in Gibbon's own data (the top middle plot in Rescorla's figure). I'm not aware of any theory that explains this (but please someone correct me if I'm wrong!).
gershbrain.bsky.social
There's another, often less appreciated violation of contiguity, which was nicely documented in this figure from Rescorla (1988). For very short interstimulus intervals, the conditioned response and learning speed *increase* with the interval. This is an anti-contiguity effect!
gershbrain.bsky.social
This is a great idea except for the fact that it's wrong. You can obtain the same learning speed across dramatically different interstimulus intervals (T) as long as the intertrial interval (I) is kept in a fixed ratio with the interstimulus interval (Gallistel & Gibbon, 2000).
gershbrain.bsky.social
Everyone knows that temporal contiguity is important for associative learning. As the interval between a cue (e.g., a light) and an outcome (e.g., shock) gets longer, the conditioned response (e.g., freezing to the light) is acquired less quickly.
gershbrain.bsky.social
"War on unicorns"
or
"War by unicorns"
?
Reposted by Sam Gershman
ashleyjthomas.bsky.social
Interested in understanding how young humans think about social relationships? I am reading PhD applications this year! **Please note** that Harvard now requires the GRE. More information here: www.ashleyjthomas.com/workwithme
Reposted by Sam Gershman
merriam-webster.com
We are thrilled to announce that our NEW Large Language Model will be released on 11.18.25.
Reposted by Sam Gershman
mcxfrank.bsky.social
Ever wonder how habituation works? Here's our attempt to understand:

A stimulus-computable rational model of visual habituation in infants and adults doi.org/10.7554/eLif...

This is the thesis of two wonderful students: @anjiecao.bsky.social @galraz.bsky.social, w/ @rebeccasaxe.bsky.social
[Images: infant data from experiment 1; conceptual schema for different habituation models; title page; results from experiment 2 with adults]
gershbrain.bsky.social
I suggested that play is an example where the metric maximized by adults in the ARC games (task completion) is precisely the metric minimized by children during play: the ideal imaginative play is never completed. "Success" is failure. Learning "efficiency" in this setting is irrelevant.
gershbrain.bsky.social
In the same symposium, Junyi Chu and Laura Schulz gave wonderful talks showcasing exactly how children defy Chollet's definition and the ARC operationalization. This led to an interesting discussion of whether it's possible to devise benchmarks that capture true play.
gershbrain.bsky.social
I'm more sympathetic to a "diverse intelligences" viewpoint, where different umwelts and ecological niches require different inductive biases that lead to different patterns of skill acquisition efficiency. Each is efficient over a particular distribution of skills.
gershbrain.bsky.social
What this excludes is the possibility that you could have different intelligent agents that generalize over disjoint skill sets. You could imagine agents that are really good at generalizing over skills that humans are bad at. In fact, you don't need to look far to find examples.
gershbrain.bsky.social
Chollet responded that this was "universal" intelligence, not "general" intelligence. To summarize (if I've understood correctly): general intelligence = generalizing like humans; universal intelligence [mathematically impossible] = generalizing over all skills.
gershbrain.bsky.social
I had always assumed that "general intelligence" refers to intelligent systems that show efficiency on essentially all tasks (that's the "general" part). This is why I also always assumed that (by the NFL theorems) general intelligence is impossible.
gershbrain.bsky.social
What surprised me is that Chollet defined the reference distribution of skills as precisely those skills humans are efficient at acquiring. By this definition, general intelligence is human intelligence. I find this problematic as a definition.
gershbrain.bsky.social
I asked Chollet over what skill distribution an intelligent system is expected to be efficient. To the extent that efficiency is derived from learning (i.e., it's a form of generalization), the no free lunch theorems say that no system can be equally efficient for all skills.