Mason Kamb
@masonkamb.bsky.social
150 followers 110 following 24 posts
Pinned
masonkamb.bsky.social
Excited to finally share this work w/ @suryaganguli.bsky.social. TL;DR: we find the first closed-form analytical theory that replicates the outputs of the very simplest diffusion models, with median pixel-wise r^2 values of 90%+. arxiv.org/abs/2412.20292
masonkamb.bsky.social
I am curious if you have ever tried compiling all of your disparate observations about the impacts of changing various hyperparameters in your models. Having followed your work for a bit, it seems like you have a wealth of knowledge about this that would be interesting to a lot of people.
Reposted by Mason Kamb
suryaganguli.bsky.social
A great @quantamagazine.bsky.social article on our theory of creativity in convolutional diffusion models, led by @masonkamb.bsky.social. See also our paper with new results in version 2: arxiv.org/abs/2412.20292, to be presented as an oral at @icmlconf.bsky.social #icml25
masonkamb.bsky.social
Also, see this explainer thread for more details:
bsky.app/profile/maso...
masonkamb.bsky.social
Excited to finally share this work w/ @suryaganguli.bsky.social. TL;DR: we find the first closed-form analytical theory that replicates the outputs of the very simplest diffusion models, with median pixel-wise r^2 values of 90%+. arxiv.org/abs/2412.20292
masonkamb.bsky.social
If you're interested, you can also:
- read our paper (now with faces!): arxiv.org/pdf/2412.202...
- use our code + weights:
github.com/Kambm/convol...
masonkamb.bsky.social
Honored to have had my recent work with @suryaganguli.bsky.social on the mechanisms behind creativity in diffusion models featured in this lovely article by Webb Wright for Quanta Magazine!
masonkamb.bsky.social
Came for the political ripostes and stayed for the diffusion models
Reposted by Mason Kamb
seanmcarroll.bsky.social
The DOGE etc. damage to US science will have enormous effects that will linger for decades. But they will be sufficiently gradual and diffuse that people who want to pretend the cause wasn't obvious will be able to do so.
alexwitze.bsky.social
We polled Nature readers to ask if they were thinking of leaving the US for jobs abroad. Three-quarters of them (who said they were US-based scientists) said yes. 🧪

www.nature.com/articles/d41...
75% of US scientists who answered Nature poll consider leaving
More than 1,600 readers answered our poll; many said they were looking for jobs in Europe and Canada.
Reposted by Mason Kamb
willoremus.com
In another blow to legacy media, I'm hearing that the Trump administration plans to remove The Atlantic from its war-plans group chat. The outlet will be replaced in the chat by the Gateway Pundit.
Reposted by Mason Kamb
ryanlcooper.com
real instructive that just by paying attention to the background hum of regular small plane crashes the media has created a perception of a sharp increase
Reposted by Mason Kamb
drscotthawley.bsky.social
Finally got to reading the fascinating & excellent paper by Kamb and Ganguli, which makes a significant contribution to diffusion/GenAI literature & will likely become one of the most-cited works in this space. Unlike many "theoretical" ML studies, theirs is high-dimensional and practical… 1/n
masonkamb.bsky.social
Excited to finally share this work w/ @suryaganguli.bsky.social. TL;DR: we find the first closed-form analytical theory that replicates the outputs of the very simplest diffusion models, with median pixel-wise r^2 values of 90%+. arxiv.org/abs/2412.20292
masonkamb.bsky.social
Wow, thank you for this very charitable review! Happy to answer any questions/discussion points if you have them.

Code should be out soonish, working to bring the repo into a fit state for public consumption (currently it's a bit spaghettified). Colab not yet in the works, but perhaps it should be…
masonkamb.bsky.social
*replicate for MNIST that is. Different datasets have different characteristics in this regard.
masonkamb.bsky.social
Interesting question. On a patch level I don't have a specific answer. Formally at the largest scales the answer is probably "all of them." On a whole-image level I've found that you can approximately replicate the generated images you get with the whole dataset with only a few hundred examples.
masonkamb.bsky.social
You're also never precisely at t=0 due to discretization, which mitigates the blowup issue as well.
masonkamb.bsky.social
The NN-generated outputs will not obey this consistency condition because they don't blow up. In practice this doesn't affect the output a whole lot. The intuition is that if you have a lot of patches in the dataset, the aforementioned consistency condition becomes very mild.
masonkamb.bsky.social
Good question. The effect of this explosion for the ELS machine ends up being that it enforces the consistency condition in theorem 4.1 (each pixel should match the center pixel of the l2-nearest patch). Intuition here is that these are the only points where the score fails to explode.
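A small numpy sketch of checking that condition on a generated image (a hypothetical helper under my own assumptions, not part of the released code), assuming a pool of p x p training patches:

```python
import numpy as np

def consistency_gap(image, patches, patch_size):
    # For each pixel, find the l2-nearest training patch to the local window
    # around it and report how far the pixel is from that patch's center pixel.
    # The consistency condition of theorem 4.1 corresponds to this gap being zero.
    p, r = patch_size, patch_size // 2
    pad = np.pad(image, r, mode="edge")
    flat = patches.reshape(len(patches), -1)
    centers = patches[:, r, r]
    H, W = image.shape
    gap = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            window = pad[i:i + p, j:j + p].reshape(-1)
            nearest = int(np.argmin(((flat - window) ** 2).sum(axis=1)))
            gap[i, j] = abs(image[i, j] - centers[nearest])
    return gap
```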
Reposted by Mason Kamb
suryaganguli.bsky.social
Our new paper! "Analytic theory of creativity in convolutional diffusion models" led expertly by @masonkamb.bsky.social
arxiv.org/abs/2412.20292
Our closed-form theory needs no training, is mechanistically interpretable & accurately predicts diffusion model outputs with high median r^2~0.9
masonkamb.bsky.social
We’re excited to push the envelope of deep learning theory to encompass minimal examples of realistic diffusion models in this paper. We hope that this work will lay a foundation for detailed investigations into more sophisticated models, including those with self-attention.
masonkamb.bsky.social
The images from the Attention-enabled model bear a strong qualitative resemblance to the ELS machine, but exhibit *just enough* nonlocal coordination to be semantically meaningful.
masonkamb.bsky.social
Our theory is tailored to models that have strong locality biases, such as CNNs. However, we find that our theory (bottom rows) is still moderately predictive for a simple diffusion model *with* self-Attention layers (top rows), which explicitly break equivariance/locality.
masonkamb.bsky.social
Diffusion models are notorious for getting the wrong numbers of fingers, legs, etc. Our theory is able to recapitulate this behavior, and provides for the first time a clear mechanistic explanation for these failures as a consequence of excessive locality.
masonkamb.bsky.social
This simple model of diffusion model creativity is remarkably predictive: we find that, after calibrating a single time-dependent hyperparameter (the locality scale), we can replicate the behavior of trained fully-convolutional diffusion models on a case-by-case basis.
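One plausible way to do that calibration (a sketch under my own assumptions, not necessarily the paper's procedure): grid-search the locality scale at each diffusion time and keep the value whose ELS output best matches the trained network by pixel-wise r^2.

```python
import numpy as np

def pixelwise_r2(pred, target):
    # Coefficient of determination between two images, computed pixel-wise.
    ss_res = ((pred - target) ** 2).sum()
    ss_tot = ((target - target.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def calibrate_locality_scale(els_outputs_by_scale, nn_output):
    # els_outputs_by_scale: {scale: ELS output image at this diffusion time}
    # nn_output: the trained network's output at the same time and noise seed.
    return max(els_outputs_by_scale,
               key=lambda s: pixelwise_r2(els_outputs_by_scale[s], nn_output))
```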
masonkamb.bsky.social
Under optimal *equivariant+local* denoising, each pixel can be drawn towards *any* training patch from *anywhere* in the training set, rather than only those drawn from the same pixel location. We call this model the Equivariant Local Score (ELS) Machine.
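A minimal numpy sketch of an ELS-style denoiser under these assumptions (a patch pool gathered from every location of every training image, Gaussian noise of scale sigma_t); the function name and details are illustrative, not the released code:

```python
import numpy as np

def els_denoise(x_t, patches, sigma_t, patch_size):
    # x_t: (H, W) noisy image; patches: (N, p, p) training patches pooled from
    # every location of every training image; sigma_t: noise scale at time t.
    # Each output pixel is a softmax-weighted average of training-patch center
    # pixels, weighted by l2 similarity to the pixel's local receptive field.
    p = patch_size
    r = p // 2
    H, W = x_t.shape
    x_pad = np.pad(x_t, r, mode="edge")
    centers = patches[:, r, r]
    flat = patches.reshape(len(patches), -1)
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            window = x_pad[i:i + p, j:j + p].reshape(-1)
            d2 = ((flat - window) ** 2).sum(axis=1)
            w = np.exp(-(d2 - d2.min()) / (2 * sigma_t ** 2))
            out[i, j] = (w * centers).sum() / w.sum()
    return out
```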
masonkamb.bsky.social
Under optimal *local* denoising, each *pixel* forms an independent Bayesian estimate for the probability of each training example, based on the information visible in the receptive field, rather than the entire image.
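An illustrative sketch of that idea, assuming Gaussian noise of scale sigma_t at time t; the helper name local_posterior and its arguments are hypothetical, not from the paper's repo:

```python
import numpy as np

def local_posterior(window, train_windows, sigma_t):
    # window: (p*p,) the noisy receptive field around one pixel
    # train_windows: (N, p*p) the same receptive field cut from each training image
    # Returns this pixel's posterior over the N training examples, using only
    # the information visible in its receptive field rather than the whole image.
    d2 = ((train_windows - window) ** 2).sum(axis=1)   # l2 distances
    logits = -d2 / (2 * sigma_t ** 2)                  # Gaussian-noise log-likelihoods
    logits -= logits.max()                             # numerical stability
    w = np.exp(logits)
    return w / w.sum()
```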