Andrew Gordon Wilson
@andrewgwils.bsky.social
2.6K followers 200 following 98 posts
Machine Learning Professor https://cims.nyu.edu/~andrewgw
andrewgwils.bsky.social
My full interview with MLStreetTalk has just been posted. I really enjoyed this conversation! We talk about the bitter lesson, scientific discovery, Bayesian inference, mysterious phenomena, and key principles for building intelligent systems. www.youtube.com/watch?v=M-jT...
The Real Reason Huge AI Models Actually Work
YouTube video by Machine Learning Street Talk
andrewgwils.bsky.social
I'm excited to be giving a keynote talk at the AutoML conference tomorrow, 9-10 am at Cornell Tech! I'm presenting "Prescriptions for Universal Learning". I'll talk about how we can enable automation, which I'll argue is the defining feature of ML. 2025.automl.cc/program/
andrewgwils.bsky.social
Research doesn't go in circles, but in spirals. We return to the same ideas, but in a different and augmented form.
Reposted by Andrew Gordon Wilson
andrewgwils.bsky.social
I have a confession to make. After 6 years, I stopped teaching belief propagation (but still cover graphical models). It felt like tedious bookkeeping around orderings of sums and notation. Have I strayed?
andrewgwils.bsky.social
Regardless of whether you plan to use them in applications, everyone should learn about Gaussian processes and Bayesian methods. They provide a foundation for reasoning about model construction and all sorts of deep learning behaviour that would otherwise appear mysterious.
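[Editor's note: for readers wanting a concrete entry point, here is a minimal sketch of Gaussian process regression — closed-form posterior mean and uncertainty — in plain NumPy. The toy data, RBF kernel, and hyperparameter values are illustrative assumptions, not from the post.]

```python
import numpy as np

# Minimal GP regression sketch. Data and hyperparameters are invented
# purely for illustration.

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2))."""
    sq_dists = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

# Toy 1-D training data and test locations.
X_train = np.array([-2.0, -1.0, 0.5, 2.0])
y_train = np.sin(X_train)
X_test = np.linspace(-3, 3, 100)
noise = 1e-2  # observation noise variance

# Closed-form GP posterior (standard Cholesky implementation).
K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
K_s = rbf_kernel(X_train, X_test)
K_ss = rbf_kernel(X_test, X_test)

L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
mean = K_s.T @ alpha                             # posterior mean at test points
v = np.linalg.solve(L, K_s)
cov = K_ss - v.T @ v                             # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0, None))    # pointwise uncertainty
```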
andrewgwils.bsky.social
A common takeaway from "the bitter lesson" is that we don't need to put effort into encoding inductive biases; we just need compute. Nothing could be further from the truth! Better inductive biases mean better scaling exponents, which means exponential improvements with computation.
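[Editor's note: a worked illustration of why exponents matter, in my own notation rather than the post's.]

```latex
% Suppose test loss falls as a power law in compute C, with scaling
% exponent \alpha > 0:
\[
  L(C) = a\,C^{-\alpha}
  \quad\Longrightarrow\quad
  C(\varepsilon) = \left(\tfrac{a}{\varepsilon}\right)^{1/\alpha}
  \text{ compute to reach a target loss } \varepsilon .
\]
% An inductive bias that improves the exponent from \alpha to
% \alpha' > \alpha cuts the required compute by the factor
\[
  \frac{C_{\alpha}(\varepsilon)}{C_{\alpha'}(\varepsilon)}
  = \left(\tfrac{a}{\varepsilon}\right)^{1/\alpha - 1/\alpha'} ,
\]
% which grows without bound as the target loss shrinks: the advantage
% compounds with scale rather than being a constant offset.
```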
andrewgwils.bsky.social
Gould mostly recorded baroque and early classical. He recorded only a single Chopin piece, as a one-off broadcast. But like many of his efforts, it's profoundly thought-provoking, the end product as much Gould as it is Chopin. I love the last mvt (20:55+). www.youtube.com/watch?v=NAHE...
Glenn Gould plays Chopin Piano Sonata No. 3 in B minor Op.58
YouTube video by The Piano Experience
andrewgwils.bsky.social
I don't think those things seem boring. But most research directions honestly are quite boring, because they are geared towards people-pleasing: going with the herd, seeking approval from others, and taking no risks. It's a great way to avoid making a contribution that changes any minds.
andrewgwils.bsky.social
Whatever you do, just don't be boring.
andrewgwils.bsky.social
I had a great time presenting "It's Time to Say Goodbye to Hard Constraints" at the Flatiron Institute. In this talk, I describe a philosophy for model construction in machine learning. Video now online! www.youtube.com/watch?v=LxuN...
It's Time to Say Goodbye to Hard (equivariance) Constraints - Andrew Gordon Wilson
YouTube video by LoG Meetup NYC
andrewgwils.bsky.social
Excited to be presenting my paper "Deep Learning is Not So Mysterious or Different" tomorrow at ICML, 11 am - 1:30 pm, East Exhibition Hall A-B, E-500. I made a little video overview as part of the ICML process (viewable from Chrome): recorder-v3.slideslive.com#/share?share...
andrewgwils.bsky.social
While scaling laws typically predict the final loss, we show that good scaling rules enable accurate predictions of entire loss curves of larger models from smaller ones! Shikai Qiu did an amazing job leading the paper, in collaboration with L. Xiao, J. Pennington, A. Agarwala. 3/3
andrewgwils.bsky.social
In particular, scaling collapse allows us to transfer insights from experiments conducted at a very small scale to much larger models! Much more in the paper, including supercollapse: collapse between curves less than the noise floor of per-model loss curves across seeds. 2/3
andrewgwils.bsky.social
Our new ICML paper discovers scaling collapse: through a simple affine transformation, whole training loss curves across model sizes with optimally scaled hyperparameters collapse to a single universal curve! We explain the collapse, providing a diagnostic for model scaling.
arxiv.org/abs/2507.02119
1/3
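[Editor's note: to make the idea concrete, here is a minimal NumPy sketch of collapsing loss curves. The synthetic curves and this particular affine normalization are illustrative assumptions, not the paper's exact transform; see arxiv.org/abs/2507.02119 for the actual procedure.]

```python
import numpy as np

def synthetic_loss_curve(steps, model_size):
    """Toy power-law training curve: larger models reach lower loss."""
    return 1.0 + 2.0 * (steps / model_size) ** -0.3

def collapse(curve, steps):
    """Affine-normalize one curve: rescale steps to [0, 1] and map the loss
    range via (L - L_end) / (L_start - L_end), so every curve runs 1 -> 0."""
    t = steps / steps[-1]
    normed = (curve - curve[-1]) / (curve[0] - curve[-1])
    return t, normed

collapsed = {}
for n in [1e6, 1e7, 1e8]:            # toy "model sizes"
    total = int(n / 1e4)             # toy step budget grows with model size
    steps = np.arange(1, total + 1)
    t, normed = collapse(synthetic_loss_curve(steps, n), steps)
    collapsed[n] = (t, normed)

# Plotted against normalized steps t, the transformed curves approximately
# overlay on one universal curve; systematic deviations would flag
# mis-scaled hyperparameters.
```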
andrewgwils.bsky.social
It sometimes feels like papers are written this way: <make a claim that may or may not be true but aligns with the paper's narrative> <find an arbitrary reference that supposedly supports that claim, but may be making a different point entirely>. I guess Grammarly is giving the people what they want?
andrewgwils.bsky.social
Excited about our new ICML paper, showing how algebraic structure can be exploited for massive computational gains in population genetics.
alannawzadamin.bsky.social
We can make population genetics studies more powerful by building priors of variant effect size from features like binding. But we’ve been stuck on linear models! We introduce DeepWAS to learn deep priors on millions of variants! #ICML2025 Andres Potapczynski, @andrewgwils.bsky.social 1/7
andrewgwils.bsky.social
Machine learning is perhaps the only discipline that has become less mature over time. A reverse metamorphosis, from butterfly to caterpillar.
andrewgwils.bsky.social
AI this, AI that, the implications of AI for X... can we just never talk about AI again?
andrewgwils.bsky.social
Really excited about our new paper, "Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion". We explain the mysterious success of masking diffusion, and use that explanation to propose new diffusion models that work well in a variety of settings, including proteins, images, and text!
alannawzadamin.bsky.social
There are many domain-specific noise processes for discrete diffusion, but masking dominates! Why? We show masking exploits a key property of discrete diffusion, which we use to unlock the potential of those structured processes and beat masking! w/ Nate Gruver and @andrewgwils.bsky.social 1/7
andrewgwils.bsky.social
What's irrational is the idea that the group of authors writing a paper about something foundational should also be the team that puts it into production in the real world and demonstrates its impact, all in one paper. That happens over years, and involves different interests and skills.
andrewgwils.bsky.social
I find that this position is often more emotionally rooted than rational. It makes no sense to expect a paper on foundations to demonstrate significant real-world impact. As you say, it's a cumulative process carried out by different people, over time.
andrewgwils.bsky.social
Sorry to miss it! Currently in Cambridge UK for a Newton Institute programme on uncertainty representation.