Phillip Isola
@phillipisola.bsky.social
5.4K followers · 89 following · 63 posts
Associate Professor in EECS at MIT. Neural nets, generative models, representation learning, computer vision, robotics, cog sci, AI. https://web.mit.edu/phillipi/
phillipisola.bsky.social
This work is with an amazing team including @sophielwang.bsky.social, @thisismyhat.bsky.social, Sharut Gupta, @shobsund.bsky.social, Chenyu Wang, and Stefanie Jegelka.

9/9
phillipisola.bsky.social
More broadly, I think confusion has been created by forming hard distinctions between different modalities, especially between text and sensory data. These distinctions can obscure commonalities. We take the rhetorical stance of erasing the distinctions, and seeing where this leads.

8/9
phillipisola.bsky.social
This work was partially inspired by Ilya Sutskever's talk here: www.youtube.com/watch?v=AKMu...

If you concatenate datasets, the model “should” figure out all the synergies and cross-modal relationships, then exploit them to make better inferences. We now have some evidence this can happen.

7/9
[Link card: “An Observation on Generalization”, YouTube video by the Simons Institute for the Theory of Computing, www.youtube.com]
phillipisola.bsky.social
Suppose you have separate datasets X, Y, Z, without known correspondences.

We do the simplest thing: just train a model (e.g., a next-token predictor) on all elements of the concatenated dataset [X,Y,Z].

You end up with a better model of dataset X than if you had trained on X alone!

6/9
[Image: architecture for the unpaired multimodal learner]
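A minimal sketch of the recipe this post describes: train a single next-token predictor on the plain concatenation of unpaired datasets. Everything concrete here (the toy token data, the tiny GRU model, the hyperparameters) is an illustrative assumption, not the paper's actual setup.

```python
# Sketch: one next-token predictor trained on the concatenation of
# unpaired datasets X, Y, Z, with no correspondence information.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, SEQ_LEN = 256, 32

def toy_dataset(n_seqs, offset):
    # Stand-in for one modality's token sequences (e.g., tokenized
    # text, image patches, or audio frames mapped into a shared vocab).
    return torch.randint(offset, offset + 64, (n_seqs, SEQ_LEN))

X = toy_dataset(500, 0)    # "text"
Y = toy_dataset(500, 64)   # "images"
Z = toy_dataset(500, 128)  # "audio"

# The whole trick: just concatenate the datasets and shuffle.
data = torch.cat([X, Y, Z], dim=0)
data = data[torch.randperm(len(data))]

class NextTokenModel(nn.Module):
    def __init__(self, vocab=VOCAB, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.rnn = nn.GRU(d, d, batch_first=True)  # tiny causal stand-in for a transformer
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

model = NextTokenModel()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    batch = data[torch.randint(len(data), (32,))]
    logits = model(batch[:, :-1])  # predict token t+1 from tokens <= t
    loss = loss_fn(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# The claimed effect would show up as lower held-out loss on X for this
# jointly trained model than for an otherwise identical X-only baseline.
```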
phillipisola.bsky.social
In “Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models,” we study a question I’ve wanted to make progress on for years: can you learn useful multimodal representations from *unpaired* data?

5/9
[Image: diagram showing paired vs. unpaired data]
phillipisola.bsky.social
In short: you can “just ask” an LLM to act (a bit) like an image model or an audio model.

This tells us that LLMs know more about the sensory world than we might suspect; you just have to find ways to elicit the knowledge.

4/9
phillipisola.bsky.social
In “Words That Make Language Models Perceive,” we find that if you ask an LLM to “imagine seeing,” its text representations become more like how a vision system would represent that same scene.

If you ask it to “imagine hearing,” its representation becomes more like that of an auditory model.

3/9
[Image: diagram showing how prompts can steer an LLM toward kernel structure that better matches that of sensory encoders]
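One plausible way to quantify the effect this post describes: compare the kernel (pairwise-similarity) structure of the LLM's text embeddings, with and without the "imagine seeing" prefix, against a vision encoder's embeddings of matching images. Linear CKA as the alignment metric and the random placeholder embeddings below are my assumptions for illustration; the paper's actual metric and models may differ.

```python
# Sketch: measure whether a prompt prefix shifts an LLM's embedding
# geometry toward that of a vision encoder, using linear CKA.
import numpy as np

def linear_cka(A, B):
    """Linear centered kernel alignment between embedding matrices of
    shape (n_items, dim); higher means more similar geometry."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = np.linalg.norm(A.T @ B, "fro") ** 2
    den = np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro")
    return num / den

# Placeholders: in practice these would be hidden states from an LLM for
# N scene descriptions (plain vs. prefixed with "Imagine seeing ...") and
# a vision encoder's embeddings of the N corresponding images.
rng = np.random.default_rng(0)
n, d_llm, d_vis = 100, 768, 512
vision_emb = rng.standard_normal((n, d_vis))
plain_emb = rng.standard_normal((n, d_llm))
# Fake "prompted" embeddings that share structure with the vision ones,
# standing in for the shift the prompt is claimed to induce.
prompted_emb = (vision_emb @ rng.standard_normal((d_vis, d_llm))
                + 0.5 * rng.standard_normal((n, d_llm)))

print("plain    vs vision CKA:", linear_cka(plain_emb, vision_emb))
print("prompted vs vision CKA:", linear_cka(prompted_emb, vision_emb))
# The prompted score coming out higher is the pattern the post describes.
```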
phillipisola.bsky.social
For context, this work stems from the idea that all data modalities (images, sounds, text, etc.) are views of the same underlying world, and that treating them as such is useful.

We are interested in identifying commonalities between different models and modalities, and providing unifications.

2/9
[Image: Platonic representation diagram]
phillipisola.bsky.social
Over the past year, my lab has been working on fleshing out theory + applications of the Platonic Representation Hypothesis.

Today I want to share two new works on this topic:

Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired learning of unified reps: arxiv.org/abs/2510.08492

1/9
phillipisola.bsky.social
Oh I think you are right about the review process at least. Sometimes it rewards the inverse of my metric: a fancy new technique that doesn't actually achieve any new result / understanding :)
phillipisola.bsky.social
I think papers like that are great! One of my personal metrics for paper quality is: delta in capability / delta in technique. A paper that only changes one parameter and achieves much better results should get a best paper award by this metric :)
phillipisola.bsky.social
Interesting reaction from ChatGPT to the HHS mRNA memo. It finds it so implausible that it thinks it's fake. From the perspective of a ~2024(?) trained model, 2025 policies are so absurd as to be unbelievable...

chatgpt.com/share/689364...
phillipisola.bsky.social
Unless it turns out that capable intelligence is actually not so simple!
phillipisola.bsky.social
Yeah, it helps me to consider that much of the history of science has been about finding a simpler-than-expected explanation of something that previously seemed magical: life (evolution), motion of the planets (law of gravitation), etc. Now those are among our most celebrated discoveries.
phillipisola.bsky.social
Of course, personally, I think we need not shy away from this possibility. Maybe intelligence is simpler than we thought, and there's a beauty in that too.
phillipisola.bsky.social
I think part of it is that people might be overestimating the complexity of intelligence, and it's hard not to.

How weird it would be if an LLM (a Markov chain!) could explain "thinking".

It feels like it makes us less special, like Copernicus placing the sun at the center, rather than the Earth.
phillipisola.bsky.social
I enjoy your posts! I hope you keep at it.
phillipisola.bsky.social
One reason is that GT (ground truth) may be finite or, yes, wrong. A regression model fit to GT can potentially generalize beyond the GT and correct errors.

I like to think of this as: the data is a bad model of the world.
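A toy numerical illustration of that point: fit a least-squares line to noisy ground-truth labels, and the fitted model's predictions land closer to the true function than the labels it was trained on. The data-generating setup here is invented for the example.

```python
# Sketch: a model fit to noisy GT labels can be more accurate about the
# underlying "world" than the GT labels themselves.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-3, 3, n)
true_y = 2.0 * x + 1.0                      # the actual "world"
gt_labels = true_y + rng.normal(0, 1.0, n)  # finite, noisy ground truth

# Least-squares linear fit to the noisy labels.
X = np.stack([x, np.ones(n)], axis=1)
w, *_ = np.linalg.lstsq(X, gt_labels, rcond=None)
pred = X @ w

print("label error vs world:", np.mean((gt_labels - true_y) ** 2))
print("model error vs world:", np.mean((pred - true_y) ** 2))
# The fit averages out label noise, so its error against the true
# function is lower: it generalizes beyond, and corrects, its own GT.
```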
phillipisola.bsky.social
To me it’s more that there exists some sci fi that predicted pretty well each of the things we are seeing, although no sci fi got them all right. But that still just seems incredible given that what happened is an infinitesimal point in the space of all possibilities…
phillipisola.bsky.social
Yeah and relatedly it’s odd to me when people dismiss future predictions as “pure science fiction” as if that means they are wrong or unlikely. The overwhelming feeling I have when reading old sci fi is how accurately it often predicted what came to pass.
phillipisola.bsky.social
I agree, maybe we need to qualify novelty wrt the audience: like novelty(x | me), novelty(x | biologists), novelty(x | 6th graders), novelty(x | world’s top expert), etc. All are useful. Some are more like “research” others are more like “teaching”, all are quite related.
phillipisola.bsky.social
Ah weird, yeah you are probably right for CS but for the natural sciences I think “novelty” often means “novel finding” rather than “novel method”, as in we discovered something new about the world. I like that definition more! Agree that most papers need not have any new algorithm to be worthwhile.
phillipisola.bsky.social
I’m more in the “pro novelty” camp but I think maybe it’s because I see novelty differently. I think, for example, that showing that known method X solves open problem Y is hugely novel. For me novelty is basically: did I learn something new and important from this work.
Reposted by Phillip Isola
cvprconference.bsky.social
#CVPR2025 provided coaching for all orals. Do you think the talks were improved compared to last year?

* Better than last year
* About the same
* Worse than last year

Share your thoughts in the thread!