Samuel Lavoie
@lavoiems.bsky.social
PhD candidate @Mila_quebec, @UMontreal. Ex: FAIR @AIatMeta. Learning representations, minimizing free energy, running.
Reposted by Samuel Lavoie
glajoie.bsky.social
Compositionality is a central desideratum for intelligent systems...but it's a fuzzy concept and difficult to quantify. In this blog post, lab member @ericelmoznino.bsky.social outlines ideas toward formalizing it & surveys recent work. A must-read for researchers interested in AI and neuroscience
ericelmoznino.bsky.social
Very excited to release a new blog post that formalizes what it means for data to be compositional, and shows how compositionality can exist at multiple scales. Early days, but I think there may be significant implications for AI. Check it out! ericelmoznino.github.io/blog/2025/08...
Defining and quantifying compositional structure
What is compositionality? For those of us working in AI or cognitive neuroscience this question can appear easy at first, but becomes increasingly perplexing the more we think about it. We aren’t shor...
ericelmoznino.github.io
lavoiems.bsky.social
This work wouldn’t exist without my amazing co-authors:
@mnoukhov.bsky.social & @AaronCourville🙏
lavoiems.bsky.social
Example: There are no “teapots on mountains” in ImageNet.

We verify this via nearest-neighbor search in DINOv2 space.
But our model can still create them—by composing concepts it learned separately.
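The verification step above boils down to a nearest-neighbor search in an embedding space. A minimal pure-Python sketch of that idea, with tiny made-up vectors standing in for real DINOv2 features (the vectors, query, and threshold are illustrative, not from the paper):

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_neighbor(query, dataset):
    # Return (index, similarity) of the most similar dataset embedding.
    sims = [cosine_sim(query, d) for d in dataset]
    best = max(range(len(sims)), key=lambda i: sims[i])
    return best, sims[best]

# Toy embeddings standing in for DINOv2 features of dataset images.
dataset = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.0, 0.1, 1.0]  # hypothetical "teapot on a mountain" embedding

idx, sim = nearest_neighbor(query, dataset)
# A low best-similarity suggests no close match exists in the dataset.
print(idx, sim)
```

If even the nearest neighbor has low similarity, the queried concept combination plausibly does not appear in the training set.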
lavoiems.bsky.social
LLMs can speak in DLC!

We fine-tune a language model to sample DLC tokens from text, giving us a pipeline:
Text → DLC → Image
This also enables generation beyond ImageNet.
lavoiems.bsky.social
DLCs are compositional.
Swap tokens between two images (🐕 Komondor + 🍝 Carbonara) → the model produces coherent hybrids never seen during training.
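The swap itself is just splicing two discrete code sequences position-wise. A toy sketch (token values and sequence length are made up; real DLC sequences are longer and index a learned codebook):

```python
def swap_tokens(codes_a, codes_b, positions):
    # Copy the token values at the given positions from codes_b into codes_a.
    hybrid = list(codes_a)
    for p in positions:
        hybrid[p] = codes_b[p]
    return hybrid

# Toy DLC sequences for two images (values are illustrative codebook indices).
komondor = [5, 12, 3, 44, 7, 19]
carbonara = [2, 30, 3, 8, 21, 11]

# Keep the first half of the dog's tokens, take the rest from the pasta.
hybrid = swap_tokens(komondor, carbonara, positions=[3, 4, 5])
print(hybrid)
```

Decoding such a hybrid sequence is what lets the model compose concepts it only ever saw separately.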
lavoiems.bsky.social
🚀 Results:

DiT-XL/2 + DLC → FID 1.59 on unconditional ImageNet

Works well with and without classifier-free guidance

Learns faster and better than prior work using pre-trained encoders

🤯
lavoiems.bsky.social
Unconditional generation pipeline:
Sample a DLC (e.g., with SEDD)

Decode it into an image (e.g., with DiT)

This ancestral sampling approach is simple but powerful.
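The two stages above can be sketched with stand-in samplers: a toy uniform prior in place of SEDD, and a trivial decoder in place of DiT (everything here is illustrative, not the paper's code):

```python
import random

random.seed(0)

# Stand-ins for the two learned models: a discrete prior over DLC sequences
# (SEDD in the post) and a conditional decoder (DiT in the post).
VOCAB, LENGTH = 4, 6

def sample_dlc():
    # Sample c ~ p(c): here a uniform toy prior over token sequences.
    return [random.randrange(VOCAB) for _ in range(LENGTH)]

def decode(dlc):
    # Sample x ~ p(x | c): here a deterministic toy "image", one value per token.
    return [t / (VOCAB - 1) for t in dlc]

# Ancestral sampling: first the code, then the image given the code.
c = sample_dlc()
x = decode(c)
print(c, x)
```

Swapping the toy samplers for learned models gives the unconditional pipeline: no class labels or captions needed at sampling time.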
lavoiems.bsky.social
DLCs enable exactly this.
Images → sequences of discrete tokens via a Simplicial Embedding (SEM) encoder

We take the argmax over token distributions → get the DLC sequence

Think of it as “tokenizing” images—like words for LLMs.
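The argmax step is simple to picture: if the SEM encoder outputs one probability distribution per token position, the DLC sequence is just the index of the largest entry at each position. A minimal sketch (the distributions below are made up):

```python
def to_dlc(sem_output):
    # sem_output: per-position probability distributions over a token codebook
    # (each row is non-negative and sums to 1, the "simplicial" structure).
    return [max(range(len(row)), key=row.__getitem__) for row in sem_output]

# Toy SEM encoder output: 3 token positions, codebook of size 4.
sem = [
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.60, 0.20, 0.10, 0.10],
]
print(to_dlc(sem))  # [1, 2, 0]
```

Each image thus maps to a short sequence of discrete indices, directly analogous to a tokenized sentence.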
lavoiems.bsky.social
Text models don’t have this problem! LLMs can model internet-scale corpora.

So… can we improve image generation on highly multimodal distributions by decomposing it into:

1. Generating discrete tokens - p(c)
2. Decoding tokens into images - p(x|c)
lavoiems.bsky.social
Modeling highly multimodal distributions in continuous space is hard.
Even a simple 2D Gaussian mixture with a large number of modes may be tricky to model directly. Good conditioning solves this!

Could this be why large image generative models are almost always conditional? 🤔
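The conditioning argument can be illustrated on a toy 1D mixture: sampling the mode index first turns a hard multimodal modeling problem into an easy unimodal one (all numbers here are illustrative):

```python
import random

random.seed(1)

# A 1D mixture with many well-separated modes: hard to fit with a single
# unconditional continuous model, easy once conditioned on the mode index.
MODES = [10.0 * k for k in range(8)]  # mode centers
SIGMA = 0.1                           # narrow spread around each center

def sample_mode():
    # p(c): pick which mode we are in (the discrete "code").
    return random.randrange(len(MODES))

def sample_given_mode(c):
    # p(x | c): a narrow Gaussian around the chosen center -- unimodal, easy.
    return random.gauss(MODES[c], SIGMA)

c = sample_mode()
x = sample_given_mode(c)
print(c, x)
```

Each conditional p(x|c) is a simple one-mode distribution; all the multimodality lives in the discrete p(c), which is exactly the division of labor the decomposition above exploits.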
lavoiems.bsky.social
🧵 Everyone is chasing new diffusion models—but what about the representations they generate from?
We introduce Discrete Latent Codes (DLCs):
- Discrete representation for diffusion models
- Uncond. gen. SOTA FID (1.59 on ImageNet)
- Compositional generation
- Integrates with LLMs
🧱
lavoiems.bsky.social
Congrats Lucas! Looking forward to seeing what will come out of your lab in Zurich!