Samuel Lavoie
@lavoiems.bsky.social
PhD candidate @Mila_quebec, @UMontreal. Ex: FAIR @AIatMeta. Learning representations, minimizing free energy, running.
Reposted by Samuel Lavoie
glajoie.bsky.social
Compositionality is a central desideratum for intelligent systems...but it's a fuzzy concept and difficult to quantify. In this blog post, lab member @ericelmoznino.bsky.social outlines ideas toward formalizing it & surveys recent work. A must-read for researchers interested in AI and neuroscience
ericelmoznino.bsky.social
Very excited to release a new blog post that formalizes what it means for data to be compositional, and shows how compositionality can exist at multiple scales. Early days, but I think there may be significant implications for AI. Check it out! ericelmoznino.github.io/blog/2025/08...
Defining and quantifying compositional structure
What is compositionality? For those of us working in AI or cognitive neuroscience this question can appear easy at first, but becomes increasingly perplexing the more we think about it. We aren’t shor...
ericelmoznino.github.io
lavoiems.bsky.social
This work wouldn’t exist without my amazing co-authors:
@mnoukhov.bsky.social & @AaronCourville🙏
lavoiems.bsky.social
Example: There are no “teapots on mountains” in ImageNet.

We verify this via nearest-neighbor search in DINOv2 space.
But our model can still create them—by composing concepts it learned separately.
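The verification step above boils down to a nearest-neighbor search in an embedding space. A minimal pure-Python sketch of that idea, with tiny made-up vectors standing in for real DINOv2 features (the vectors, query, and threshold are illustrative, not from the paper):

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_neighbor(query, dataset):
    # Return (index, similarity) of the most similar dataset embedding.
    sims = [cosine_sim(query, d) for d in dataset]
    best = max(range(len(sims)), key=lambda i: sims[i])
    return best, sims[best]

# Toy embeddings standing in for DINOv2 features of dataset images.
dataset = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.0, 0.1, 1.0]  # hypothetical "teapot on a mountain" embedding

idx, sim = nearest_neighbor(query, dataset)
# A low best-similarity suggests no close match exists in the dataset.
print(idx, sim)
```

If even the nearest neighbor has low similarity, the queried concept combination plausibly does not appear in the training set.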
lavoiems.bsky.social
LLMs can speak in DLC!

We fine-tune a language model to sample DLC tokens from text, giving us a pipeline:
Text → DLC → Image
This also enables generation beyond ImageNet.
lavoiems.bsky.social
DLCs are compositional.
Swap tokens between two images (🐕 Komondor + 🍝 Carbonara) → the model produces coherent hybrids never seen during training.
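The swap itself is just splicing two discrete code sequences position-wise. A toy sketch (token values and sequence length are made up; real DLC sequences are longer and index a learned codebook):

```python
def swap_tokens(codes_a, codes_b, positions):
    # Copy the token values at the given positions from codes_b into codes_a.
    hybrid = list(codes_a)
    for p in positions:
        hybrid[p] = codes_b[p]
    return hybrid

# Toy DLC sequences for two images (values are illustrative codebook indices).
komondor = [5, 12, 3, 44, 7, 19]
carbonara = [2, 30, 3, 8, 21, 11]

# Keep the first half of the dog's tokens, take the rest from the pasta.
hybrid = swap_tokens(komondor, carbonara, positions=[3, 4, 5])
print(hybrid)
```

Decoding such a hybrid sequence is what lets the model compose concepts it only ever saw separately.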
lavoiems.bsky.social
🚀 Results:

DiT-XL/2 + DLC → FID 1.59 on unconditional ImageNet

Works well with and without classifier-free guidance

Learns faster and better than prior work using pre-trained encoders

🤯
lavoiems.bsky.social
Unconditional generation pipeline:
Sample a DLC (e.g., with SEDD)

Decode it into an image (e.g., with DiT)

This ancestral sampling approach is simple but powerful.
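The two stages above can be sketched with stand-in samplers: a toy uniform prior in place of SEDD, and a trivial decoder in place of DiT (everything here is illustrative, not the paper's code):

```python
import random

random.seed(0)

# Stand-ins for the two learned models: a discrete prior over DLC sequences
# (SEDD in the post) and a conditional decoder (DiT in the post).
VOCAB, LENGTH = 4, 6

def sample_dlc():
    # Sample c ~ p(c): here a uniform toy prior over token sequences.
    return [random.randrange(VOCAB) for _ in range(LENGTH)]

def decode(dlc):
    # Sample x ~ p(x | c): here a deterministic toy "image", one value per token.
    return [t / (VOCAB - 1) for t in dlc]

# Ancestral sampling: first the code, then the image given the code.
c = sample_dlc()
x = decode(c)
print(c, x)
```

Swapping the toy samplers for learned models gives the unconditional pipeline: no class labels or captions needed at sampling time.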
lavoiems.bsky.social
DLCs enable exactly this.
Images → sequences of discrete tokens via a Simplicial Embedding (SEM) encoder

We take the argmax over token distributions → get the DLC sequence

Think of it as “tokenizing” images—like words for LLMs.
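The argmax step is simple to picture: if the SEM encoder outputs one probability distribution per token position, the DLC sequence is just the index of the largest entry at each position. A minimal sketch (the distributions below are made up):

```python
def to_dlc(sem_output):
    # sem_output: per-position probability distributions over a token codebook
    # (each row is non-negative and sums to 1, the "simplicial" structure).
    return [max(range(len(row)), key=row.__getitem__) for row in sem_output]

# Toy SEM encoder output: 3 token positions, codebook of size 4.
sem = [
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.60, 0.20, 0.10, 0.10],
]
print(to_dlc(sem))  # [1, 2, 0]
```

Each image thus maps to a short sequence of discrete indices, directly analogous to a tokenized sentence.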
lavoiems.bsky.social
Text models don’t have this problem! LLMs can model internet-scale corpora.

So… can we improve image generation on highly multimodal distributions by decomposing it into:

1. Generating discrete tokens - p(c)
2. Decoding tokens into images - p(x|c)
lavoiems.bsky.social
Modeling highly multimodal distributions in continuous space is hard.
Even a simple 2D Gaussian mixture with a large number of modes may be tricky to model directly. Good conditioning solves this!

Could this be why large image generative models are almost always conditional? 🤔
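The conditioning argument can be illustrated on a toy 1D mixture: sampling the mode index first turns a hard multimodal modeling problem into an easy unimodal one (all numbers here are illustrative):

```python
import random

random.seed(1)

# A 1D mixture with many well-separated modes: hard to fit with a single
# unconditional continuous model, easy once conditioned on the mode index.
MODES = [10.0 * k for k in range(8)]  # mode centers
SIGMA = 0.1                           # narrow spread around each center

def sample_mode():
    # p(c): pick which mode we are in (the discrete "code").
    return random.randrange(len(MODES))

def sample_given_mode(c):
    # p(x | c): a narrow Gaussian around the chosen center -- unimodal, easy.
    return random.gauss(MODES[c], SIGMA)

c = sample_mode()
x = sample_given_mode(c)
print(c, x)
```

Each conditional p(x|c) is a simple one-mode distribution; all the multimodality lives in the discrete p(c), which is exactly the division of labor the decomposition above exploits.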
lavoiems.bsky.social
🧵 Everyone is chasing new diffusion models—but what about the representations they generate from?
We introduce Discrete Latent Codes (DLCs):
- Discrete representation for diffusion models
- Uncond. gen. SOTA FID (1.59 on ImageNet)
- Compositional generation
- Integrates with LLMs
🧱
lavoiems.bsky.social
Congrats Lucas! Looking forward to seeing what will come out of your lab in Zurich!