Lightnews — Scholar-powered news

Stefan Baumann

@stefanabaumann.bsky.social

1.3K followers 650 following 91 posts

PhD Student at @compvis.bsky.social & @ellis.eu working on generative computer vision.

Interested in extracting world understanding from models and more controlled generation. 🌐 https://stefan-baumann.eu/

Stefan Baumann

@stefanabaumann.bsky.social

The work I linked relates to pretraining, too. Doing this for multiple rewards at once is indeed an aspect I haven't seen before; I was just curious whether I was missing something about the general method.

November 3, 2025 at 4:11 PM

Stefan Baumann

@stefanabaumann.bsky.social

Hasn't this idea been around for a while? E.g., proceedings.neurips.cc/paper_files/...

October 31, 2025 at 7:49 PM

Stefan Baumann

@stefanabaumann.bsky.social

Lovely work!
Let's make everything generative! No reason to forgo having an (at least implicit) distribution for every prediction we make if, in the long run, we can make it at least as accurate and similarly efficient as discriminative baselines.

October 29, 2025 at 6:59 PM

Stefan Baumann

@stefanabaumann.bsky.social

Classic case of xkcd 2501

October 17, 2025 at 4:58 PM

Stefan Baumann

@stefanabaumann.bsky.social

Thank you! I think that might be possible, although I'd likely consider incorporating more information in that case

October 16, 2025 at 8:12 AM

Stefan Baumann

@stefanabaumann.bsky.social

We make code and weights available.
We'll also be in Honolulu to present the paper at #ICCV2025 next week 🌺.

Take a look now!
🌐 Project Page: compvis.github.io/flow-poke-tr...
📝 Paper: arxiv.org/abs/2510.12777
💻 Code & Weights: github.com/CompVis/flow...

What If: Understanding Motion Through Sparse Interactions

FPT enables fast prediction of multimodal motion distributions in open settings

October 15, 2025 at 2:00 AM

Stefan Baumann

@stefanabaumann.bsky.social

All of this wouldn't have been possible without the support of my amazing collaborators @rmsnorm.bsky.social, @timyphan.bsky.social, and Björn Ommer at @compvis.bsky.social. A giant thank you to them! ❤️

October 15, 2025 at 1:59 AM

Stefan Baumann

@stefanabaumann.bsky.social

⚡️ FPT generalizes from open-set training. Applications:
• Articulated motion (Drag-A-Move): fine-tuned FPT outperforms specialized models for motion prediction
• Face motion: zero-shot, beats specialized baselines
• Moving part segmentation: emerges from formulation

October 15, 2025 at 1:58 AM

Stefan Baumann

@stefanabaumann.bsky.social

⚙️ Unlike other methods, we don't regress or sample one trajectory.
FPT 𝘳𝘦𝘱𝘳𝘦𝘴𝘦𝘯𝘵𝘴 𝘵𝘩𝘦 𝘧𝘶𝘭𝘭 𝘮𝘰𝘵𝘪𝘰𝘯 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯, enabling:
• interpretable uncertainty
• controllable interaction effects
• efficient prediction (>100k predictions/s)

October 15, 2025 at 1:57 AM

Stefan Baumann

@stefanabaumann.bsky.social

💡 Our idea:
Predict 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀 of motion, not just one flow field instance.

Given a few pokes, our model outputs the probability 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of how parts of the scene might move.

→ This directly captures 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘵𝘺 and interactions.

October 15, 2025 at 1:57 AM

Stefan Baumann

@stefanabaumann.bsky.social

🧠 Understanding how the world 𝘤𝘰𝘶𝘭𝘥 change is core to physical intelligence.

But most models predict 𝗼𝗻𝗲 𝗳𝘂𝘁𝘂𝗿𝗲, a single deterministic motion.

The reality is 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯 and 𝘮𝘶𝘭𝘵𝘪-𝘮𝘰𝘥𝘢𝘭: one poke can lead to many outcomes.

October 15, 2025 at 1:57 AM

Stefan Baumann

@stefanabaumann.bsky.social

Oh yeah, sorry, I should've made it clearer that I was talking about the more general case.

October 3, 2025 at 6:48 PM

Stefan Baumann

@stefanabaumann.bsky.social

Take, for example, (zero-shot) semantic correspondence, which works quite well based on activations of image diffusion models.

The model has never been trained for it, and, while it's obvious that related capabilities might be useful for denoising, I'd still consider this an emergent capability

October 3, 2025 at 6:45 PM

Stefan Baumann

@stefanabaumann.bsky.social

Not in the sense of, e.g., generating new kinds of videos when the model was trained for video generation, but capabilities w.r.t. other tasks could still be considered emergent, right?

October 3, 2025 at 6:43 PM

Stefan Baumann

@stefanabaumann.bsky.social

Fair :D

September 18, 2025 at 3:21 PM

Stefan Baumann

@stefanabaumann.bsky.social

First time I've ever heard someone from the 3D CV community actually say this out loud! This has been bugging me for a long time.

September 18, 2025 at 2:48 PM

Stefan Baumann

@stefanabaumann.bsky.social

Ah, makes sense :)

September 11, 2025 at 1:18 PM

Stefan Baumann

@stefanabaumann.bsky.social

Why are you not on a current stable version?

September 11, 2025 at 11:59 AM

Stefan Baumann

@stefanabaumann.bsky.social

The bugs I ran into reproduce across 2.7, 2.8 and current nightlies

September 11, 2025 at 11:35 AM

Stefan Baumann

@stefanabaumann.bsky.social

Welcome to the club! I've somehow managed to find two bugs with torch.compile() in the last few days 🥲

September 10, 2025 at 11:26 PM

Stefan Baumann

@stefanabaumann.bsky.social

That process really sounds like a labor of love! Penrose looks really interesting, I'll play around with it! Thanks!

August 31, 2025 at 4:47 PM
