denisehf.bsky.social
@denisehf.bsky.social
Reposted
Paper🧵 (cross-posted at X): When does composition of diffusion models "work"? Intuitively, the reason dog+hat works and dog+horse doesn’t has something to do with independence between the concepts being composed. The tricky part is to formalize exactly what this means. 1/
February 11, 2025 at 5:59 AM
Reposted
🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
January 28, 2025 at 6:26 AM
Reposted
Thrilled to share the latest work from our team at
@Apple
where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥

📄 arxiv.org/abs/2410.23054
🛠️ github.com/apple/ml-act

0/9 🧵
December 10, 2024 at 1:09 PM
Reposted
🤔Image-to-3D, monocular depth estimation, camera pose estimation, …, can we achieve all of this with just ONE model easily?

🚀Our answer is Yes -- Excited to introduce our latest work: World-consistent Video Diffusion (WVD) with Explicit 3D Modeling!

arxiv.org/abs/2412.01821
December 4, 2024 at 1:41 PM