Noah Snavely
@snavely.bsky.social
1.8K followers 240 following 110 posts
3D vision fanatic http://snavely.io
Reposted by Noah Snavely
andreasgeiger.bsky.social
#TTT3R: 3D Reconstruction as Test-Time Training
TTT3R offers a simple state update rule to enhance length generalization for #CUT3R — No fine-tuning required!
🔗Page: rover-xingyu.github.io/TTT3R
We rebuilt @taylorswift13’s "22" live at the 2013 Billboard Music Awards - in 3D!
Reposted by Noah Snavely
nate-burgdorfer.bsky.social
We present a new approach to inference-time scene optimization, which we name Radiant Triangle Soup (RTS): www.arxiv.org/abs/2505.23642. Also check out really great concurrent work from Held et al. @janheld.bsky.social, Triangle Splatting: arxiv.org/abs/2505.19175
Reposted by Noah Snavely
shiryginosar.bsky.social
🧠How “old” is your model?

Put it to the test with the KiVA Challenge: a new benchmark for abstract visual reasoning, grounded in real developmental data from children and adults.

🏆 Prizes:
🥇$1K to the top model
🥈🥉$500
📅 Deadline: 10/7/25
🔗 kiva-challenge.github.io
@iccv.bsky.social
KiVA Challenge @ ICCV 2025
kiva-challenge.github.io
snavely.bsky.social
(ChatGPT claims that this piece is Twinkle Twinkle Little Star, while Gemini says it is Do-Re-Mi.)
snavely.bsky.social
ChatGPT and Gemini both seem to struggle with sheet music. They both insist that this excerpt is in D major (2 sharps), and resist any attempt to tell them that there are 3 sharps in the key signature. I think this is really cool and interesting!
Reposted by Noah Snavely
shiryginosar.bsky.social
Think LMMs can reason like a 3-year-old?

Think again!

Our Kid-Inspired Visual Analogies benchmark reveals where young children still win: ey242.github.io/kiva.github....

Catch our #ICLR2025 poster today to see where models still fall short!

Thurs. April 24
3-5:30 pm
Halls 3 + 2B #312
Reposted by Noah Snavely
ericzzj.bsky.social
Dynamic Camera Poses and Where to Find Them

Chris Rockwell, @jtung.bsky.social, Tsung-Yi Lin, Ming-Yu Liu, David F. Fouhey, Chen-Hsuan Lin

tl;dr: a large-scale dataset of dynamic Internet videos annotated with camera poses

arxiv.org/abs/2504.17788
Reposted by Noah Snavely
redfairy2002.bsky.social
1/6 🔍➡️ How to transform standard videos into immersive 360° panoramas? We've designed a new AI system for video-to-360° panorama generation!

Our key insight: large-scale data is crucial for robust panoramic synthesis across diverse scenes.
Reposted by Noah Snavely
linyijin.bsky.social
We have released the Stereo4D dataset! Explore the real-world dynamic 3D tracks: github.com/Stereo4d/ste...
snavely.bsky.social
This is really nice work on visual discovery from @boyangdeng.bsky.social!
boyangdeng.bsky.social
Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer this. Did you know that "'juice shops' became a thing in NYC" and "miles of overpasses were painted BLUE in SF"? More at → boyangdeng.com/visual-chronicles (vid ↓ w/ 🔊)
Reposted by Noah Snavely
carldoersch.bsky.social
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos, by formulating the task as Next Token Prediction. For more, see: tap-next.github.io
Reposted by Noah Snavely
jonbarron.bsky.social
A thread of thoughts on radiance fields, from my keynote at 3DV:

Radiance fields have had 3 distinct generations. First was NeRF: just posenc and a tiny MLP. This was slow to train but worked really well, and it was unusually compressed: the NeRF was smaller than the images.
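For context, a minimal sketch (assuming NumPy) of that first-generation recipe: a sinusoidal positional encoding ("posenc") feeding a small fully connected network. The function names, layer sizes, and random weights below are illustrative placeholders, not the actual NeRF implementation.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map each coordinate to sines and cosines at exponentially spaced
    frequencies, in the style of NeRF's positional encoding ("posenc")."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # frequencies 2^k * pi
    scaled = x[..., None] * freqs                        # (..., D, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)                # (..., D * 2 * num_freqs)

def tiny_mlp(features, weights):
    """A small fully connected network: ReLU hidden layers, linear output
    (the real model predicts density and color from such features)."""
    h = features
    for W, b in weights[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = weights[-1]
    return h @ W + b

# Toy example: encode one 3D point and push it through a random 2-layer MLP.
rng = np.random.default_rng(0)
point = np.array([0.1, -0.4, 0.7])                       # (x, y, z)
feat = positional_encoding(point, num_freqs=10)          # 3 * 2 * 10 = 60 dims
weights = [
    (0.05 * rng.normal(size=(60, 256)), np.zeros(256)),
    (0.05 * rng.normal(size=(256, 4)), np.zeros(4)),     # 4 outputs: density + RGB
]
print(tiny_mlp(feat, weights))
```

Training fits the MLP weights so that volume-rendered colors match the input images, which is the slow part the thread refers to.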
Reposted by Noah Snavely
informor.bsky.social
Fifth Ave jammed #handsoff
Reposted by Noah Snavely
haian-jin.bsky.social
🚀 We’ve just released the code and checkpoints for our #ICLR2025 Oral paper: "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias".

Check it out below 👇

🔗 Code: github.com/haian-jin/LVSM
📄 Paper: arxiv.org/abs/2410.17242
🌐 Project Page: haian-jin.github.io/projects/LVSM/
snavely.bsky.social
This is really cool work!
anandbhattad.bsky.social
[1/10] Is scene understanding solved?

Models today can label pixels and detect objects with high accuracy. But does that mean they truly understand scenes?

Super excited to share our new paper and a new task in computer vision: Visual Jenga!

📄 arxiv.org/abs/2503.21770
🔗 visualjenga.github.io
Reposted by Noah Snavely
cornelltech.bsky.social
#Backslash at #CornellTech, dedicated to advancing new works of art and technology that escape convention, has announced Mimi Ọnụọha as its first Backslash Fellow: tech.cornell.edu/news/mimi-on...

“This work feels like a marked evolution for me personally,” said Ọnụọha.

@snavely.bsky.social
snavely.bsky.social
Very nice! Is this a thing that happens each night at the hotel?
snavely.bsky.social
This is really bad!
Reposted by Noah Snavely
akanazawa.bsky.social
Exciting news! MegaSAM code is out🔥 & the updated Shape of Motion results with MegaSAM are really impressive! A year ago I didn't think we could make any progress on these videos: shape-of-motion.github.io/results.html
Huge congrats to everyone involved and the community 🎉
snavely.bsky.social
Very interesting! The guy who loves singing through a megaphone comes to mind, but I think he came later.
snavely.bsky.social
The Dispossessed is an interesting choice! I didn't know it had a big influence.
snavely.bsky.social
Very interesting -- thank you!
snavely.bsky.social
I think Qianqian et al.'s work is really cool! The problem of modeling state within a 3D reasoning system is quite interesting.

(And I believe it's pronounced "cuter".)
qianqianwang.bsky.social
Late to post, but excited to introduce CUT3R!

An online 3D reasoning framework for many 3D tasks directly from just RGB. For static or dynamic scenes. Video or image collections, all in one!

Project Page: cut3r.github.io
Code and Model: github.com/CUT3R/CUT3R