Vahe Taamazyan
@vaheta.bsky.social
1.2K followers 430 following 74 posts
Helping robots see @ http://intrinsic.ai (Alphabet company). Here talking about 3D Computer Vision and everything around it. Views are my own.
vaheta.bsky.social
There’s definitely a connection
vaheta.bsky.social
Pretty sure my first words were cv2.imread. Or maybe cv::imread, now that I think about it. @opencv.bsky.social @philnelson.bsky.social
vaheta.bsky.social
5/
Check out the video in the first post of the thread to see one of the submissions in action.
The winning teams will be announced at the CVPR 2025 Perception for Industrial Robotics Automation Workshop.

Stay tuned!
vaheta.bsky.social
4/
Some stats:
- 450+ teams registered
- Dozens of teams invested hundreds of hours
- Top 5 submissions were deployed on a real robot using Intrinsic Flowstate + ROS

We tried to pick parts based on participants' predicted poses - live!
vaheta.bsky.social
3/
This is one of the toughest problems in computer vision, and it’s still far from “solved” - especially at the levels of accuracy and robustness required in real industrial settings.
vaheta.bsky.social
2/
Participants were given multi-view, multi-modal images and tasked with training models that not only detect objects, but also predict their full 3D pose.
That means 3D position + rotation - exactly what a robot needs to grasp and manipulate objects accurately.
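For context, a 6DoF pose (3D position plus rotation) is conventionally packed into a 4x4 homogeneous transform. A minimal NumPy sketch of the idea (the function name and numbers are illustrative, not from the challenge itself):

```python
import numpy as np

def make_pose(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation R
    and a 3-vector translation t (the standard 6DoF pose encoding)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Example: 90-degree rotation about the z-axis, translated 0.5 m along x.
theta = np.pi / 2
Rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
T = make_pose(Rz, np.array([0.5, 0.0, 0.0]))

# Transform a point from the object frame into the camera/world frame
# using homogeneous coordinates.
p_obj = np.array([1.0, 0.0, 0.0, 1.0])
p_world = T @ p_obj  # -> approximately [0.5, 1.0, 0.0, 1.0]
```

This is exactly the quantity pose-estimation models predict: with an accurate T, a robot can plan a grasp on the object in its own coordinate frame.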
vaheta.bsky.social
This has been one of the most exciting and demanding projects I’ve worked on.

Over the past 6 months, we teamed up with OpenCV, the BOP benchmark, and the CIRP Lab @uhmanoa.bsky.social to run a global challenge on 6DoF pose estimation for complex industrial parts. 🧵
vaheta.bsky.social
Want to know if humanoid robots have finally become practical? See if more of them roll smoothly on wheels instead of stumbling around on legs. Until then, they’re stuck in demo mode - not deployment and scaling mode.
vaheta.bsky.social
Gotta admit – I enjoy deleting code way more than writing it.
vaheta.bsky.social
Ok, someone literally called their paper Foundation X: arxiv.org/pdf/2503.09860
vaheta.bsky.social
Having your own pattern is a nice flex, so I’m happy for Bowen actually! :D
vaheta.bsky.social
I only know Bowen Wen’s Foundation Pose and Stereo papers that follow that pattern. What else is there?
vaheta.bsky.social
CV/ML paper naming strategies:

- Answering yes to an obvious question? → “Do You Need X for Y?”
- New one-shot model? → “X Anything”
- Overhyping something trivial? → “All You Need Is X”
vaheta.bsky.social
🚀 Internship Opportunity! 🚀

Our small research team is looking for two talented interns to join us! If you're passionate about cutting-edge research at the intersection of computer vision and robotics, this is your chance to contribute to an exciting, high-impact project.
vaheta.bsky.social
Prediction: There’ll be more people calling themselves software developers in 2–3 years than there are now.
vaheta.bsky.social
I’m wondering how long it’ll take until people start memorizing openings for all 960 starting options
vaheta.bsky.social
I made a bet with a friend on 1 Nvidia stock that in 5 years, I’ll be able to buy a robot that can cook pasta in my kitchen, straight from raw ingredients, with no special setup. Just in case, I’ve already bought the stock😆. What are my chances?
vaheta.bsky.social
And in a million years you’ll get the answer 42
vaheta.bsky.social
Here is the challenge website: bpc.opencv.org
vaheta.bsky.social
The Bin Picking Challenge is live! Estimate the 6DoF poses of 10 challenging industrial parts and compete for $60K in prizes. Some of them are especially hard to detect accurately. The video shows a few of the parts in rotation - are you up for the challenge?
vaheta.bsky.social
The Bin Picking Challenge is now open!
vaheta.bsky.social
Big news for computer vision enthusiasts! Intrinsic.ai is sponsoring the Bin-Picking Challenge for Perception, with $60,000 in prizes and a chance to present your work at the CVPR Perception for Industrial Robotics Automation Workshop! 🧵 1/5
vaheta.bsky.social
The only reasonable explanation I can think of is that Nvidia’s moat is larger in training than in inference. If training becomes much more efficient, demand saturation becomes a risk. But even this explanation feels somewhat weak.
vaheta.bsky.social
I see people debating the Humanity’s Last Exam dataset. I actually contributed a problem to it. For the past 2 years, I’ve used a much simpler version of that problem to evaluate new LLMs—and they’ve all failed miserably. I can’t speak for other problems, but once an LLM solves mine, I’m retiring 🫡