Cédric Rommel
@ccrommel.bsky.social
Research Scientist at Meta | AI and neural interfaces | Interested in data augmentation, generative models, geometric DL, brain decoding, human pose, …

📍Paris, France 🔗 cedricrommel.github.io
Very inspiring talk by Fei-Fei Li yesterday at #NeurIPS2024 on visual intelligence!
December 12, 2024 at 4:10 PM
Very interesting first invited talk at the intersection of cognitive science and AI by @alisongopnik.bsky.social! 🤩
December 11, 2024 at 1:56 AM
If you’re at #NeurIPS2024 next week, come meet us at poster session 3 on Thu 12 Dec at 11 a.m.!

Or at our oral presentation during the @neur_reps workshop on Saturday 14th!

Paper: arxiv.org/abs/2312.06386
Github: github.com/cedricrommel...
December 4, 2024 at 8:00 AM
We train it with the resilient winner-takes-all loss, which allows the model to optimally quantize the space without requiring many heads.

In the end, our model works as a conditional density estimator, taking the shape of a mixture of Dirac deltas.
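To make the winner-takes-all idea concrete, here is a minimal NumPy sketch of a "resilient" WTA loss: the best hypothesis gets full weight and the others a small ε so no head dies, plus a score term pushing likelihood mass onto the winner. The exact weighting and scoring terms are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def rwta_loss(hypotheses, scores, target, eps=0.05):
    """Resilient winner-takes-all loss (illustrative sketch).

    hypotheses: (K, D) array, K candidate poses from the K heads.
    scores:     (K,) unnormalized likelihood logits, one per head.
    target:     (D,) ground-truth pose.
    eps:        small weight on non-winning heads ("resilient" part).
    """
    errors = np.sum((hypotheses - target) ** 2, axis=1)  # per-head L2 error
    winner = np.argmin(errors)                           # best hypothesis
    weights = np.full(len(errors), eps)
    weights[winner] = 1.0
    pose_loss = np.sum(weights * errors) / weights.sum()
    # Score term: log-softmax cross-entropy pulling mass onto the winner,
    # so the heads' likelihoods form the mixture weights at inference.
    log_probs = scores - np.log(np.sum(np.exp(scores)))
    score_loss = -log_probs[winner]
    return pose_loss + score_loss
```

At inference, the K heads and their (softmaxed) scores together define the mixture-of-Diracs conditional density mentioned above.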
December 4, 2024 at 8:00 AM
- Limb lengths and directions are disentangled to constrain predicted poses to an estimated manifold.
- A multi-head subnetwork is used to predict different possible rotations for each joint, together with their corresponding likelihoods.
- Both are then merged into predicted poses.
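A toy sketch of that merging step, on an assumed 4-joint chain: joints are reconstructed by walking the kinematic tree, adding each predicted limb length times its unit direction to the parent joint. The skeleton and names are illustrative, not the paper's architecture.

```python
import numpy as np

# Illustrative skeleton: parent index per joint (-1 = root).
# Real models use a full human skeleton (e.g. Human3.6M joints).
PARENTS = [-1, 0, 1, 2]  # a simple 4-joint chain

def assemble_pose(root, lengths, directions):
    """Merge disentangled limb lengths and directions into 3D joints.

    root:       (3,) root joint position.
    lengths:    (J,) limb length per non-root joint (index 0 unused).
    directions: (J, 3) limb directions (re-normalized to unit norm).
    """
    J = len(PARENTS)
    joints = np.zeros((J, 3))
    joints[0] = root
    for j in range(1, J):
        d = directions[j] / np.linalg.norm(directions[j])
        joints[j] = joints[PARENTS[j]] + lengths[j] * d  # walk the tree
    return joints
```

Because lengths are predicted once per sequence, every frame assembled this way shares the same limb lengths by construction, which is what keeps the poses on the manifold.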
December 4, 2024 at 8:00 AM
In fact, we prove the *only* way to reconcile consistency with accurate predictions is to output multiple 3D poses for each 2D input.

We hence propose ManiPose, a manifold-constrained multi-hypothesis deep network capable of better dealing with depth ambiguity.
December 4, 2024 at 8:00 AM
Previous approaches constrain poses to an estimated manifold by disentangling limb lengths and directions. But they lag behind unconstrained models in terms of joint position error (MPJPE).

In our work, we prove this is unavoidable because of points 1 and 2.
December 4, 2024 at 8:00 AM
There are 3 main reasons for this:
1. Existing training losses and evaluation metrics (MPJPE) are blind to such inconsistencies;
2. Many possible 3D poses can map to the same 2D input;
3. Pose sequences cannot occupy the whole space: they lie on a smooth manifold because of limb rigidity.
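For reference, a minimal NumPy sketch of the MPJPE metric from point 1. Since it averages per-joint Euclidean errors within each frame independently, a sequence whose limb lengths jitter from frame to frame can score just as well as a consistent one. Shapes and the function name are illustrative.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error.

    Average Euclidean distance between predicted and ground-truth
    joints, in the input's units (typically millimetres).
    pred, gt: (J, 3) arrays of joint coordinates for one frame.
    """
    return np.mean(np.linalg.norm(pred - gt, axis=-1))
```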
December 4, 2024 at 8:00 AM
While standard approaches directly map 2D coordinates to 3D, prior works noticed that predicted poses’ limbs could shrink and stretch over the course of a movement.

In our work, we prove these are not isolated cases and that these methods always predict *inconsistent* 3D pose sequences.
December 4, 2024 at 8:00 AM
Many intelligent systems, like autonomous cars and smart/VR glasses, need to understand human movements and poses.

This can be achieved with a single camera by detecting human keypoints on a video, then lifting them into a 3D pose.
December 4, 2024 at 8:00 AM
Inferring 3D human poses from video is highly ill-posed because of depth ambiguity.

Our work accepted at #NeurIPS2024, ManiPose, gets one step closer to solving this by leveraging prior knowledge about pose topology and cool multiple-choice learning techniques.
December 4, 2024 at 8:00 AM