Additionally, TAPNext tracks in a purely online fashion, allowing it to run with minimal latency and removing the temporal windowing required by many existing causal state-of-the-art trackers.
TAPNext particularly excels at tracking thin objects -- notoriously hard for existing trackers. Thin objects are common in many real-world applications, e.g. in robotics and outdoor natural scenes.
TAPNext is conceptually simple and removes many of the inductive biases present in current Tracking Any Point models. Interestingly, many widely used tracking heuristics emerge naturally in TAPNext through end-to-end training.
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as next-token prediction. For more, see: tap-next.github.io
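To make the next-token-prediction framing concrete, here is a minimal sketch of online point tracking as an autoregressive loop: the model emits each point's position frame by frame, conditioning only on the past, so every prediction is available immediately with no temporal window. The `toy_step` model and all names here are illustrative assumptions, not TAPNext's actual architecture.

```python
def track_online(frames, query_point, step_model):
    """Autoregressively predict the tracked point, one frame at a time.

    frames: iterable of per-frame features (2D motion offsets in this toy demo)
    query_point: (x, y) starting position of the query point
    step_model: maps (state, frame_feature) -> (new_state, predicted_point)
    """
    state = query_point
    trajectory = [query_point]
    for feat in frames:
        state, point = step_model(state, feat)
        trajectory.append(point)  # emitted immediately: purely online, no windowing
    return trajectory

# Toy "model": each frame feature is the ground-truth motion offset, so the
# predicted next token is simply the previous point plus the offset.
def toy_step(state, offset):
    point = (state[0] + offset[0], state[1] + offset[1])
    return point, point

traj = track_online([(1, 0), (0, 2), (-1, 0)], (5, 5), toy_step)
print(traj)  # [(5, 5), (6, 5), (6, 7), (5, 7)]
```

A real model would replace `toy_step` with a learned network over video features, but the causal, one-frame-at-a-time structure of the loop is what enables low-latency tracking.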