We present our latest TLD system, designed for tracking rigid 3D objects in video. The system combines neural networks with a 3D model and achieves a promising trade-off between precision and robustness, all without camera calibration.
The system operates in three stages. First, the object's location is detected across multiple scales. Second, object parts are located on these detections. Third, the system performs data association and aligns a 3D model using an L1 loss.
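As a rough illustration of the alignment step, the translation minimizing a summed L1 loss between shifted model points and detected part locations has a closed form: the componentwise median of the residuals, which is what makes the L1 loss robust to outlier detections. The sketch below is an assumption-laden simplification (2D, translation-only); the actual system aligns a full 3D model.

```python
import numpy as np

def l1_translation(model_pts, detected_pts):
    """Translation minimizing sum_i |model_pts[i] + t - detected_pts[i]|_1.

    For a pure translation under the L1 loss, the optimum is the
    componentwise median of the residuals, so outlier detections
    have little influence. (Illustrative sketch, not the real system.)
    """
    return np.median(detected_pts - model_pts, axis=0)

# Toy example: detections are the model shifted by (3, -2),
# with one grossly wrong detection that the median ignores.
model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
detected = model + np.array([3.0, -2.0])
detected[4] += np.array([40.0, 40.0])  # simulated outlier detection
t = l1_translation(model, detected)
# t recovers the true shift (3, -2) despite the outlier
```

An L2 loss on the same data would be pulled toward the outlier (its optimum is the mean of the residuals), which is one motivation for preferring L1 here.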
The system can also estimate additional attributes of the object, such as the name of the team. This information can be presented as text, or as an additional 3D object that accurately follows the motion of the car.
The live video stream can be paused at any point and instantly zoomed into the CG world. There, it is possible to point out specific details of the car, or give more context about the scene.
The system can be applied to any rigid object; in this case, a toy model of a Porsche 911.
The following video shows the system in action.