We present our latest TLD system, designed for tracking rigid 3D objects in video. The system combines neural networks with a 3D model and achieves a promising trade-off between precision and robustness, all without camera calibration.

Stages


The system operates in three stages. First, the object is detected across multiple scales. Second, object parts are located within these detections. Third, the system performs data association and aligns a 3D model using an L1 loss.
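To make the third stage more concrete, here is a minimal sketch of L1-based model alignment. It rests on assumptions of ours, not the authors': data association has already paired row i of the 3D model keypoints with row i of the detected 2D parts, the camera is modeled as a 2x3 affine projection (which needs no intrinsic calibration), and the L1 loss is minimized with iteratively reweighted least squares (IRLS). The names `l1_regression` and `align_affine_l1` are hypothetical, and this is not the TLD3 implementation.

```python
import itertools

import numpy as np


def l1_regression(X, y, iters=100, eps=1e-8):
    """Minimize sum_i |y_i - X_i @ beta| with iteratively reweighted least squares (IRLS)."""
    w = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least-squares step: each row is scaled by sqrt(weight).
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        # Reweight by 1/|residual| so the weighted squared loss mimics an absolute loss;
        # eps caps the weight once a residual (almost) vanishes.
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)
    return beta


def align_affine_l1(model_pts, image_pts, **kwargs):
    """Fit a 2x3 affine camera A and offset b so that image_pts ~ model_pts @ A.T + b (L1 loss).

    model_pts: (N, 3) keypoints on the 3D model.
    image_pts: (N, 2) detected part locations, already associated row by row.
    Needs N >= 4 non-coplanar model points.
    """
    model_pts = np.asarray(model_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    X = np.hstack([model_pts, np.ones((len(model_pts), 1))])  # homogeneous 3D coordinates
    # The affine model is linear and the L1 loss is a plain sum over u and v residuals,
    # so each image coordinate can be fitted independently.
    P = np.vstack([l1_regression(X, image_pts[:, k], **kwargs) for k in range(2)])  # (2, 4)
    return P[:, :3], P[:, 3]


# Synthetic check: 27 model points on a 3x3x3 grid, one part badly mislocated.
model = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=3)))
A_true = np.array([[0.9, -0.2, 0.1], [0.3, 0.8, -0.4]])
b_true = np.array([5.0, -2.0])
parts = model @ A_true.T + b_true
parts[-1] += 10.0  # simulated mis-detection of a single part
A, b = align_affine_l1(model, parts)
```

The absolute loss is what keeps one mislocated part from dragging the whole fit, which an ordinary least-squares alignment would not do. Fitting u and v separately is only valid because the affine model keeps their parameters independent; a weak-perspective or full perspective camera would couple them through a shared rotation and call for an iterative nonlinear solver. Recovering a rigid pose from `A` would be a further step not shown here.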

Attributes


The system can also estimate additional attributes of the object, such as the team name. This information can be presented as text or as an additional 3D object that accurately follows the motion of the car.

CG Bridge


The live video stream can be paused at any point, and the view can instantly zoom into the CG world. There, it is possible to point out specific details of the car or to give more context about the scene.

Generality


The system can be applied to any rigid object; in this example, a toy model of a Porsche 911.

The following video shows the system in action.

Human Head
Our journey to TLD3 started with a system called HeadTLD, our first attempt to solve object detection and pose estimation simultaneously. This work was recognized by NVIDIA as part of the Inception Program Contest.

Porsche 911
Later on, we switched to a different target that is more rigid but also much faster: the Porsche 911. This revealed a number of new challenges, mostly related to mutual occlusion. It also clearly showed that the quality of the 3D model affects the quality of the tracking, so accurate modeling of the target is a priority for our future work.

Formula 1
The next logical step was to apply the system to something even faster: Formula 1. The improvements made for this use case resulted in the final version, TLD3.0.

Airplane
Under development.