Our goal is to solve the object tracking problem and enable new applications of augmented reality. While camera tracking already works well, the next big step is to achieve similar performance for object tracking. We believe that object trackers need to integrate both 3D geometry and a large memory. Our toolset is designed exactly for that and consists of the following components.

3D Modeling

Tracking with a bounding box has already reached its potential; future trackers will require a more accurate representation. We use a 3D model for that. The model can be general (a human head) or specific (a Porsche 911). Where possible, we use models from online datasets; otherwise we create them in Blender.

Rendering Engine

Our rendering engine is used to draw graphics into the video stream and to augment the training set with synthetic examples. Most importantly, it provides a bridge between the CV and CG worlds. The engine can render texture coordinates and object coordinates, which in turn serve as rich label maps for training.
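To make the idea of an object-coordinate label map concrete, here is a minimal sketch. It is not our engine: it assumes a sphere as a stand-in for a 3D model and an orthographic camera, and the function name `render_object_coordinates` is hypothetical. Each foreground pixel stores the object-space 3D point visible at that pixel, and background pixels store `None`.

```python
import math


def render_object_coordinates(width, height, radius):
    """Render an object-coordinate label map for a sphere (illustrative stand-in).

    Assumptions: orthographic camera looking along the z axis, sphere centred
    in the image, one pixel equal to one unit of object space.
    Returns a list of rows; each entry is None (background) or the (x, y, z)
    object-space point on the sphere surface seen at that pixel.
    """
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    label_map = []
    for v in range(height):
        row = []
        for u in range(width):
            x = u - cx
            y = v - cy
            d2 = radius * radius - x * x - y * y
            if d2 < 0:
                row.append(None)  # ray misses the object: background pixel
            else:
                row.append((x, y, math.sqrt(d2)))  # visible surface point
        label_map.append(row)
    return label_map


label_map = render_object_coordinates(9, 9, 3.0)
center = label_map[4][4]   # pixel at the image centre
corner = label_map[0][0]   # pixel far from the object
```

A network trained on such maps predicts a 3D surface point per pixel rather than only a box, which is what makes the labels richer than a bounding box.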

Neural Network

This is where we store our memory. Our tiny C++ neural network is stripped of anything that is not directly useful. We put a lot of emphasis on visualization, which is essential for debugging. Our models take about 3 MB per object. Training a new model is done overnight on a single Titan X.

Video Annotator

An innovative tool for annotating object trajectories with 3D pose and instance labels. It integrates our latest tracking and detection methods and allows efficient bootstrapping. We use both real and synthetic examples for training.

This technology stack was used to build our latest system, TLD3.0, which gives promising results for racing cars and human heads. But the problem is by no means solved yet. We keep innovating on a daily basis so that our goal will come true one day.