Project 7: Deep Vision: Multiple Object Tracking

Leader: M. Felsberg, LiU
Participants at LU: A. Heyden, LU

Project description: Society’s need to understand big data, and visual data in particular, is constantly growing. For instance, virtually all major public areas are covered by surveillance cameras producing video streams, which need to be analyzed on demand or even online to prevent major acts of crime or terrorism. This analysis requires predominantly fully automatic detection, tracking, and recognition of objects, e.g., people, as well as prediction of their actions and behavior and detection of anomalies. Visual detection and recognition problems have recently been addressed with previously unseen performance by deep learning approaches, where convolutional networks are trained on enormous datasets, e.g., ImageNet (more than 14 million images), taking weeks even on powerful platforms. Due to the success of deep learning, many vision problems have recently been addressed using the same framework. In contrast to many other researchers who apply deep learning as a black box, we have addressed new problems, such as action recognition and object tracking, by investigating the separate layers of those networks.

In the present project, we aim at a major leap beyond current deep learning approaches. We will look into the procedural fundamentals of the learning algorithm and investigate the aspect of learning with respect to the overall task, which often requires reinforcement learning rather than supervised learning. Furthermore, we will derive methods for incremental and online learning, based on a deeper understanding of the semantic structure of the network. This structural knowledge will also allow the introduction of regularization, constraints, invariances, and existing modelling. Finally, we plan to go beyond feed-forward networks and look into recurrent network design and dynamic schemes. Building on these novel approaches to deep learning and on our previous experience with visual object tracking, we will address the problem of multiple object tracking. Visual object tracking requires online adaptation of the tracked model; thus, online deep learning is needed, and approaches to constructing compact and discriminative deep features will be investigated. Tracking performance is typically assessed by means of region overlap, i.e., a success score between zero and one, which calls for reinforcement learning. Multiple object tracking is typically an iterative process, which can be implemented by a dynamic network with recurrent procedures. Finally, depending on the cameras, prior knowledge such as geometric constraints is available and should be modelled into the solution as well.
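
To make the region-overlap score mentioned above concrete, the following is a minimal sketch of one common way to compute it: intersection over union of two axis-aligned bounding boxes. The function name region_overlap and the (x_min, y_min, x_max, y_max) box convention are assumptions made for illustration; the project text does not specify a particular definition or implementation.

```python
from typing import Tuple

# Assumed box convention (not specified in the project text):
# (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def region_overlap(a: Box, b: Box) -> float:
    """Return intersection over union of two boxes, a value in [0, 1]."""
    # Corners of the intersection rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0.0 else 0.0
```

A score of this kind is a single scalar per frame and gives no per-pixel or per-class target, which is one way to read the text's remark that it behaves more like a reward signal than a supervised label.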