
7th Multitel Spring workshop on video analysis – June 26, 2012

Organized in the framework of portfolio and platform.

  • 10:00 Welcome and coffee
  • 10:25 Opening Remarks and Welcome

Paper Session 1: Pattern Recognition and Tracking

  • 10:30 “Using Scene Context to Improve Template Matching“, Changxin Gao, UCL.

Template matching is widely used in pattern recognition and computer vision. However, the performance of traditional template matching approaches is often sensitive to large intra-class variance, occlusion, minor pose variations, low-resolution conditions, background clutter, etc. To address these problems, we present a biologically inspired template matching method based on scene context. There are three key contributions. First, the essential idea is based on the spatial layout consistency of natural scenes, which is inspired by two biologically plausible notions: the scene-centered viewpoint and the change blindness phenomenon. Second, we propose a framework that improves template matching by using scene context information, described by semantic representations, which enhances its robustness. Third, the method measures the similarity of a template and a sub-image by putting both into scene context, which differs fundamentally from conventional template matching.

  • 11:00 “SV3D—PID controller for Pan-Tilt-Zoom camera autotracking“, Li Sun, UCL.

Traditional video surveillance techniques require a large number of fixed cameras to maximize the area of coverage and the resolution of each observed target. Such a multi-camera system is both cumbersome and costly. Compared to systems with multiple fixed cameras, a system with a single PTZ camera can be much more efficient: it provides large coverage through its pan-tilt motion and appropriate resolution through zooming in and out. It is therefore important to design a proper control scheme for the PTZ camera so that it can actively focus on the target in the scene. In this presentation, a general architecture for using the PTZ camera in video surveillance will first be introduced in detail. This architecture allows us to access the image and the PTZ parameters from the camera, and to send the commands that motorize the camera. Details on the PID controller for active auto-tracking are then provided. The PI controller commands the PTZ camera to perform pan/tilt/zoom motion at a specified speed corresponding to the motion of the target.
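As a rough illustration of the control loop described above, the following is a minimal sketch of a textbook discrete PID controller that turns the target's horizontal offset from the image center into a pan speed command. The gains, the normalized speed limit, and the `pan_error` helper are illustrative assumptions, not values or interfaces from the talk.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (gains are illustrative assumptions)."""
    kp: float
    ki: float
    kd: float
    max_speed: float = 1.0   # hypothetical normalized speed limit
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Return a speed command for the current tracking error."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        command = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the range the camera motor is assumed to accept.
        return max(-self.max_speed, min(self.max_speed, command))


def pan_error(target_x: float, image_width: int) -> float:
    """Horizontal offset of the target from the image center, in [-1, 1]."""
    half = image_width / 2.0
    return (target_x - half) / half


# Target to the right of center: the command asks the camera to pan right.
pan = PID(kp=0.8, ki=0.1, kd=0.05)
speed = pan.step(pan_error(target_x=480, image_width=640), dt=0.04)
```

Setting `kd=0` yields a PI controller; the same structure can be repeated for the tilt and zoom axes with their own error signals.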

  • 11:30 “Matching points to track objects in dynamic and close view video“, Quentin De Neyer, UCL.

We investigate the tracking of objects in close-up pictures obtained by PTZ cameras in a feedback control framework. Tracking methods based on background modeling are not usable in this context because of the constant motion of the camera. The tracking is therefore based on the foreground, using point matching. Point matching is particularly efficient when the local deformations around a point are small between successive frames, which is the case in close-up views. In practice, the object (foreground) is represented by a set of points and their associated image-based descriptions. These points are matched individually in each new frame and a global motion of the object is inferred. Several challenges arise when implementing this kind of process, including the choice of points, the way they are matched, and the motion inference process. Moreover, the set of points must be updated strategically after each frame is processed.
In the context of the SV3D project, some results have already been obtained in the field of surveillance. The points are currently matched by maximizing the correlation of small image patches. The implemented algorithm discriminates between background and foreground based on dual motion extraction. Points are then assigned confidence measures that weight their impact in the analysis of the next frame.
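The idea of matching a point by maximizing the correlation of small image patches can be sketched as follows. This is a minimal pure-Python illustration using normalized cross-correlation over a small search window; the function names, patch radius, and search range are assumptions, and the point is assumed to lie at least `r` pixels from the image border.

```python
import math


def patch(image, cx, cy, r):
    """Flatten the (2r+1)x(2r+1) patch centered on (cx, cy)."""
    return [image[y][x]
            for y in range(cy - r, cy + r + 1)
            for x in range(cx - r, cx + r + 1)]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length patches, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:  # flat patch: correlation is undefined, treat as no match
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def match_point(prev_img, curr_img, point, r=1, search=2):
    """Find the position in curr_img whose patch best correlates with the
    patch around `point` in prev_img. Returns ((x, y), score)."""
    px, py = point
    ref = patch(prev_img, px, py, r)
    h, w = len(curr_img), len(curr_img[0])
    best, best_score = point, -1.0
    for y in range(max(r, py - search), min(h - r, py + search + 1)):
        for x in range(max(r, px - search), min(w - r, px + search + 1)):
            score = ncc(ref, patch(curr_img, x, y, r))
            if score > best_score:
                best, best_score = (x, y), score
    return best, best_score
```

The returned score could serve as one ingredient of a per-point confidence measure, although the confidence used in the project is not specified here.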

  • 12:00 “Prioritizing the Propagation of Identity Beliefs for Multi-object Tracking“, Amit Kumar K.C., UCL.

Multi-object tracking requires locating the targets as well as labeling their identities. Inferring the identities of the targets from their appearances is a challenge when the availability and reliability of the observation process vary over time and space. The purpose of this presentation is to assign identities to those appearance measurements using a graph-based formalism. Each node of the graph corresponds to a tracklet, which is defined as a sequence of positions that very likely correspond to the same physical target. Tracklets are pre-computed, and the talk investigates how to assign them identities, knowing the reference appearance of each target. Initially, each node is assigned a probability distribution over the set of possible identities, based on the observed appearance features. Belief propagation is then used to infer the identities of more ambiguous nodes from those of less ambiguous nodes, by exploiting the graph constraints and the measures of similarity between the nodes. In contrast to standard belief propagation, which treats the nodes in an arbitrary order, the proposed method uses a priority-based belief propagation in which less ambiguous nodes are scheduled to transmit their messages first. Validation is performed on a real-life basketball dataset. The proposed method achieves an 89% identification rate, which is an improvement of 21% and 16% compared to individual identity assignment and to standard belief propagation, respectively.
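To make the scheduling idea concrete, here is a simplified sketch of an entropy-ordered message-passing pass. It is not the talk's algorithm and not full belief propagation: it assumes a hypothetical edge format carrying only a similarity in [0, 1], orders the nodes once by the entropy of their initial identity distribution, and lets each node in turn reweight its neighbors' beliefs.

```python
import math


def entropy(belief):
    """Shannon entropy of an identity distribution; low means unambiguous."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)


def normalize(belief):
    """Rescale the values so that they sum to one."""
    total = sum(belief.values())
    return {k: v / total for k, v in belief.items()}


def propagate(beliefs, edges):
    """One pass in which less ambiguous nodes send their messages first.

    beliefs: {node: {identity: probability}}
    edges:   {(node_a, node_b): similarity in [0, 1]}  (hypothetical format)
    """
    beliefs = {n: dict(b) for n, b in beliefs.items()}  # work on a copy
    order = sorted(beliefs, key=lambda n: entropy(beliefs[n]))
    for sender in order:
        for (a, b), sim in edges.items():
            if sender not in (a, b):
                continue
            receiver = b if sender == a else a
            n_ids = len(beliefs[receiver])
            # Message: the sender's belief, softened toward uniform when
            # the two tracklets look dissimilar.
            updated = {
                ident: p * (sim * beliefs[sender][ident] + (1 - sim) / n_ids)
                for ident, p in beliefs[receiver].items()
            }
            beliefs[receiver] = normalize(updated)
    return beliefs


tracklets = {"t1": {"A": 0.9, "B": 0.1}, "t2": {"A": 0.5, "B": 0.5}}
result = propagate(tracklets, {("t1", "t2"): 0.8})
```

In this toy example the confident tracklet `t1` speaks first and pulls the ambiguous tracklet `t2` toward identity `A`. A faithful implementation would also need the graph constraints mentioned in the abstract, which this sketch leaves out.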

Lunch break

(A buffet lunch will be offered by Multitel)

  • 13:30 “Multi-objects Tracking using a generative model“, Alexis Bienvenu, Multitel.

We present a modified version of the POM (Probabilistic Occupancy Map) algorithm, a state-of-the-art tracking system. The algorithm is based on a generative model of the views given a configuration of the scene. Experimental results on several datasets (Pets, Apidis, Vanaheim) show the accuracy and robustness of this method for tracking people in the context of video surveillance.

Paper Session 2: Human Body Detection and Analysis

  • 14:00 “Training with corrupted labels to improve the performance of a probably correct detector: an application to people detection in videos“, Pascaline Parisot, UCL.

While the fusion of foreground silhouettes into a ground occupancy mask has become a key component of modern approaches to multi-view people detection, background subtraction approaches to people detection remain subject to errors when dealing with a single viewpoint. In addition, several works have demonstrated the benefit of exploiting classifiers to detect objects or people in images, based on the observation of local texture statistics. In our approach, we train a classifier to differentiate false and true positives among the detections computed from foreground mask analysis. To circumvent the manual annotation burden incurred by the training stage, while adapting the classifier to the specific appearance of the people or players to detect, we propose to automatically define two classes of training samples based on the foreground silhouette analysis detector. The first class of samples corresponds to the bounding boxes surrounding the detected silhouettes, while the second class is made of samples that are randomly selected in the remaining parts of the image. Hence, we face a training set whose labels might be corrupted. The classifier design choices are discussed through extensive experiments. As a main conclusion, ensembles of random sets of binary tests, also named random ferns in the literature, appear to be more robust to the corruption of labels than a boosted selection of similar binary tests.
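The automatic construction of the two sample classes can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are `(x, y, w, h)` tuples, negatives are drawn with a fixed box size, and the function names are hypothetical. Because the positives come straight from the detector, any false detection becomes a wrongly labeled positive, which is the label corruption the abstract refers to.

```python
import random


def overlaps(a, b):
    """True if two (x, y, w, h) boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def build_training_set(detections, image_size, n_negatives, box_size, seed=0):
    """Label detector boxes as positives (1) and random boxes drawn from the
    rest of the image as negatives (0). Labels may be wrong whenever the
    detector itself is wrong."""
    rng = random.Random(seed)
    width, height = image_size
    bw, bh = box_size
    positives = [(box, 1) for box in detections]
    negatives = []
    attempts = 0
    # Rejection sampling, with a cap in case the image is too crowded.
    while len(negatives) < n_negatives and attempts < 1000 * n_negatives:
        attempts += 1
        candidate = (rng.randint(0, width - bw), rng.randint(0, height - bh), bw, bh)
        if not any(overlaps(candidate, d) for d in detections):
            negatives.append((candidate, 0))
    return positives + negatives


samples = build_training_set([(10, 10, 20, 40)], (200, 100), 5, (20, 40))
```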

  • 14:30 “Alternative Search Techniques for Face Detection using Location Estimation and Binary Features“, Venkatesh Balasubburaman, Multitel.

The sliding window approach is the most widely used technique for detecting an object in an image. In the past few years, classifiers have been improved in many ways to increase the scanning speed. Apart from the classifier design (such as the cascade), the scanning speed also depends on a number of other factors, such as the grid spacing and the scales at which the image is searched. When the scanning grid spacing is larger than the tolerance of the trained classifier, the detection rate drops. In this paper we present a technique to reduce the number of missed detections when fewer subwindows are processed in the sliding window approach for face detection. This is achieved by using a small patch to predict the location of the face within a local search area. We use simple binary features and a decision tree, as this combination proved to be efficient for our application. We also show that using a simple interest point detector based on quantized gradient orientation as the front end to the proposed location estimation technique further improves performance. Experimental evaluation on several face databases shows a better detection rate and speed with our proposed approach than with the standard scanning technique when fewer subwindows are processed.
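To see why grid spacing matters for scanning speed, the sketch below counts the subwindows visited by a plain multi-scale sliding-window scan. The base window size, the scale factor, and the rule that the stride grows in proportion to the window are illustrative assumptions, not parameters from the talk.

```python
def count_subwindows(image_size, window=24, stride=1, scale_factor=1.25):
    """Count the square subwindows a sliding-window scan visits over all scales.

    `stride` is the grid spacing in pixels at the base window size; it is
    scaled up together with the window (an assumed convention).
    """
    width, height = image_size
    total = 0
    size = float(window)
    while size <= min(width, height):
        w = int(size)
        step = max(1, int(round(stride * size / window)))
        nx = (width - w) // step + 1   # window positions along x
        ny = (height - w) // step + 1  # window positions along y
        total += nx * ny
        size *= scale_factor
    return total


fine = count_subwindows((320, 240), stride=1)
coarse = count_subwindows((320, 240), stride=4)
```

A coarser grid visits far fewer subwindows, which speeds up scanning but risks stepping over a face when the spacing exceeds what the classifier tolerates; this is the gap the location estimation step is meant to close.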

  • 15:00 “Space-Time Histograms for person re-identification, comparison with lower order Histograms on news videos“, Rémi Auguste, LIFL.

We present a new color-based descriptor for videos, called the Space-Time Histogram. We apply this model to persons extracted from news videos and link all the appearances of each person throughout the video using the histogram signatures. We compare our results with those obtained with histograms and spatiograms.

  • 15:30 “3D Face reconstruction in a binocular passive stereoscopic system using face properties“, Amel Aissaoui, LIFL.

We present a novel approach for stereo face reconstruction in a passive stereo vision system. Our approach is based on the generation of a facial disparity map and requires neither expensive devices nor generic face models. It consists of incorporating face properties into the disparity estimation to enhance the 3D face reconstruction. An algorithm based on the Active Shape Model (ASM) is proposed to acquire a sparse 3D estimate of the face with high confidence. Using this sparse estimate as guidance and taking the symmetry and smoothness of the face into account, the dense disparity map is completed. Experimental results demonstrate the reconstruction accuracy of the proposed method.
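For readers unfamiliar with disparity maps, the sketch below shows the standard relation between disparity and depth for a rectified stereo pair, Z = f · B / d. It illustrates only this general relation, not the proposed method; the focal length and baseline values are made-up examples.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in meters from disparity in pixels, for a rectified stereo pair."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


def depth_map(disparity_map, focal_px, baseline_m):
    """Apply the relation to every pixel of a disparity map (list of rows)."""
    return [[disparity_to_depth(d, focal_px, baseline_m) for d in row]
            for row in disparity_map]


# Hypothetical camera: 800 px focal length, 10 cm baseline.
depths = depth_map([[40, 80], [0, 160]], focal_px=800, baseline_m=0.1)
```

Since depth is inversely proportional to disparity, small disparity errors on a nearby face translate into visible surface errors, which is why a dense and smooth disparity map matters for reconstruction.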

  • 16:00 “Symmetry Based Model for Head Pose Estimation“, Afifa Dahmane, LIFL.

Head pose estimation from digital images consists of locating a person’s head and estimating its orientation along three degrees of freedom (yaw, pitch and roll). This has been considered an important research task for decades, and many techniques have been proposed over the years to solve it. They can be categorized into two main classes: model-based approaches and appearance-based approaches. Model-based approaches are fast and simple, but they are sensitive to occlusion and usually require high-resolution images, which may not be available in many applications such as driver monitoring or video surveillance. Appearance-based approaches are affected by the identity and lighting information contained in the face appearance.
We propose an approach that selects a set of features from the symmetrical parts of the face. The approach does not need the location of interest points on the face and is robust to partial occlusion. The size of the bilaterally symmetrical area of the face is a good indicator of the yaw of the head. We train a decision tree model to recognize head pose from the areas of symmetry.

  • 16:30 Discussion & Closing