
The documents distributed by this server have been provided by the contributing authors free of charge, as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis, solely for strictly personal use. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. Accessing this page for commercial use and/or indexing it in order to make it searchable on a commercial basis is strictly prohibited without prior written consent. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's and publisher's copyright. These works may not be reposted or distributed in any form without the explicit permission of the copyright holder.

2012

85 J.-M. Odobez, C. Carincotte, R. Emonet, E. Jouneau, S. Zaidenberg, B. Ravera, F. Bremond and A. Grifoni. Unsupervised Activity Analysis and Monitoring Algorithms for Effective Surveillance Systems. European Conference on Computer Vision. Firenze, Italy. October 2012.
Abstract: In this demonstration, we will show the different modules related to the automatic surveillance prototype developed in the context of the EU VANAHEIM project. Several components will be demonstrated on real data from the Torino metro. First, different unsupervised activity modeling algorithms that capture recurrent activities from long recordings will be illustrated. A contrario, they provide unusualness measures that can be used to select the most interesting streams to be displayed in control rooms. Second, different scene analysis algorithms will be demonstrated, ranging from left-luggage detection to the automatic identification of groups and their tracking. Third, a set of situational reporting methods (flow and count monitoring on escalators and at platforms, as well as human presence at lifts) provides a global view of the activity in the metro station, displayed on maps or along with the analyzed video streams. Finally, an offline activity discovery tool based on long-term recordings will be presented. All algorithms are integrated into a Video Management Solution using an innovative VideoWall module that will be demonstrated as well.
Bibentry:
@INPROCEEDINGS{Odobez:2012,
  author = {J.-M. Odobez and C. Carincotte and R. Emonet and E. Jouneau and S. Zaidenberg and B. Ravera and F. Bremond and A. Grifoni},
  title = {Unsupervised Activity Analysis and Monitoring Algorithms for Effective Surveillance Systems},
  booktitle = {European Conference on Computer Vision},
  year = {2012},
  address = {Firenze, Italy},
  month = {October 7-13},
  note = {dpt:img*grp:vs*lg:en*prj:vanaheim},
  abstract = {In this demonstration, we will show the different modules related to the automatic surveillance prototype developed in the context of the EU VANAHEIM project. Several components will be demonstrated on real data from the Torino metro. First, different unsupervised activity modeling algorithms that capture recurrent activities from long recordings will be illustrated. A contrario, they provide unusualness measures that can be used to select the most interesting streams to be displayed in control rooms. Second, different scene analysis algorithms will be demonstrated, ranging from left-luggage detection to the automatic identification of groups and their tracking. Third, a set of situational reporting methods (flow and count monitoring on escalators and at platforms, as well as human presence at lifts) provides a global view of the activity in the metro station, displayed on maps or along with the analyzed video streams. Finally, an offline activity discovery tool based on long-term recordings will be presented. All algorithms are integrated into a Video Management Solution using an innovative VideoWall module that will be demonstrated as well.},
  url = {2012_ECCV_VANAHEIM.pdf},
};
84 V. Bala Subburaman, A. Descamps and C. Carincotte. Counting people in the crowd using a generic head detector. IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, PETS. Beijing, China. September 2012.
Abstract: Crowd counting and density estimation is still one of the important tasks in video surveillance. Usually, a regression-based method is used to estimate the number of people from a sequence of images. In this paper, we investigate estimating the count of people in a crowded scene. We detect the head region, since this is the most visible part of the body in a crowded scene. The head detector is based on a state-of-the-art cascade of boosted integral features. To prune the search region, we propose a novel interest point detector based on a gradient orientation feature to locate regions similar to the top of the head in gray-level images. Two different background subtraction methods are evaluated to further reduce the search region. We evaluate our approach on the PETS 2012 and Turin metro station databases. Experiments on these databases show the good performance of our method for crowd counting.
Bibentry:
@INPROCEEDINGS{BalaSubburaman:2012,
  author = {V. Bala Subburaman and A. Descamps and C. Carincotte},
  title = {Counting people in the crowd using a generic head detector},
  booktitle = {IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, PETS},
  year = {2012},
  address = {Beijing, China},
  month = {September 18, 2012},
  note = {dpt:img*grp:vs*lg:en*prj:vanaheim},
  abstract = {Crowd counting and density estimation is still one of the important tasks in video surveillance. Usually, a regression-based method is used to estimate the number of people from a sequence of images. In this paper, we investigate estimating the count of people in a crowded scene. We detect the head region, since this is the most visible part of the body in a crowded scene. The head detector is based on a state-of-the-art cascade of boosted integral features. To prune the search region, we propose a novel interest point detector based on a gradient orientation feature to locate regions similar to the top of the head in gray-level images. Two different background subtraction methods are evaluated to further reduce the search region. We evaluate our approach on the PETS 2012 and Turin metro station databases. Experiments on these databases show the good performance of our method for crowd counting.},
  url = {2012_PETS_VANAHEIM.pdf},
};
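A minimal Python sketch of the search-region pruning idea described above, assuming OpenCV and NumPy; the gradient-orientation cue is simplified to a vertical-gradient test, and the boosted-integral-feature head classifier is a hypothetical stub (classify_head_stub), not the authors' detector:

import cv2
import numpy as np

def head_top_candidates(gray, fg_mask, grad_thresh=40.0):
    """Keep pixels that are foreground AND show a strong vertical gradient,
    a crude proxy for the 'top of head' interest-point cue of the paper."""
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    cand = (gy > grad_thresh) & (fg_mask > 0)
    ys, xs = np.nonzero(cand)
    return list(zip(xs, ys))

def classify_head_stub(gray, point):
    # Hypothetical classifier stub; always rejects. Replace with a trained cascade.
    return False

def count_heads(frames):
    bg = cv2.createBackgroundSubtractorMOG2()  # one of two BGS methods evaluated
    counts = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        fg = bg.apply(frame)
        candidates = head_top_candidates(gray, fg)
        # A real system runs the boosted cascade only at candidate locations,
        # then counts the accepted detection windows.
        detections = [p for p in candidates if classify_head_stub(gray, p)]
        counts.append(len(detections))
    return counts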

2011

83 A. Descamps, C. Carincotte and B. Gosselin. Person Detection for Indoor Videosurveillance using Spatio-Temporal Integral Features. Interactive Human Behavior Analysis in Open or Public Spaces Workshop (INTERHUB). Amsterdam, The Netherlands. November 2011.
Abstract: In this paper, we address the problem of person detection in indoor video surveillance data. We present a new method based on the state-of-the-art integral channel features. This approach is extended to allow the use of temporal features in addition to appearance-based features. The temporal features are obtained through a robust background subtraction method. Our method is then evaluated on several datasets presenting varied and challenging conditions typical of the video surveillance context. The evaluation shows that the additional temporal features are efficient and greatly improve the performance of the detector.
Bibentry:
@INPROCEEDINGS{Descamps:2011:a,
  author = {A. Descamps and C. Carincotte and B. Gosselin},
  title = {Person Detection for Indoor Videosurveillance using Spatio-Temporal Integral Features},
  booktitle = {Interactive Human Behavior Analysis in Open or Public Spaces Workshop (INTERHUB)},
  year = {2011},
  address = {Amsterdam, The Netherlands},
  month = {November 16-18},
  note = {dpt:img*grp:vs*lg:en*prj:vanaheim},
  abstract = {In this paper, we address the problem of person detection in indoor video surveillance data. We present a new method based on the state-of-the-art integral channel features. This approach is extended to allow the use of temporal features in addition to appearance-based features. The temporal features are obtained through a robust background subtraction method. Our method is then evaluated on several datasets presenting varied and challenging conditions typical of the video surveillance context. The evaluation shows that the additional temporal features are efficient and greatly improve the performance of the detector.},
  url = {2011_INTERHUB_VANAHEIM.pdf},
};
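A minimal NumPy sketch of integral channel features extended with a temporal channel, in the spirit of the paper; the channel set (gradient magnitude plus a foreground mask) and the box layout are illustrative assumptions, not the paper's exact configuration:

import numpy as np

def integral(img):
    """Integral image with a zero top row/left column for O(1) box sums."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, x, y, w, h):
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def channel_features(gray, fg_mask, boxes):
    """Sum each rectangle over an appearance channel (gradient magnitude)
    and a temporal channel (foreground mask from background subtraction)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    channels = [np.hypot(gx, gy), fg_mask.astype(np.float64)]
    iis = [integral(c) for c in channels]
    return np.array([[box_sum(ii, *b) for b in boxes] for ii in iis]).ravel()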
82 E. Jouneau and C. Carincotte. Mono versus Multi-view tracking-based model for automatic scene activity modeling and anomaly detection. IEEE Int. Conf. on Advanced Video and Signal-Based Surveillance (AVSS). Klagenfurt, Austria. September 2011.
Abstract: In this paper, we present a novel method able to automatically discover recurrent activities occurring in a video scene and to identify the temporal relations between these activities, which can be used either in a mono-view or in a multi-view context (for example, to discover the different flows of passengers inside a subway station and identify the rules that govern these flows). The proposed method is based on particle-based trajectories, analyzed through a cascade of HMM and HDP-HMM models. We evaluate our model on a scene activity recognition task using a subway dataset, with both mono-view and multi-view analysis. We finally show that our model is also able to perform on-the-fly, real-time abnormal event detection (by identifying activities or relations that do not fit the usual/learnt ones).
Bibentry:
@INPROCEEDINGS{Jouneau:2011:b,
  author = {E. Jouneau and C. Carincotte},
  title = {Mono versus Multi-view tracking-based model for automatic scene activity modeling and anomaly detection},
  booktitle = {IEEE Int. Conf. on Advanced Video and Signal-Based Surveillance (AVSS)},
  year = {2011},
  address = {Klagenfurt, Austria},
  month = {September},
  note = {dpt:img*grp:vs*lg:en*prj:vanaheim},
  abstract = {In this paper, we present a novel method able to automatically discover recurrent activities occurring in a video scene and to identify the temporal relations between these activities, which can be used either in a mono-view or in a multi-view context (for example, to discover the different flows of passengers inside a subway station and identify the rules that govern these flows). The proposed method is based on particle-based trajectories, analyzed through a cascade of HMM and HDP-HMM models. We evaluate our model on a scene activity recognition task using a subway dataset, with both mono-view and multi-view analysis. We finally show that our model is also able to perform on-the-fly, real-time abnormal event detection (by identifying activities or relations that do not fit the usual/learnt ones).},
  url = {2011_AVSS_VANAHEIM.pdf},
};
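A simplified sketch of likelihood-based anomaly flagging with a plain Gaussian HMM (using the hmmlearn package); the paper's cascade of HMM and HDP-HMM models over particle-based trajectories is substantially richer than this single-model stand-in:

import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_normal_model(train_windows, n_states=8):
    """train_windows: list of (T_i, d) arrays of trajectory features."""
    X = np.vstack(train_windows)
    lengths = [len(w) for w in train_windows]
    model = GaussianHMM(n_components=n_states, covariance_type="diag")
    model.fit(X, lengths)
    return model

def flag_anomalies(model, windows, threshold):
    """Per-window average log-likelihood; windows below threshold are unusual."""
    scores = [model.score(w) / len(w) for w in windows]
    return [s < threshold for s in scores], scores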
81 E. Jouneau and C. Carincotte. Particle-based tracking model for automatic anomaly detection. Int. Conference on Image Processing (ICIP). Brussels, Belgium. September 2011.
Abstract: In this paper, we present a new method to automatically discover recurrent activities occurring in a video scene, and to identify the temporal relations between these activities, e.g. to discover the different flows of cars at a road intersection, and to identify the traffic light sequence that governs these flows. The proposed method is based on particle-based trajectories, analyzed through a cascade of HMM and HDP-HMM models. We demonstrate the effectiveness of our model on a scene activity recognition task using a road intersection dataset. We finally show that our model is also able to perform on-the-fly abnormal event detection (by identifying activities or relations that do not fit the usual/discovered ones), with encouraging performance.

Keywords: ACTIVITY RECOGNITION, ANOMALY DETECTION, HDP-HMM, HMM, TOPIC MODELS, VIDEO SURVEILLANCE.

Bibentry:
@INPROCEEDINGS{Jouneau:2011:a,
  author = {E. Jouneau and C. Carincotte},
  title = {Particle-based tracking model for automatic anomaly detection},
  booktitle = {Int. Conference on Image Processing (ICIP)},
  year = {2011},
  address = {Brussels, Belgium},
  month = {September},
  note = {dpt:img*grp:vs*lg:en*prj:vanaheim},
  abstract = {In this paper, we present a new method to automatically discover recurrent activities occurring in a video scene, and to identify the temporal relations between these activities, e.g. to discover the different flows of cars at a road intersection, and to identify the traffic light sequence that governs these flows. The proposed method is based on particle-based trajectories, analyzed through a cascade of HMM and HDP-HMM models. We demonstrate the effectiveness of our model on a scene activity recognition task using a road intersection dataset. We finally show that our model is also able to perform on-the-fly abnormal event detection (by identifying activities or relations that do not fit the usual/discovered ones), with encouraging performance.},
  url = {2011_ICIP_VANAHEIM.pdf},
  keywords = {Video surveillance, activity recognition, anomaly detection, HMM, HDP-HMM, topic models},
};
80 C. Carincotte, R. Elmostadi and K. Hagihara. Enhanced image applications for high data rate HF channel. Military Communications and Information Systems Conference (MCC). Amsterdam, The Netherlands. October 2011.
Abstract: The goal of this paper is to present innovative image applications that will be available on future high data rate HF transmissions. In line with the considerations and studies currently carried out on future HF transmissions, for instance within NATO groups, these new image applications are under development in the EDA HDR-HF project, which aims at developing new communication technologies to offer a high data rate transmission system for data and multimedia military applications over HF channels. In this paper, we present the ongoing work on these new multimedia applications, mainly focusing on JPEG 2000 and JPIP image transmission, and on enhanced image analysis such as automatic object detection, image registration and 3D rendering.

Keywords: 3D RENDERING, AUTOMATIC OBJECT DETECTION, HF TRANSMISSION, IMAGE APPLICATIONS, IMAGE REGISTRATION, JPEG2000/JPIP TRANSMISSION.

Bibentry:
@INPROCEEDINGS{Carincotte:2011:a,
  author = {C. Carincotte and R. Elmostadi and K. Hagihara},
  title = {Enhanced image applications for high data rate HF channel},
  booktitle = {Military Communications and Information Systems Conference (MCC)},
  year = {2011},
  address = {Amsterdam, The Netherlands},
  month = {October},
  note = {dpt:img*grp:vs*lg:en*prj:hdr-hf},
  abstract = {The goal of this paper is to present innovative image applications that will be available on future high data rate HF transmissions. In line with the considerations and studies currently carried out on future HF transmissions, for instance within NATO groups, these new image applications are under development in the EDA HDR-HF project, which aims at developing new communication technologies to offer a high data rate transmission system for data and multimedia military applications over HF channels. In this paper, we present the ongoing work on these new multimedia applications, mainly focusing on JPEG 2000 and JPIP image transmission, and on enhanced image analysis such as automatic object detection, image registration and 3D rendering.},
  url = {2011_MCC_HDR-HF.pdf},
  keywords = {HF transmission, Image applications, JPEG2000/JPIP transmission, Automatic object detection, Image registration, 3D rendering},
};
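A hedged stand-in for JPIP-style window-of-interest delivery, using Pillow: the client asks for a region and a byte budget, and the server crops and re-encodes at decreasing JPEG quality until the result fits. A real JPIP server would serve JPEG 2000 codestream fragments rather than re-encoding:

import io
from PIL import Image

def serve_window(image_path, box, byte_budget):
    """box = (left, upper, right, lower) in source-image coordinates."""
    region = Image.open(image_path).crop(box)
    for quality in (90, 70, 50, 30, 10):
        buf = io.BytesIO()
        region.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= byte_budget:
            return buf.getvalue(), quality
    # Budget very tight: return the smallest attempt anyway.
    return buf.getvalue(), quality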

2010

79 I.A. Fernandez, F. Chen, F. Lavigne, X. Desurmont and C. De Vleeschouwer. Worthy visual content on mobile through interactive video streaming. IEEE Int. Conf. on Multimedia & Expo. Singapore. July 2010.
Abstract: This paper builds on an interactive streaming architecture that supports both user feedback interpretation and temporal juxtaposition of multiple video bitstreams in a single streaming session. As an original contribution, we explain how these functionalities can be exploited to offer an improved viewing experience when accessing high-resolution or multi-view video content through individual and potentially bandwidth-constrained connections. This is done by giving the client the opportunity to interactively select a preferred version among the multiple streams that are offered to render the scene. An instance of this architecture has been implemented, extending the liveMedia streaming library and using the x264 video encoder. Automatic methods have been designed and implemented to generate the multiple versions of the streamed content. In a surveillance scenario, the versions are constructed by sub-sampling the original high-resolution image, or by cropping the image sequence to focus on regions of interest in a temporally consistent way. In a soccer game context, zoomed-in versions of far-view shots are computed and offered as alternatives to the sub-sampled sequence. We demonstrate the feasibility and relevance of the approach through subjective experiments.

Keywords: AUTOMATIC ENRICHED CONTENT, H.264/AVC, INTERACTIVE SYSTEM, MOBILE STREAM, REGION OF INTEREST, VIDEO SEGMENTATION.

Bibentry:
@INPROCEEDINGS{Fernandez:2010:b,
  author = {I.A. Fernandez and F. Chen and F. Lavigne and X. Desurmont and C. De Vleeschouwer},
  title = {Worthy visual content on mobile through interactive video streaming},
  booktitle = {IEEE Int. Conf. on Multimedia & Expo},
  year = {2010},
  address = {Singapore},
  month = {July 19-23},
  note = {dpt:img*grp:mm*lg:en*prj:walcomo},
  abstract = {This paper builds on an interactive streaming architecture that supports both user feedback interpretation and temporal juxtaposition of multiple video bitstreams in a single streaming session. As an original contribution, we explain how these functionalities can be exploited to offer an improved viewing experience when accessing high-resolution or multi-view video content through individual and potentially bandwidth-constrained connections. This is done by giving the client the opportunity to interactively select a preferred version among the multiple streams that are offered to render the scene. An instance of this architecture has been implemented, extending the liveMedia streaming library and using the x264 video encoder. Automatic methods have been designed and implemented to generate the multiple versions of the streamed content. In a surveillance scenario, the versions are constructed by sub-sampling the original high-resolution image, or by cropping the image sequence to focus on regions of interest in a temporally consistent way. In a soccer game context, zoomed-in versions of far-view shots are computed and offered as alternatives to the sub-sampled sequence. We demonstrate the feasibility and relevance of the approach through subjective experiments.},
  url = {2010_ICME_WALCOMO.pdf},
  keywords = {Mobile Stream, Interactive System, Automatic Enriched Content, Video Segmentation, H.264/AVC, Region of Interest},
};
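A minimal OpenCV sketch of the automatic versioning step: from one high-resolution frame, produce a sub-sampled overview and a region-of-interest crop at the small target resolution. The ROI is assumed to come from an upstream analysis module; encoding (x264) and streaming (liveMedia) are out of scope here:

import cv2

def make_versions(frame, roi, target=(320, 180)):
    """roi = (x, y, w, h) from an upstream analysis module (assumed given)."""
    overview = cv2.resize(frame, target, interpolation=cv2.INTER_AREA)
    x, y, w, h = roi
    crop = frame[y:y + h, x:x + w]
    zoomed = cv2.resize(crop, target, interpolation=cv2.INTER_AREA)
    return {"overview": overview, "zoom": zoomed}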
78 F. Lavigne, F. Chen and X. Desurmont. Automatic video zooming for sport team video broadcasting on smart phone. Int. Conf. on Computer Vision Theory and Applications. Angers, France. May 2010.
Abstract: This paper presents a general framework to adapt the size of a sport team video extracted from TV to a small device screen. We use a soccer game context to describe the four main steps of our video processing framework: (1) A view type detector specifies if the current frame of the video has to be resized or not. (2) If the camera point of view is far, a ball detector localizes the interesting area of the scene. (3) Then, the current frame is resized and centered on the ball, taking into account some parameters, such as the ball position and its speed. (4) At the end of the process, the score banner is detected and removed by an inpainting method.

Keywords: BALL DETECTION, CLIP SEGMENTATION, REGION OF INTEREST FOCUS, SPORT TEAM VIDEO BROADCASTING, VIEW TYPE DETECTION.

Bibentry:
@INPROCEEDINGS{Lavigne:2010:a,
  author = {F. Lavigne and F. Chen and X. Desurmont},
  title = {Automatic video zooming for sport team video broadcasting on smart phone},
  booktitle = {Int. Conf. on Computer Vision Theory and Applications},
  year = {2010},
  address = {Angers, France},
  month = {May 17-21},
  note = {dpt:img*grp:mm*lg:en*prj:walcomo},
  abstract = {This paper presents a general framework to adapt the size of a sport team video extracted from TV to a small device screen. We use a soccer game context to describe the four main steps of our video processing framework: (1) A view type detector specifies if the current frame of the video has to be resized or not. (2) If the camera point of view is far, a ball detector localizes the interesting area of the scene. (3) Then, the current frame is resized and centered on the ball, taking into account some parameters, such as the ball position and its speed. (4) At the end of the process, the score banner is detected and removed by an inpainting method.},
  url = {2010_VISAPP_WALCOMO.pdf},
  keywords = {sport team video broadcasting, clip segmentation, view type detection, ball detection, region of interest focus},
};
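A minimal sketch of step (3): crop centered on the ball, with exponential smoothing of the ball position so the virtual camera does not jitter. The view-type and ball detectors of steps (1)-(2) are assumed to exist upstream, and the smoothing constant is an illustrative choice:

import numpy as np

class BallZoom:
    def __init__(self, crop_w, crop_h, alpha=0.2):
        self.crop_w, self.crop_h, self.alpha = crop_w, crop_h, alpha
        self.center = None  # smoothed virtual-camera center

    def crop(self, frame, ball_xy):
        h, w = frame.shape[:2]
        target = np.asarray(ball_xy, dtype=float)
        # Exponential smoothing of the ball position (crude speed handling).
        self.center = target if self.center is None else \
            (1 - self.alpha) * self.center + self.alpha * target
        # Clamp so the crop window stays inside the frame.
        cx = int(np.clip(self.center[0], self.crop_w // 2, w - self.crop_w // 2))
        cy = int(np.clip(self.center[1], self.crop_h // 2, h - self.crop_h // 2))
        return frame[cy - self.crop_h // 2: cy + self.crop_h // 2,
                     cx - self.crop_w // 2: cx + self.crop_w // 2]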
77 I.A. Fernandez, F. Chen, F. Lavigne, X. Desurmont and C. De Vleeschouwer. Browsing Sport Content Through an Interactive H.264 Streaming Session. Int. Conf. on Advances in Multimedia. Athens/Glyfada, Greece. June 2010.
Abstract: This paper builds on an interactive streaming architecture that supports both user feedback interpretation and temporal juxtaposition of multiple video bitstreams in a single streaming session. As an original contribution, these functionalities can be exploited to offer an improved viewing experience when accessing football content through individual and potentially bandwidth-constrained connections. Starting from conventional broadcast content, our system automatically splits the native content into non-overlapping and semantically consistent segments. Each segment is then divided into shots, based on conventional view boundary detection. Shots are finally split into small clips. These clips support our browsing capabilities during the whole playback in a temporally consistent way. Multiple versions are automatically created to render each clip. Versioning depends on the view type of the initial shot, and typically corresponds to the generation of zoomed-in and spatially or temporally subsampled video streams. Clips are encoded independently so that the server can decide on the fly which version to send as a function of the semantic relevance of the segments (on a user-transparent basis, as inferred from video analysis or metadata) and the interactive user requests. Replaying certain game actions is also offered upon request. The streaming is automatically switched to the requested event. Later, the playback is resumed without any offset. The capabilities of our system rely on the H.264/AVC standard. We use soccer videos to validate our framework in subjective experiments showing the feasibility and relevance of our system.

Keywords: BROWSING CAPABILITIES, CLIP SEGMENTATION, H.264/AVC, INTERACTIVE STREAMING, VIEW TYPE DETECTION.

Bibentry:
@INPROCEEDINGS{Fernandez:2010:a,
  author = {I.A. Fernandez and F. Chen and F. Lavigne and X. Desurmont and C. De Vleeschouwer},
  title = {Browsing Sport Content Through an Interactive H.264 Streaming Session},
  booktitle = {Int. Conf. on Advances in Multimedia},
  year = {2010},
  address = {Athens/Glyfada, Greece},
  month = {June 13-19},
  note = {dpt:img*grp:mm*lg:en*prj:walcomo},
  abstract = {This paper builds on an interactive streaming architecture that supports both user feedback interpretation and temporal juxtaposition of multiple video bitstreams in a single streaming session. As an original contribution, these functionalities can be exploited to offer an improved viewing experience when accessing football content through individual and potentially bandwidth-constrained connections. Starting from conventional broadcast content, our system automatically splits the native content into non-overlapping and semantically consistent segments. Each segment is then divided into shots, based on conventional view boundary detection. Shots are finally split into small clips. These clips support our browsing capabilities during the whole playback in a temporally consistent way. Multiple versions are automatically created to render each clip. Versioning depends on the view type of the initial shot, and typically corresponds to the generation of zoomed-in and spatially or temporally subsampled video streams. Clips are encoded independently so that the server can decide on the fly which version to send as a function of the semantic relevance of the segments (on a user-transparent basis, as inferred from video analysis or metadata) and the interactive user requests. Replaying certain game actions is also offered upon request. The streaming is automatically switched to the requested event. Later, the playback is resumed without any offset. The capabilities of our system rely on the H.264/AVC standard. We use soccer videos to validate our framework in subjective experiments showing the feasibility and relevance of our system.},
  url = {2010_MMEDIA_WALCOMO.pdf},
  keywords = {interactive streaming, browsing capabilities, clip segmentation, view type detection, H.264/AVC},
};
76 X. Desurmont, C. Carincotte and F. Bremond. Intelligent Video Systems: A Review of Performance Evaluation Metrics that use Mapping Procedures. IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance. Boston, US. August 2010.
Abstract: In Intelligent Video Systems, most of the recent advanced performance evaluation metrics include a stage that maps data between the system results and the ground truth. This paper aims to review these metrics using a proposed framework. It focuses on metrics for event detection, object detection and object tracking systems.

Keywords: MAPPING, METRIC, PERFORMANCE EVALUATION, VIDEO INTELLIGENT SYSTEM.

Bibentry:
@INPROCEEDINGS{Desurmont:2010,
  author = {X. Desurmont and C. Carincotte and F. Bremond},
  title = {Intelligent Video Systems: A Review of Performance Evaluation Metrics that use Mapping Procedures},
  booktitle = {IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance},
  year = {2010},
  address = {Boston, US},
  month = {August 29},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {In Intelligent Video Systems, most of the recent advanced performance evaluation metrics include a stage that maps data between the system results and the ground truth. This paper aims to review these metrics using a proposed framework. It focuses on metrics for event detection, object detection and object tracking systems.},
  url = {2010_IEEE-PETS.pdf},
  keywords = {Video intelligent system, performance evaluation, metric, mapping},
};
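A minimal sketch of the mapping stage common to the reviewed metrics: optimally assign system detections to ground-truth objects by IoU with the Hungarian algorithm (SciPy), then count true/false positives and misses. The IoU threshold is an illustrative assumption:

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / float(aw * ah + bw * bh - inter) if inter else 0.0

def match(detections, ground_truth, iou_min=0.5):
    cost = np.array([[1.0 - iou(d, g) for g in ground_truth] for d in detections])
    rows, cols = linear_sum_assignment(cost) if cost.size else ([], [])
    pairs = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_min]
    tp = len(pairs)
    fp = len(detections) - tp    # unmatched detections
    fn = len(ground_truth) - tp  # missed ground-truth objects
    return pairs, tp, fp, fn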
75 J.Y.L. Lawson, M. Coterot, C. Carincotte and B. Macq. Component-Based High Fidelity Interactive Prototyping of Post-WIMP Interactions. ACM Int. Conference on Multimodal Interfaces. Beijing, China. November 2010.
Abstract: In order to support interactive high-fidelity prototyping of post-WIMP user interactions, we propose a multi-fidelity design method based on a unifying component-based model and supported by an advanced tool suite, the OpenInterface Platform Workbench. Our approach strives to support a collaborative (programmer-designer) and user-centered design activity. The workbench architecture allows the exploration of novel interaction techniques through seamless integration and adaptation of heterogeneous components, high-fidelity rapid prototyping, runtime evaluation and fine-tuning of designed systems. This paper illustrates, through the iterative construction of a running example, how OpenInterface allows leveraging existing resources and fosters the creation of non-conventional interaction techniques.

Keywords: COMPONENT-BASED ARCHITECTURE, INTERACTIVE DESIGN, MULTIMODAL INTERFACES, MULTIMODAL SOFTWARE ARCHITECTURE, OPENINTERFACE, POST-WIMP, PROTOTYPING, REUSABLE SOFTWARE COMPONENT, SKEMMI.

Bibentry:
@INPROCEEDINGS{Lawson:2010,
  author = {J.Y.L. Lawson and M. Coterot and C. Carincotte and B. Macq},
  title = {Component-Based High Fidelity Interactive Prototyping of Post-WIMP Interactions},
  booktitle = {ACM Int. Conference on Multimodal Interfaces},
  year = {2010},
  address = {Beijing, China},
  month = {November 8-12},
  note = {dpt:img*grp:mm*lg:en*prj:numediart},
  abstract = {In order to support interactive high-fidelity prototyping of post-WIMP user interactions, we propose a multi-fidelity design method based on a unifying component-based model and supported by an advanced tool suite, the OpenInterface Platform Workbench. Our approach strives to support a collaborative (programmer-designer) and user-centered design activity. The workbench architecture allows the exploration of novel interaction techniques through seamless integration and adaptation of heterogeneous components, high-fidelity rapid prototyping, runtime evaluation and fine-tuning of designed systems. This paper illustrates, through the iterative construction of a running example, how OpenInterface allows leveraging existing resources and fosters the creation of non-conventional interaction techniques.},
  url = {2010_ICMI_NUMEDIART.pdf},
  keywords = {Prototyping, component-based architecture, reusable software component, multimodal interfaces, multimodal software architecture, OpenInterface, SKEMMI, post-WIMP, interactive design},
};

2009

74 X. Desurmont, F. Lavigne, J. Meessen and B. Macq. Learning the Fusion of Multiple Video Analysis Detectors. Multimedia Content Access: Algorithms and Systems III, part of the IS&T SPIE Symposium on Electronic Imaging. San Jose, CA, USA. January 2009.
Abstract: This paper presents a new fusion scheme for enhancing result quality based on the combination of multiple different detectors. We present a study showing the fusion of multiple video analysis detectors, such as "detecting unattended luggage" in video sequences. One of the problems is the time jitter between different detectors, i.e. typically one system can trigger an event several seconds before another one. Another issue is the computation of an adequate fusion of the realigned events. We propose a fusion system that overcomes these problems by being able (i) to match off-line, in the learning stage, the ground-truth events with the detector events using a dynamic programming scheme, (ii) to learn the relation between ground truth and results, and (iii) to fuse, in real time, the events from different detectors thanks to the learning stage, in order to maximize the global quality of the result. We show promising results by combining the outputs of different video analysis detector technologies.

Keywords: BAYES, DETECTORS, FUSION, LEARNING, PERFORMANCE EVALUATION, ROC.

Bibentry:
@INPROCEEDINGS{Desurmont:2009:a,
  author = {X. Desurmont and F. Lavigne and J. Meessen and B. Macq},
  title = {Learning the Fusion of Multiple Video Analysis Detectors},
  booktitle = {Multimedia Content Access: Algorithms and Systems III, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2009},
  address = {San Jose, CA, USA},
  month = {January 18-22},
  note = {dpt:img*grp:vs*lg:en*prj:clovis|cantata},
  abstract = {This paper presents a new fusion scheme for enhancing result quality based on the combination of multiple different detectors. We present a study showing the fusion of multiple video analysis detectors, such as "detecting unattended luggage" in video sequences. One of the problems is the time jitter between different detectors, i.e. typically one system can trigger an event several seconds before another one. Another issue is the computation of an adequate fusion of the realigned events. We propose a fusion system that overcomes these problems by being able (i) to match off-line, in the learning stage, the ground-truth events with the detector events using a dynamic programming scheme, (ii) to learn the relation between ground truth and results, and (iii) to fuse, in real time, the events from different detectors thanks to the learning stage, in order to maximize the global quality of the result. We show promising results by combining the outputs of different video analysis detector technologies.},
  url = {2009_SPIE-EI_CLOVIS-CANTATA.pdf},
  keywords = {Fusion, detectors, learning, Roc, Bayes, performance evaluation},
};
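A simplified sketch of the fusion idea: (i) align each detector's events to ground truth within a jitter tolerance, (ii) estimate per-detector precision from the alignment, and (iii) fuse at run time by precision-weighted voting. The paper uses a dynamic programming alignment and a fuller learning stage; this greedy version only illustrates the principle:

def align(events, truth, tol=5.0):
    """Greedy one-to-one matching of event times within +/- tol seconds."""
    used, matches = set(), []
    for t in sorted(truth):
        best = min((e for e in events if e not in used and abs(e - t) <= tol),
                   key=lambda e: abs(e - t), default=None)
        if best is not None:
            used.add(best)
            matches.append((best, t))
    return matches

def learn_precision(detector_events, truth, tol=5.0):
    """Fraction of a detector's events that align with ground truth."""
    m = align(detector_events, truth, tol)
    return len(m) / len(detector_events) if detector_events else 0.0

def fuse(event_votes, precisions, threshold=0.5):
    """event_votes: {detector_name: fired?}; accept if the weighted vote is high."""
    score = sum(precisions[d] for d, fired in event_votes.items() if fired)
    total = sum(precisions.values()) or 1.0
    return score / total >= threshold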
73 C. Simon, J. Meessen and C. De Vleeschouwer. Insertion de proximal SVM dans des arbres aléatoires, mode d'emploi. XIème conférence francophone sur l'apprentissage artificiel. Hammamet, Tunisia. May 2009.
Abstract: By embedding several SVM classifiers in a binary tree architecture, an arbitrary multi-class problem can be turned into a hierarchy of binary classifications. One of the key problems is to determine, at each node of the tree, how to group the multiple classes into a pair of overlay classes to discriminate. As its main contribution, this article proposes using an ensemble of randomized trees instead of a single optimized decision tree, so as to reduce the impact of the choice of the overlay class pair on classifier performance. Empirical results obtained on several UCI datasets show improved classification performance compared with SVM solutions and with conventional ensembles of randomized trees.

Keywords: MULTI-CLASS CLASSIFICATION, DECISION TREE ENSEMBLES, PROXIMAL SVM, SVM.

Bibentry:
@INPROCEEDINGS{Simon:2009:b,
  author = {C. Simon and J. Meessen and C. De Vleeschouwer},
  title = {Insertion de proximal SVM dans des arbres aléatoires, mode d'emploi},
  booktitle = {XIème conférence francophone sur l'apprentissage artificiel},
  year = {2009},
  address = {Hammamet, Tunisie},
  month = {May 25-29},
  note = {dpt:img*grp:mm*lg:fr*prj:arcade},
  abstract = {En insérant plusieurs classifieurs SVM dans une architecture d'arbre binaire, il est possible de changer un problème multi-classes arbitraire en une hiérarchie de classifications binaires. Un des problèmes essentiels consiste à déterminer à chaque noeud de l'arbre la façon de regrouper les classes multiples en une paire de classes superposées à discriminer. Comme contribution principale, cet article propose d'utiliser un ensemble d'arbres aléatoires au lieu d'un seul arbre de décision optimisé, et ce de façon à réduire l'impact du choix de la paire de classes superposées sur les performances du classifieur. Les résultats empiriques obtenus sur différents ensembles de données UCI démontrent une amélioration des performances de classification, en comparaison aux solutions SVM et aux ensembles d'arbres aléatoires conventionnels.},
  url = {2009_CAP_ARCADE.pdf},
  keywords = {SVM, proximal SVM, ensemble d'arbres de décision, classification multi-classes},
};
72 C. Machy, X. Desurmont, C. Mancas-Thillou, C. Carincotte and V. Delcourt. Machine vision for automated inspection of railway traffic recordings. Image Processing: Machine Vision Applications II, part of the IS&T SPIE Symposium on Electronic Imaging. San Jose, CA, USA. January 2009.
Abstract: For the 9000 train accidents reported each year in the European Union [1], the Recording Strip (RS) and Filling-Card (FC) related to the train activities represent the only usable evidence for SNCF (the French railway operator) and most national authorities. More precisely, the RS contains information about the train journey, speed and related Driving Events (DE) such as emergency brakes, while the FC gives details on the departure/arrival stations. In this context, complete checking of 100% of the RS was recently voted by the French law enforcement authorities (instead of the 5% currently performed), which raised the question of an automated and efficient inspection of this huge amount of recordings. To this end, we propose a machine vision prototype consisting of cassettes that receive the RS and FC to be digitized. A video analysis module first determines the type of RS among eight possible types; time/speed curves are then extracted to estimate the covered distance, speed and stops, while the associated DE are finally detected using a convolution process. A detailed evaluation on 15 RS (8000 kilometers and 7000 DE) shows very good results (100% correct detection of the strip type, and only 0.28% missed detections for the DE). An exhaustive evaluation on a panel of about 100 RS constitutes the perspective of this work.

Keywords: CROSS-CORRELATION, CURVE TRACKING, MACHINE VISION, PATTERN MATCHING, PERFORMANCE EVALUATION, RAILWAY TRAFFIC, RECORDING STRIP, SEGMENTATION.

Bibentry:
@INPROCEEDINGS{Machy:2009:a,
  author = {C. Machy and X. Desurmont and C. Mancas-Thillou and C. Carincotte and V. Delcourt},
  title = {Machine vision for automated inspection of railway traffic recordings},
  booktitle = {Image Processing: Machine Vision Applications II , part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2009},
  address = {San Jose, CA, USA},
  month = {January 18-22},
  note = {dpt:img*grp:mv*lg:en*prj:sncf-bg},
  abstract = {For the 9000 train accidents reported each year in the European Union [1], the Recording Strip (RS) and Filling-Card (FC) related to the train activities represent the only usable evidence for SNCF (the French railway operator) and most national authorities. More precisely, the RS contains information about the train journey, speed and related Driving Events (DE) such as emergency brakes, while the FC gives details on the departure/arrival stations. In this context, complete checking of 100% of the RS was recently voted by the French law enforcement authorities (instead of the 5% currently performed), which raised the question of an automated and efficient inspection of this huge amount of recordings. To this end, we propose a machine vision prototype consisting of cassettes that receive the RS and FC to be digitized. A video analysis module first determines the type of RS among eight possible types; time/speed curves are then extracted to estimate the covered distance, speed and stops, while the associated DE are finally detected using a convolution process. A detailed evaluation on 15 RS (8000 kilometers and 7000 DE) shows very good results (100% correct detection of the strip type, and only 0.28% missed detections for the DE). An exhaustive evaluation on a panel of about 100 RS constitutes the perspective of this work.},
  url = {2009_SPIE-EI_SNCF-BG.pdf},
  keywords = {Machine vision, railway traffic, recording strip, performance evaluation, segmentation, pattern matching, cross-correlation, curve tracking},
};
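A minimal sketch of the convolution-based driving-event detection: slide a template of the event's speed signature (e.g. the sharp deceleration of an emergency brake) over the extracted speed curve and keep strong normalized-correlation peaks. Template and threshold are illustrative assumptions:

import numpy as np

def detect_events(speed, template, thresh=0.8):
    """Normalized cross-correlation of a 1-D speed curve with a template."""
    speed = np.asarray(speed, dtype=float)
    t = (template - template.mean()) / (template.std() + 1e-9)
    n = len(t)
    scores = []
    for i in range(len(speed) - n + 1):
        win = speed[i:i + n]
        w = (win - win.mean()) / (win.std() + 1e-9)
        scores.append(float(np.dot(w, t)) / n)  # correlation in [-1, 1]
    scores = np.array(scores)
    return np.nonzero(scores >= thresh)[0], scores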
71 J.G. Verly, J. Meessen and B. Michel. 3D MEDIA: A Unique Audio and Video R&D Project Spanning Wallonia. ERCIM News. July 2009.
Bibentry:
@MISC{ERCIM:2009,
  author = {J.G. Verly and J. Meessen and B. Michel},
  title = {3D MEDIA: A Unique Audio and Video R&D Project Spanning Wallonia},
  year = {2009},
  month = {July},
  note = {dpt:img*grp:mm*lg:en*prj:3dmedia},
  url = {2009_ERCIM_3D-MEDIA.pdf},
  howpublished = {ERCIM News},
};
70 X. Desurmont. Objective Performance Evaluation and Optimal Allocation Framework for Video Analysis Methods. PhD Thesis, Université catholique de Louvain. Louvain-la-Neuve, Belgium. June 2009.
Abstract: This thesis focuses on the use of Video Intelligent Systems (VIS) to analyse public areas. Applications include automatic video surveillance, traffic monitoring, care of ageing people, etc. Indeed, an ever-increasing number of cameras is being installed today in both private and public areas. Since human supervisors can deal effectively with only a limited number of monitors, and their performance falls as a result of boredom or fatigue, automatic analysis of the video content is required. These systems are generally complex and difficult to optimise. Nevertheless, I propose an optimisation framework by first studying the generic approach of this class of systems, determining how to evaluate them, and then determining how to enhance them by the optimal combination of different instances of VIS. I first study a generic approach to VIS and propose a representation as a structure with different processing stages (object segmentation, object tracking, event detection) decomposed into similar processing chains. Secondly, I focus on the problem of performance evaluation at different levels, especially the matching of entities between results and ground truth. I propose a formulation to find the most adequate interpretation of the results and derive implementations for object tracking and event detection algorithms. Thirdly, I derive a new approach for combining the information provided by different VIS implementations. I propose splitting the input space in order to optimise the performance locally, and then regrouping the VIS outputs through a resource allocation framework (similar to Lagrangian optimisation) that maximises the overall performance. The approach has been validated for event detection and foreground segmentation. Finally, I propose a new method to predict the limits of combining classifiers, based on the ROC curve and information theory.
Bibentry:
@PHDTHESIS{Desurmont:2009:b,
  author = {X. Desurmont},
  title = {Objective Performance Evaluation and Optimal Allocation Framework for Video Analysis Methods},
  year = {2009},
  address = {Louvain-la-Neuve, Belgium},
  month = {June},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {This thesis focuses on the use of Video Intelligent Systems (VIS) to analyse public areas. Applications include automatic video surveillance, traffic monitoring, care of ageing people, etc. Indeed, an ever-increasing number of cameras is being installed today in both private and public areas. Since human supervisors can deal effectively with only a limited number of monitors, and their performance falls as a result of boredom or fatigue, automatic analysis of the video content is required. These systems are generally complex and difficult to optimise. Nevertheless, I propose an optimisation framework by first studying the generic approach of this class of systems, determining how to evaluate them, and then determining how to enhance them by the optimal combination of different instances of VIS. I first study a generic approach to VIS and propose a representation as a structure with different processing stages (object segmentation, object tracking, event detection) decomposed into similar processing chains. Secondly, I focus on the problem of performance evaluation at different levels, especially the matching of entities between results and ground truth. I propose a formulation to find the most adequate interpretation of the results and derive implementations for object tracking and event detection algorithms. Thirdly, I derive a new approach for combining the information provided by different VIS implementations. I propose splitting the input space in order to optimise the performance locally, and then regrouping the VIS outputs through a resource allocation framework (similar to Lagrangian optimisation) that maximises the overall performance. The approach has been validated for event detection and foreground segmentation. Finally, I propose a new method to predict the limits of combining classifiers, based on the ROC curve and information theory.},
  url = {2009_PHD_DESURMONT.pdf},
  school = {Université catholique de Louvain},
  type = {PhD Thesis},
};
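A toy version of the thesis's allocation idea: each input region offers several VIS operating points (TP, FP); one point is chosen per region to maximize total true positives under a global false-positive budget by sweeping a Lagrange multiplier, as in rate allocation. The data structures here are illustrative:

def allocate(regions, fp_budget, lambdas=None):
    """regions: list of lists of (tp, fp) operating points, one list per region."""
    lambdas = lambdas or [x / 100.0 for x in range(0, 500)]
    best = None
    for lam in lambdas:
        # Per-region, independently maximize the Lagrangian tp - lam * fp.
        choice = [max(pts, key=lambda p: p[0] - lam * p[1]) for pts in regions]
        tp = sum(p[0] for p in choice)
        fp = sum(p[1] for p in choice)
        if fp <= fp_budget and (best is None or tp > best[0]):
            best = (tp, fp, choice)
    return best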
69 J. Meessen, M. Coterot, C. De Vleeschouwer, X. Desurmont and B. Macq. Flexible user interface for efficient content-based video surveillance retrieval: design and evaluation. Multimedia Content Access: Algorithms and Systems III, part of the IS&T SPIE Symposium on Electronic Imaging. San Jose, CA, USA. January 2009.
Abstract: The major drawback of interactive retrieval systems is the potential frustration of the user caused by excessive labelling work. Active learning has proven to help solve this issue by carefully selecting the examples to present to the user. In this context, the design of the user interface plays a critical role, since it should invite the user to label the examples selected by the active learning. This paper presents the design and evaluation of an innovative user interface for image retrieval. It has been validated using real-life IEEE PETS video surveillance data. In particular, we investigated the most appropriate division of the display area between the retrieved video frames and the active learning examples, taking both objective and subjective user satisfaction parameters into account. The flexibility of the interface relies on a scalable representation of the video content, such as Motion JPEG 2000 in our implementation.
Bibentry:
@INPROCEEDINGS{Meessen:2009:a,
  author = {J. Meessen and M. Coterot and C. De Vleeschouwer and X. Desurmont and B. Macq},
  title = {Flexible user interface for efficient content-based video surveillance retrieval: design and evaluation},
  booktitle = {Multimedia Content Access: Algorithms and Systems III, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2009},
  address = {San Jose, CA, USA},
  month = {January 18-22},
  note = {dpt:img*grp:mm|vs*lg:en},
  abstract = {The major drawback of interactive retrieval systems is the potential frustration of the user caused by excessive labelling work. Active learning has proven to help solve this issue by carefully selecting the examples to present to the user. In this context, the design of the user interface plays a critical role, since it should invite the user to label the examples selected by the active learning. This paper presents the design and evaluation of an innovative user interface for image retrieval. It has been validated using real-life IEEE PETS video surveillance data. In particular, we investigated the most appropriate division of the display area between the retrieved video frames and the active learning examples, taking both objective and subjective user satisfaction parameters into account. The flexibility of the interface relies on a scalable representation of the video content, such as Motion JPEG 2000 in our implementation.},
  url = {2009_SPIE-EI.pdf},
};
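A minimal sketch of the active-learning selection behind such an interface, assuming scikit-learn and binary relevant/irrelevant labels: rank unlabeled frames by classifier uncertainty and surface the most ambiguous ones for the user to label. The paper's retrieval engine differs; this only illustrates uncertainty sampling:

import numpy as np
from sklearn.svm import LinearSVC

def most_ambiguous(X_labeled, y_labeled, X_pool, k=8):
    """Return indices of the k pool items closest to the decision boundary."""
    clf = LinearSVC().fit(X_labeled, y_labeled)
    margin = np.abs(clf.decision_function(X_pool))  # small = most uncertain
    return np.argsort(margin)[:k]                   # candidates to show the user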
68 C. Simon, J. Meessen and C. De Vleeschouwer. Embedding Proximal Support Vectors into Randomized Trees. European Symposium on Artificial Neural Networks (ESANN). Bruges, Belgium. April 2009.
Abstract: This paper proposes to embed proximal support vectors into randomized trees in order to solve high-dimensional and multi-category classification problems in a computationally efficient way. In comparison to the standard support vector machine, the linear proximal SVM (PSVM) is known to provide a good trade-off between classification correctness and computational efficiency [5]. Moreover, by embedding multiple PSVM classifiers into a binary tree architecture, it is possible to turn an arbitrary multi-class problem into a hierarchy of binary classifications. The critical issue then consists in determining, in each node of the tree, how to aggregate the multiple classes into a pair of so-called overlay classes to discriminate. As a fundamental contribution, our paper proposes to deploy an ensemble of randomized trees, instead of a single optimized decision tree, to bypass the question of overlay class definition. Hence, the multiple trees explore a set of distinct overlay class definitions simultaneously, which relaxes the class partitioning question for each individual tree and results in more robust decisions for the ensemble of trees. Empirical results on various datasets demonstrate a significant gain in accuracy compared both to "one versus one" SVM solutions and to conventional ensembles of decision tree classifiers.
Bibentry:
@INPROCEEDINGS{Simon:2009:a,
  author = {C. Simon and J. Meessen and C. De Vleeschouwer},
  title = {Embedding Proximal Support Vectors into Randomized Trees},
  booktitle = {European Symposium on Artificial Neural Networks (ESANN)},
  year = {2009},
  address = {Bruges, Belgium},
  month = {April 22-24},
  note = {dpt:img*grp:mm*lg:en*prj:arcade},
  abstract = {This paper proposes to embed proximal support vectors into randomized trees in order to solve high-dimensional and multi-category classification problems in a computationally efficient way. In comparison to the standard support vector machine, the linear proximal SVM (PSVM) is known to provide a good trade-off between classification correctness and computational efficiency [5]. Moreover, by embedding multiple PSVM classifiers into a binary tree architecture, it is possible to turn an arbitrary multi-class problem into a hierarchy of binary classifications. The critical issue then consists in determining, in each node of the tree, how to aggregate the multiple classes into a pair of so-called overlay classes to discriminate. As a fundamental contribution, our paper proposes to deploy an ensemble of randomized trees, instead of a single optimized decision tree, to bypass the question of overlay class definition. Hence, the multiple trees explore a set of distinct overlay class definitions simultaneously, which relaxes the class partitioning question for each individual tree and results in more robust decisions for the ensemble of trees. Empirical results on various datasets demonstrate a significant gain in accuracy compared both to "one versus one" SVM solutions and to conventional ensembles of decision tree classifiers.},
  url = {2009_ESANN_ARCADE.pdf},
};
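A minimal NumPy sketch of the two ingredients: (a) a linear proximal SVM, which reduces to a single regularized linear system (following Fung and Mangasarian's formulation in simplified form), and (b) a node-level random bipartition of the classes into two overlay groups, which the ensemble of randomized trees explores many times:

import numpy as np

def psvm_fit(X, y, nu=1.0):
    """y in {-1, +1}. Returns augmented weights u so sign(E @ u) predicts y."""
    y = np.asarray(y, dtype=float)
    E = np.hstack([X, -np.ones((len(X), 1))])  # absorb the offset gamma
    A = np.eye(E.shape[1]) / nu + E.T @ E      # regularized normal equations
    return np.linalg.solve(A, E.T @ y)

def psvm_predict(X, u):
    E = np.hstack([X, -np.ones((len(X), 1))])
    return np.sign(E @ u)

def random_overlay_split(classes, rng):
    """Randomly aggregate the classes present at a node into two overlay groups."""
    perm = rng.permutation(list(classes))
    half = max(1, len(perm) // 2)
    return set(perm[:half]), set(perm[half:])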
67 M. Mancas, R. Bose, H. Sullivan, C. Machy, R. Ben Madhkour and T. Ravet. Morface: Face Morphing. June 2009.
Abstract: The purpose of this project was to produce a real-time installation which uses two approaches: a close approach dealing with the user's face and head position, and a far approach where the installation reacts differently to the user's motion depending on whether it is repetitive or not. The installation was personalized with a Mona Lisa.
Bibentry:
@MISC{Mancas:2009:b,
  author = {M. Mancas and R. Bose and H. Sullivan and C. Machy and R. Ben Madhkour and T. Ravet},
  title = {Morface: Face Morphing},
  year = {2009},
  month = {June},
  note = {dpt:img*grp:mm*lg:en*prj:numediart},
  abstract = {The purpose of this project was to produce a real-time installation which uses two approaches: a close approach dealing with the user's face and head position, and a far approach where the installation reacts differently to the user's motion depending on whether it is repetitive or not. The installation was personalized with a Mona Lisa.},
  url = {numediart_2009_s06_p2_report.pdf},
  journal = {QPSR of the numediart research program},
  volume = {2},
  issue = {2},
};
66 C. Machy, C. Carincotte and X. Desurmont. On the use of Video Content Analysis in ITS: a review from academic to commercial applications. Int. Conf. on ITS Telecommunications. Lille, France. October 2009.
Abstract: Stand-alone cameras and CCTV networks are nowadays commonly present in public areas such as city centers, stores and, more recently, transportation infrastructures. In the meantime, automatic processing of video data is a field of activity attracting the utmost attention in the pattern recognition community; state-of-the-art advances in this area enable the reliable extraction of features and the investigation of numerous applications dedicated to ITS. A first obvious field of application of Video Content Analysis (VCA) consists in improving safety and security in the transport context. VCA embedded in vehicles can track pedestrians to avoid collisions, improving safety. Used in railway stations, VCA can detect left luggage, enhancing security. Video streams available from such installations may also represent a useful source of information for statistical transportation applications, e.g. monitoring road traffic conditions or providing accurate counting statistics in railway/subway stations. This paper proposes an overview of VCA applications in terms of safety, security and efficiency for ITS, with a specific focus on the usability of such VCA systems (emerging research topics, state-of-the-art studies, already commercialized applications, etc.).
Bibentry:
@INPROCEEDINGS{Machy:2009:b,
  author = {C. Machy and C. Carincotte and X. Desurmont},
  title = {On the use of Video Content Analysis in ITS: a review from academic to commercial applications},
  booktitle = {Int. Conf. on ITS Telecommunications},
  year = {2009},
  address = {Lille, France},
  month = {October 20-22},
  note = {dpt:img*lg:en*prj:serket},
  abstract = {Stand-alone cameras and CCTV networks are nowadays commonly present in public areas such as city centers, stores and, more recently, transportation infrastructures. In the meantime, automatic processing of video data is a field of activity attracting the utmost attention in the pattern recognition community; state-of-the-art advances in this area enable the reliable extraction of features and the investigation of numerous applications dedicated to ITS. A first obvious field of application of Video Content Analysis (VCA) consists in improving safety and security in the transport context. VCA embedded in vehicles can track pedestrians to avoid collisions, improving safety. Used in railway stations, VCA can detect left luggage, enhancing security. Video streams available from such installations may also represent a useful source of information for statistical transportation applications, e.g. monitoring road traffic conditions or providing accurate counting statistics in railway/subway stations. This paper proposes an overview of VCA applications in terms of safety, security and efficiency for ITS, with a specific focus on the usability of such VCA systems (emerging research topics, state-of-the-art studies, already commercialized applications, etc.).},
  url = {2009_ITS-T_SERKET.pdf},
};
65 C. Simon, J. Meessen and C. De Vleeschouwer. Visual Event Recognition using Decision Trees. Multimedia Tools and Applications. April 2009.
Abstract: This paper presents a classifier-based approach to recognizing dynamic events in video surveillance sequences. The goal of this work is to propose a flexible event recognition system that can be used without relying on a long-term explicit tracking procedure. It is composed of three stages. The first one aims at defining and building a set of relevant features describing the shape and movements of the foreground objects in the scene. To this aim, we introduce new motion descriptors based on space-time volumes. Second, an unsupervised learning-based method is used to cluster the objects, thereby defining a set of coarse-to-fine local patterns of features representing primitive events in the video sequences. Finally, events are modeled as a spatio-temporal organization of patterns based on an ensemble of randomized trees. In particular, we want this classifier to discover the temporal and causal correlations between the most discriminative patterns. Our system is tested and validated on both simulated and real-life data.
Bibentry:
@ARTICLE{Simon:2009:c,
  author = {C. Simon and J. Meessen and C. De Vleeschouwer},
  title = {Visual Event Recognition using Decision Trees},
  year = {2009},
  note = {dpt:img*grp:mm|vs*lg:en*prj:arcade},
  abstract = {This paper presents a classifier-based approach to recognizing dynamic events in video surveillance sequences. The goal of this work is to propose a flexible event recognition system that can be used without relying on a long-term explicit tracking procedure. It is composed of three stages. The first one aims at defining and building a set of relevant features describing the shape and movements of the foreground objects in the scene. To this aim, we introduce new motion descriptors based on space-time volumes. Second, an unsupervised learning-based method is used to cluster the objects, thereby defining a set of coarse-to-fine local patterns of features representing primitive events in the video sequences. Finally, events are modeled as a spatio-temporal organization of patterns based on an ensemble of randomized trees. In particular, we want this classifier to discover the temporal and causal correlations between the most discriminative patterns. Our system is tested and validated on both simulated and real-life data.},
  url = {2009_MTAP_ARCADE.pdf},
  journal = {Multimedia Tools and Applications},
};
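
A minimal sketch of the final stage described above: an ensemble of randomized trees classifying clips from histograms of primitive-event patterns. It uses scikit-learn's ExtraTreesClassifier as a stand-in; the feature layout, class count and all data are invented for illustration and this is not the authors' code.

    # Toy ensemble-of-randomized-trees event classifier. Each clip is
    # summarised by a histogram over 32 primitive-event patterns (the
    # coarse-to-fine clusters of the abstract); labels are invented.
    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier

    rng = np.random.default_rng(0)
    X_train = rng.random((200, 32))          # pattern histograms, one row per clip
    y_train = rng.integers(0, 3, 200)        # 3 hypothetical event classes

    clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print(clf.predict(rng.random((5, 32))))  # predicted event label per clip
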
  • pdf icon
64 M. Mancas, P. Brunet, F. Cavallero, D. Glowinski, C. Machy, P.-J. Maes, S. Paschalidou, M.K. Rajagopal, S. Schibeci and L. Vincze. Hypersocial Museum: addressing the social interaction challenge with museum scenarios and attention-based approaches. September 2009.
Abstract: This work intended to show that the use of museum scenarios and computational attention models is a good way to carry out social interaction studies. Attention-based algorithms make it possible to dynamically analyze visitors' behavior in museums in a long-term context where typical behavior models are learnt. Moreover, by providing feedback to visitors, it is possible to interact with them and to foster interaction between visitors: the system then becomes a sort of mediator in human-to-human interaction.
Bibentry:
@MISC{Mancas:2009:c,
  author = {M. Mancas and P. Brunet and F. Cavallero and D. Glowinski and C. Machy and P.-J. Maes and S. Paschalidou and M.K. Rajagopal and S. Schibeci and L. Vincze},
  title = {Hypersocial Museum: addressing the social interaction challenge with museum scenarios and attention-based approaches},
  year = {2009},
  month = {September},
  note = {dpt:img*grp:mm*lg:en*prj:numediart},
  abstract = {This work intended to show that the use of museum scenarios and computational attention models is a good way to carry out social interaction studies. Attention-based algorithms make it possible to dynamically analyze visitors' behavior in museums in a long-term context where typical behavior models are learnt. Moreover, by providing feedback to visitors, it is possible to interact with them and to foster interaction between visitors: the system then becomes a sort of mediator in human-to-human interaction.},
  url = {numediart_2009_s07_p3_report.pdf},
  journal = {QPSR of the numediart research program},
  volume = {2},
  issue = {3},
};
  • pdf icon
63 C. Carincotte, F. Bremond, J.-M. Odobez, L. Patino, B. Ravera and X. Desurmont. Multimedia knowledge-based content analysis over distributed architecture. NEM Summit "Towards Future Media Internet" . Saint-Malo, France. September 2009.
Abstract: In this paper, we review the recently completed CARETAKER project outcomes from a system point of view. The IST FP6-027231 CARETAKER project aimed at studying, developing and assessing multimedia knowledge-based content analysis, knowledge extraction components, and metadata management sub-systems in the context of automated situation awareness and decision support. More precisely, CARETAKER focused on the extraction of structured knowledge from large multimedia collections recorded over surveillance networks of cameras and microphones deployed in real sites. Indeed, the produced audio-visual streams, in addition to security and safety issues, represent a useful source of information when stored and automatically analysed, for instance in urban planning or resource optimisation. In this paper, we give an overview of the communication architecture developed for the project, and detail the different innovative content analysis components developed within the test-beds. We also highlight the technical concerns encountered for each individual brick, which are common issues in distributed media applications.

Keywords: APPLICATIONS FOR SENSOR NETWORKS, COMMUNICATION ARCHITECTURE, MULTIMODAL APPLICATIONS.

Bibentry:
@INPROCEEDINGS{Carincotte:2009:a,
  author = {C. Carincotte and F. Bremond and J.-M. Odobez and L. Patino and B. Ravera and X. Desurmont},
  title = {Multimedia knowledge-based content analysis over distributed architecture},
  booktitle = {NEM Summit "Towards Future Media Internet"},
  year = {2009},
  address = {Saint-Malo, France},
  month = {September 28-30},
  note = {dpt:img*grp:vs*lg:en*prj:caretaker},
  abstract = {In this paper, we review the recently completed CARETAKER project outcomes from a system point of view. The IST FP6-027231 CARETAKER project aimed at studying, developing and assessing multimedia knowledge-based content analysis, knowledge extraction components, and metadata management sub-systems in the context of automated situation awareness and decision support. More precisely, CARETAKER focused on the extraction of structured knowledge from large multimedia collections recorded over surveillance networks of cameras and microphones deployed in real sites. Indeed, the produced audio-visual streams, in addition to security and safety issues, represent a useful source of information when stored and automatically analysed, for instance in urban planning or resource optimisation. In this paper, we give an overview of the communication architecture developed for the project, and detail the different innovative content analysis components developed within the test-beds. We also highlight the technical concerns encountered for each individual brick, which are common issues in distributed media applications.},
  url = {2009_NEM-SUMMIT_CARETAKER.pdf},
  keywords = {communication architecture, applications for sensor networks, multimodal applications},
};

2008

  • pdf icon
62 V. Delcourt, C. Machy, C. Mancas-Thillou and X. Desurmont. Automatic Reader of Recording Strips. 8th World Congress on Railway Research (WCRR) . Seoul, Korea. May 2008.
Abstract: Even if the number of accidents involving the railway system is decreasing due to technical progress, the statistics are still too high. For instance, in 2004 in the EU 25, 9309 accidents were reported, including 142 in France [ERA]. For each of these accidents in France, one element can be used as evidence in the eyes of the law: the recording strip and its associated filling-card, or the so-called ATESS file recorded by the digital Juridical Recording Units (JRU) introduced in the mid-80s. The strip contains all the information concerning the journey of the train, speed and time recordings and all the driving events (such as emergency braking). The card features additional information on the train's driver, departure/arrival stations, train numbers, etc. These two elements are presently checked manually. The idea of this project is to simplify the procedure and to perform the checking as automatically as possible. This paper therefore presents a complete system for the Automatic Reader of Recording Strips (ARRS).
Bibentry:
@INPROCEEDINGS{Delcourt:2008,
  author = {V. Delcourt and C. Machy and C. Mancas-Thillou and X. Desurmont},
  title = {Automatic Reader of Recording Strips},
  booktitle = {8th World Congress on Railway Research (WCRR)},
  year = {2008},
  address = {Seoul, Korea},
  month = {May 19},
  note = {dpt:img*grp:mv*lg:en*prj:scnf-bg},
  abstract = {Even if the number of accidents involving the railway system is decreasing due to technical progress, the statistics are still too high. For instance, in 2004 in the EU 25, 9309 accidents were reported, including 142 in France [ERA]. For each of these accidents in France, one element can be used as evidence in the eyes of the law: the recording strip and its associated filling-card, or the so-called ATESS file recorded by the digital Juridical Recording Units (JRU) introduced in the mid-80s. The strip contains all the information concerning the journey of the train, speed and time recordings and all the driving events (such as emergency braking). The card features additional information on the train's driver, departure/arrival stations, train numbers, etc. These two elements are presently checked manually. The idea of this project is to simplify the procedure and to perform the checking as automatically as possible. This paper therefore presents a complete system for the Automatic Reader of Recording Strips (ARRS).},
  url = {2008_WCRR.pdf},
};
  • pdf icon
61 M. Mancas, M. Bagein, N. Guichard, H. Sullivan, C. Machy, S. Mahmoudi and X. Siebert. Augmented Virtual Studio. December 2008.
Abstract: The Augmented Virtual Studio (AVS) project aims at acquiring the video analysis and visualization tools needed to achieve advanced interfaces and interaction with virtual avatars and virtual worlds. Those techniques consist of data visualization, object segmentation, blob tracking and identification, as well as sketch recognition and, more generally, face and object recognition. All these tools were used to build three practical applications.
Bibentry:
@MISC{Mancas:2008,
  author = {M. Mancas and M. Bagein and N. Guichard and H. Sullivan and C. Machy and S. Mahmoudi and X. Siebert},
  title = {Augmented Virtual Studio},
  year = {2008},
  month = {December},
  note = {dpt:img*grp:mm*lg:en*prj:numediart},
  abstract = {The Augmented Virtual Studio (AVS) project aims at acquiring the video analysis and visualization tools needed to achieve advanced interfaces and interaction with virtual avatars and virtual worlds. Those techniques consist of data visualization, object segmentation, blob tracking and identification, as well as sketch recognition and, more generally, face and object recognition. All these tools were used to build three practical applications.},
  url = {numediart_2008_s04_p3_report.pdf},
  journal = {QPSR of the numediart research program},
  volume = {1},
  issue = {4},
  pages = {147-155},
};
  • pdf icon
60 A. Lapeyronnie, C. Parisot, J. Meessen, X. Desurmont and J.-F. Delaigle. Real-time road traffic classification using mobile video cameras. Real-Time Image Processing, part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA, USA. January 2008.
Abstract: On-board video analysis has attracted a lot of interest over the last two decades, with the main goal of improving safety by detecting obstacles or assisting the driver. Our study aims at providing a real-time understanding of the urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of the vehicles on the adjacent lanes when the bus operates on a dedicated lane. We work on 1-D segments drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments. The absolute speed can be estimated from the relative speed if the camera speed is known, e.g. thanks to an odometer and/or GPS. Using pre-defined speed thresholds, the traffic can be classified into different categories such as "fluid", "congestion", etc. The solution offers both good performance and low computing complexity and is compatible with cheap video cameras, which allows its adoption by city traffic management authorities. (An illustrative thresholding sketch follows this entry.)

Keywords: COMPUTER VISION, REAL-TIME PROCESSING, TRAFFIC CLASSIFICATION.

Bibentry:
@INPROCEEDINGS{Lapeyronnie:2008,
  author = {A. Lapeyronnie and C. Parisot and J. Meessen and X. Desurmont and J.-F. Delaigle},
  title = {Real-time road traffic classification using mobile video cameras},
  booktitle = {Real-Time Image Processing, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2008},
  address = {San Jose, CA, USA},
  month = {January 27-31},
  note = {dpt:img*grp:vs*lg:en*prj:moryne},
  abstract = {On-board video analysis has attracted a lot of interest over the last two decades, with the main goal of improving safety by detecting obstacles or assisting the driver. Our study aims at providing a real-time understanding of the urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of the vehicles on the adjacent lanes when the bus operates on a dedicated lane. We work on 1-D segments drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments. The absolute speed can be estimated from the relative speed if the camera speed is known, e.g. thanks to an odometer and/or GPS. Using pre-defined speed thresholds, the traffic can be classified into different categories such as "fluid", "congestion", etc. The solution offers both good performance and low computing complexity and is compatible with cheap video cameras, which allows its adoption by city traffic management authorities.},
  url = {2008_SPIE-EI_MORYNE.pdf},
  keywords = {real-time processing, computer vision, traffic classification},
};
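
The classification step described above reduces to thresholding an absolute-speed estimate. Below is a hedged sketch, assuming the absolute speed is the bus speed (from odometer/GPS) plus the relative speed measured along a 1-D lane segment; the threshold values are invented, not taken from the paper.

    # Hypothetical speed thresholds (km/h) mapping to traffic classes.
    def classify_traffic(relative_speed_kmh: float, bus_speed_kmh: float) -> str:
        absolute = bus_speed_kmh + relative_speed_kmh  # estimated vehicle speed
        if absolute < 5:
            return "congestion"
        if absolute < 30:
            return "dense"
        return "fluid"

    # A vehicle receding at 10 km/h relative to a bus doing 12 km/h is
    # moving at about 2 km/h: classified as congestion.
    print(classify_traffic(relative_speed_kmh=-10.0, bus_speed_kmh=12.0))
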
  • pdf icon
59 CARETAKER Consortium. CARETAKER puts knowledge to good use. April 2008.
Abstract: Security is a fairly recent concern for transport companies. Urban transport used to be relatively secure and no-one could have imagined such an increase in crime and threats on board in the recent past. Closed Circuit Television (CCTV) is now an everyday part of our lives, at least in Europe. It helps to improve various daily operations and tasks, as well as providing users with various degrees of security.
Bibentry:
@MISC{Caretaker:2008,
  author = {CARETAKER Consortium},
  title = {CARETAKER puts knowledge to good use},
  year = {2008},
  note = {dpt:img*grp:vs*lg:en*prj:caretaker},
  abstract = {Security is a fairly recent concern for transport companies. Urban transport used to be relatively secure and no-one could have imagined such an increase in crime and threats on board in the recent past. Closed Circuit Television (CCTV) is now an everyday part of our lives, at least in Europe. It helps to improve various daily operations and tasks, as well as providing users with various degrees of security.},
  url = {2008_MOBILITY-MAGAZINE_CARETAKER.pdf},
  journal = {European Public Transport Magazine},
  issue = {13},
};
  • pdf icon
58 F.-O. Devaux, C. De Vleeschouwer, J. Meessen, C. Parisot, B. Macq and J.-F. Delaigle. Remote interactive browsing of video surveillance content based on JPEG 2000. IEEE Transactions on Circuits and Systems for Video Technology . April 2008.
Abstract: In video surveillance applications, pre-stored images are likely to be accessed remotely and interactively upon user request. In such a context, the JPEG 2000 still image compression format is attractive because it supports flexible and progressive access to each individual image of the pre-stored content, in terms of spatial location, quality level, as well as resolution. However, when the client wants to play consecutive frames of the video sequence, the purely INTRA nature of JPEG 2000 dramatically penalizes the transmission efficiency. To mitigate this drawback, conditional replenishment mechanisms are envisioned. They convey arbitrary spatio-temporal segments of the initial video sequence directly through sporadic and rate-distortion (RD) optimized refresh of JPEG 2000 packets. Hence, they preserve JPEG 2000 compliance, while saving transmission resources. The replenishment algorithms proposed in this paper are original in two main aspects. First, they exploit the specificities of the JPEG 2000 codestream structure to balance the accuracy (in terms of bit-planes) of the replenishment across image subbands in a rate-distortion optimal way. Second, they take into account the still background nature of video surveillance content by maintaining two reference images at the receiver. One reference is the last reconstructed frame, as proposed in [2] and [3]. The other is a dynamically-computed estimate of the scene background, which helps to recover the background after a moving object has left the scene. As an additional contribution, we demonstrate that the embedded nature of the JPEG 2000 codestream easily supports prioritization of semantically relevant regions of interest while browsing video content. An interesting aspect of this JPEG 2000-based prioritization is that it can be regulated a posteriori, after the codestream generation, based on the interest expressed by the user at browsing time. Simulation results demonstrate the efficiency and flexibility of the approach compared to INTER-based solutions. (A toy replenishment-decision sketch follows this entry.)

Keywords: ADAPTIVE AND INTERACTIVE MEDIA DELIVERY, CONDITIONAL REPLENISHMENT, JPEG 2000, VIDEO SERVER.

Bibentry:
@ARTICLE{Devaux:2008,
  author = {F.-O. Devaux and C. De Vleeschouwer and J. Meessen and C. Parisot and B. Macq and J.-F. Delaigle},
  title = {Remote interactive browsing of video surveillance content based on JPEG 2000},
  year = {2008},
  note = {dpt:img*lg:en*prj:wcam},
  abstract = {In video surveillance applications, pre-stored images are likely to be accessed remotely and interactively upon user request. In such a context, the JPEG 2000 still image compression format is attractive because it supports flexible and progressive access to each individual image of the pre-stored content, in terms of spatial location, quality level, as well as resolution. However, when the client wants to play consecutive frames of the video sequence, the purely INTRA nature of JPEG 2000 dramatically penalizes the transmission efficiency. To mitigate this drawback, conditional replenishment mechanisms are envisioned. They convey arbitrary spatio-temporal segments of the initial video sequence directly through sporadic and rate-distortion (RD) optimized refresh of JPEG 2000 packets. Hence, they preserve JPEG 2000 compliance, while saving transmission resources. The replenishment algorithms proposed in this paper are original in two main aspects. First, they exploit the specificities of the JPEG 2000 codestream structure to balance the accuracy (in terms of bit-planes) of the replenishment across image subbands in a rate-distortion optimal way. Second, they take into account the still background nature of video surveillance content by maintaining two reference images at the receiver. One reference is the last reconstructed frame, as proposed in [2] and [3]. The other is a dynamically-computed estimate of the scene background, which helps to recover the background after a moving object has left the scene. As an additional contribution, we demonstrate that the embedded nature of the JPEG 2000 codestream easily supports prioritization of semantically relevant regions of interest while browsing video content. An interesting aspect of this JPEG 2000-based prioritization is that it can be regulated a posteriori, after the codestream generation, based on the interest expressed by the user at browsing time. Simulation results demonstrate the efficiency and flexibility of the approach compared to INTER-based solutions.},
  url = {2008_IEEE-T-CSVT.pdf},
  keywords = {Video server, JPEG 2000, Conditional replenishment, Adaptive and interactive media delivery},
  journal = {IEEE Transactions on Circuits and Systems for Video Technology},
};
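
The two-reference replenishment decision can be sketched as a per-code-block rate-distortion test. The rule below is a simplified stand-in for the paper's RD-optimized packet selection, with an invented Lagrangian threshold; it only illustrates the idea of choosing between the previous frame, the background estimate, and a refresh.

    import numpy as np

    def mse(a, b):
        return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))

    def replenish_decision(current, prev_ref, bg_ref, bits_cost, lam=0.05):
        # Pick 'prev', 'background' or 'refresh' for one code-block.
        d_prev = mse(current, prev_ref)
        d_bg = mse(current, bg_ref)
        ref, d_ref = ("prev", d_prev) if d_prev <= d_bg else ("background", d_bg)
        # Refresh only when the distortion saved justifies the rate spent
        # (a plain Lagrangian rate-distortion test).
        return "refresh" if d_ref > lam * bits_cost else ref

    block = np.full((8, 8), 120.0)
    print(replenish_decision(block, block + 30, block + 1, bits_cost=512))
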
  • pdf icon
57 ICT Results service. Transforming buses into mobile sensing platforms. ICT Results. May 2008.
Bibentry:
@MISC{IctResult:2008,
  author = {ICT Results service},
  title = {Transforming buses into mobile sensing platforms},
  year = {2008},
  month = {May},
  note = {dpt:img*grp:vs*lg:en*prj:moryne},
  url = {2008_ICT_MORYNE.pdf},
  howpublished = {ICT Results},
};
  • pdf icon
56 C. Simon, J. Meessen and C. De Vleeschouwer. Using decision trees to build an event recognition framework for automated visual surveillance. Annual Belgian-Dutch Machine Learning Conference (Benelearn)  : p. 35-36. Spa, Belgium. May 2008.
Abstract: This paper presents a classifier-based approach to recognize possibly sophisticated events in video surveillance. The aim of this work is to propose a flexible and generic event recognition system that can be used in a real-world context. Our system uses the ensemble of randomized trees procedure to model each event as a sequence of structured activity patterns, without using any tracking method. Experimental results demonstrate the robustness of the system towards artifacts and passers-by, and the effectiveness of its framework for event recognition applications in visual surveillance.
Bibentry:
@INPROCEEDINGS{Simon:2008,
  author = {C. Simon and J. Meessen and C. De Vleeschouwer},
  title = {Using decision trees to build an event recognition framework for automated visual surveillance},
  booktitle = {Annual Belgian-Dutch Machine Learning Conference (Benelearn)},
  year = {2008},
  address = {Spa, Belgium},
  month = {May 19-20},
  note = {dpt:img*grp:vs|mm*prj:arcade*lg:en},
  abstract = {This paper presents a classifier-based approach to recognize possibly sophisticated events in video surveillance. The aim of this work is to propose a flexible and generic event recognition system that can be used in a real-world context. Our system uses the ensemble of randomized trees procedure to model each event as a sequence of structured activity patterns, without using any tracking method. Experimental results demonstrate the robustness of the system towards artifacts and passers-by, and the effectiveness of its framework for event recognition applications in visual surveillance.},
  url = {2008_BENELEARN_ARCADE.pdf},
  pages = {35-36},
};
  • pdf icon
55 C. Parisot, J. Meessen, C. Carincotte and X. Desurmont. Real-time road traffic classification using on-board bus video camera. 11th Int. IEEE Conf. on Intelligent Transportation Systems (ITSC)  : p. 189-196. Beijing, China. October 2008.
Abstract: On-board video analysis has attracted a lot of interest over the last two decades, mainly for safety improvement (through e.g. obstacle detection or driver assistance). In this context, our study aims at providing a video-based real-time understanding of the urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of the vehicles on the adjacent lanes when the bus operates on its reserved lane. We propose to work on 1-D segments drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments, while the absolute speed of vehicles is estimated from the relative one thanks to odometer and/or GPS data. Using pre-defined speed thresholds, the traffic can be classified in real time into different categories such as "fluid", "congestion"... As demonstrated in the evaluation stage, the proposed solution offers both good performance and low computing complexity, and is also compatible with cheap video cameras, which allows its adoption by city traffic management authorities.
Bibentry:
@INPROCEEDINGS{Parisot:2008,
  author = {C. Parisot and J. Meessen and C. Carincotte and X. Desurmont},
  title = {Real-time road traffic classification using on-board bus video camera},
  booktitle = {11th Int. IEEE Conf. on Intelligent Transportation Systems (ITSC)},
  year = {2008},
  address = {Beijing, China},
  month = {October 12-15},
  note = {dpt:img*grp:vs*prj:moryne*lg:en},
  abstract = {On-board video analysis has attracted a lot of interest over the last two decades, mainly for safety improvement (through e.g. obstacle detection or driver assistance). In this context, our study aims at providing a video-based real-time understanding of the urban road traffic. Considering a video camera fixed on the front of a public bus, we propose a cost-effective approach to estimate the speed of the vehicles on the adjacent lanes when the bus operates on its reserved lane. We propose to work on 1-D segments drawn in the image space, aligned with the road lanes. The relative speed of the vehicles is computed by detecting and tracking features along each of these segments, while the absolute speed of vehicles is estimated from the relative one thanks to odometer and/or GPS data. Using pre-defined speed thresholds, the traffic can be classified in real time into different categories such as "fluid", "congestion"... As demonstrated in the evaluation stage, the proposed solution offers both good performance and low computing complexity, and is also compatible with cheap video cameras, which allows its adoption by city traffic management authorities.},
  url = {2008_IEEE-ITSC_MORYNE.pdf},
  pages = {189-196},
};
  • pdf icon
54 C. Carincotte, X. Naturel, M. Hick, J.-M. Odobez, J. Yao, A. Bastide and B. Corbucci. Understanding metro station usage using Closed Circuit TeleVision cameras analysis. 11th Int. IEEE Conf. on Intelligent Transportation Systems (ITSC)  : p. 420-427. Beijing, China. October 2008.
Abstract: In this paper, we propose to show how video data available in standard CCTV transportation systems can represent a useful source of information for transportation infrastructure management, optimization and planning if adequately analyzed (e.g. to facilitate equipment usage understanding, to ease diagnosis and planning for system managers). More precisely, we present two algorithms that estimate the number of people in a camera view and measure the platform time-occupancy by trains. A statistical analysis of the results of each algorithm provides interesting insights regarding station usage. It is also shown that combining information from the algorithms in different views provides a finer understanding of the station usage. An end-user point of view confirms the interest of the proposed analysis. (A toy aggregation sketch follows this entry.)
Bibentry:
@INPROCEEDINGS{Carincotte:2008:b,
  author = {C. Carincotte and X. Naturel and M. Hick and J.-M. Odobez and J. Yao and A. Bastide and B. Corbucci},
  title = {Understanding metro station usage using Closed Circuit TeleVision cameras analysis},
  booktitle = {11th Int. IEEE Conf. on Intelligent Transportation Systems (ITSC)},
  year = {2008},
  address = {Beijing, China},
  month = {October 12-15},
  note = {dpt:img*grp:vs*prj:caretaker*lg:en},
  abstract = {In this paper, we propose to show how video data available in standard CCTV transportation systems can represent a useful source of information for transportation infrastructure management, optimization and planning if adequately analyzed (e.g. to facilitate equipment usage understanding, to ease diagnosis and planning for system managers). More precisely, we present two algorithms that estimate the number of people in a camera view and measure the platform time-occupancy by trains. A statistical analysis of the results of each algorithm provides interesting insights regarding station usage. It is also shown that combining information from the algorithms in different views provides a finer understanding of the station usage. An end-user point of view confirms the interest of the proposed analysis.},
  url = {2008_IEEE-ITSC_CARETAKER.pdf},
  pages = {420-427},
};
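
In the same spirit as the statistical analysis above, here is a toy aggregation combining a per-minute crowd count with train time-occupancy flags; all numbers and field names are invented for illustration.

    counts = [12, 18, 25, 31, 8, 10, 15, 22]   # people on the platform per minute
    train_present = [0, 0, 0, 1, 0, 0, 0, 1]   # platform time-occupancy by trains

    # Crowd size while no train occupies the platform, and occupancy ratio.
    waiting = [c for c, t in zip(counts, train_present) if not t]
    print("mean crowd between trains:", sum(waiting) / len(waiting))
    print("train occupancy ratio:", sum(train_present) / len(train_present))
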
  • pdf icon
53 X. Radu, A. Lapeyronnie and C. Craeye. Numerical and Experimental Analysis of a Wire Medium Collimator for MRI. Electromagnetics, Special Issue on Metamaterials , Vol. 28(7) : p. 531-543. October 2008.
Abstract: This article analyzes the collimation behavior of a wire medium devoted to magnetic resonance imaging. In the first part, the point-spread function of a doubly infinite wire medium is analyzed for the magnetic field with the help of the array scanning method. In the second part, we present two methods to evaluate the field transmission by the wire medium based on the measurement of the magnetic field computed with the method of moments (MoM). Simulation results obtained with the MoM are shown. This behavior is validated in the third part with experimental results obtained with a magnetic resonance imaging instrument at the University Hospital of Liège with a straight-wires collimator. Both simulation and experimental results confirm the ability of the wire medium to transfer electromagnetic fields in magnetic resonance imaging operational conditions.

Keywords: MAGNETIC RESONANCE IMAGING, METHOD OF MOMENTS, WIRE MEDIUM.

Bibentry:
@ARTICLE{Radu:2008,
  author = {X. Radu and A. Lapeyronnie and C. Craeye},
  title = {Numerical and Experimental Analysis of a Wire Medium Collimator for MRI},
  year = {2008},
  month = {October},
  note = {dpt:img*lg:en*prj:irm-focus},
  abstract = {This article analyzes the collimation behavior of a wire medium devoted to magnetic resonance imaging. In the first part, the point-spread function of a doubly infinite wire medium is analyzed for the magnetic field with the help of the array scanning method. In the second part, we present two methods to evaluate the field transmission by the wire medium based on the measurement of the magnetic field computed with the method of moments (MoM). Simulation results obtained with the MoM are shown. This behavior is validated in the third part with experimental results obtained with a magnetic resonance imaging instrument at the University Hospital of Liège with a straight-wires collimator. Both simulation and experimental results confirm the ability of the wire medium to transfer electromagnetic fields in magnetic resonance imaging operational conditions.},
  url = {2008_ELECTROMAGNETIC_IRM-FOCUS.pdf},
  keywords = {wire medium, magnetic resonance imaging, method of moments},
  journal = {Electromagnetics, Special Issue on Metamaterials},
  volume = {28},
  pages = {531-543},
  number = {7},
  publisher = {Taylor & Francis},
};
  • pdf icon
52 C. Carincotte, X. Desurmont and A. Bastide. Adaptive metadata management system for distributed video content analysis. Advanced Concepts for Intelligent Vision Systems (ACIVS)  : p. 334-345. Juan-les-Pins, France. October 2008.
Abstract: Scientific advances in the development of video processing algorithms now allow various distributed and collaborative vision-based applications. However, the lack of a recognised standard in this area drives system developers to build specific systems, preventing e.g. the upgrade of content analysis components or the reuse of a system in different environments. As a result, the need for a generic, context-independent and adaptive system for storing and managing video analysis results becomes conspicuous. In order to address this issue, we propose a data schema-independent data warehouse backed by a multi-agent system. This system relies on the semantic web knowledge representation format, namely RDF, to guarantee maximum adaptability and flexibility regarding schema transformation and knowledge retrieval. The storage system itself, namely the data warehouse, builds on state-of-the-art knowledge management technologies, providing efficient analysis and reporting capabilities within the monitoring system. (A minimal RDF sketch follows this entry.)
Bibentry:
@INPROCEEDINGS{Carincotte:2008:a,
  author = {C. Carincotte and X. Desurmont and A. Bastide},
  title = {Adaptive metadata management system for distributed video content analysis},
  booktitle = {Advanced Concepts for Intelligent Vision Systems (ACIVS)},
  year = {2008},
  address = {Juan-les-Pins, France},
  month = {October 20-24},
  note = {dpt:img*grp:vs*prj:caretaker*lg:en},
  abstract = {Scientific advances in the development of video processing algorithms now allow various distributed and collaborative vision-based applications. However, the lack of a recognised standard in this area drives system developers to build specific systems, preventing e.g. the upgrade of content analysis components or the reuse of a system in different environments. As a result, the need for a generic, context-independent and adaptive system for storing and managing video analysis results becomes conspicuous. In order to address this issue, we propose a data schema-independent data warehouse backed by a multi-agent system. This system relies on the semantic web knowledge representation format, namely RDF, to guarantee maximum adaptability and flexibility regarding schema transformation and knowledge retrieval. The storage system itself, namely the data warehouse, builds on state-of-the-art knowledge management technologies, providing efficient analysis and reporting capabilities within the monitoring system.},
  url = {2008_ACIVS_CARETAKER.pdf},
  pages = {334-345},
};
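
A minimal sketch of the RDF idea, assuming an invented namespace and invented predicates (this is not CARETAKER's actual schema): it stores one video-analysis event with rdflib and queries it back with SPARQL.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    EX = Namespace("http://example.org/vca#")      # hypothetical schema
    g = Graph()

    # One detected event, expressed as schema-independent triples.
    event = URIRef("http://example.org/events/42")
    g.add((event, RDF.type, EX.LeftLuggageEvent))
    g.add((event, EX.camera, Literal("cam-03")))
    g.add((event, EX.timestamp,
           Literal("2008-10-21T14:03:00", datatype=XSD.dateTime)))

    # Retrieve all left-luggage events and the camera that saw them.
    q = ("SELECT ?e ?cam WHERE { ?e a <http://example.org/vca#LeftLuggageEvent> ; "
         "<http://example.org/vca#camera> ?cam . }")
    for row in g.query(q):
        print(row.e, row.cam)
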
  • pdf icon
51 L. Benyoussef, C. Carincotte and S. Derrode. Extension of Higher-Order HMC Modeling with Application to Image Segmentation. Digital Signal Processing , Vol. 18 : p. 849-860. September 2008.
Abstract: In this work, we propose to improve the neighboring relationship ability of the Hidden Markov Chain (HMC) model by extending the memory lengths of both the Markov chain process and the data-driven densities arising in the model. The new model is able to learn more complex noise structures, with respect to the configuration of several previous states and observations. Model parameter estimation is performed with an extension of the general Iterative Conditional Estimation (ICE) method that takes the memories into account, which makes the classification algorithm unsupervised. The higher-order HMC model is then evaluated in the image segmentation context. A comparative study conducted on a simulated image is carried out according to the order of the chain. Experimental results on a Synthetic Aperture Radar (SAR) image show that the higher-order model can provide more homogeneous segmentations than the classical model, but at the cost of higher memory and computing time requirements. (A possible factorisation is written out after this entry.)

Keywords: HIGHER-ORDER HIDDEN MARKOV CHAIN, ITERATIVE CONDITIONAL ESTIMATION, MAXIMAL POSTERIOR MODE, UNSUPERVISED IMAGE SEGMENTATION.

Bibentry:
@ARTICLE{Benyoussef:2008,
  author = {L. Benyoussef and C. Carincotte and S. Derrode},
  title = {Extension of Higher-Order HMC Modeling with Application to Image Segmentation},
  year = {2008},
  month = {September},
  note = {dpt:img*lg:en},
  abstract = {In this work, we propose to improve the neighboring relationship ability of the Hidden Markov Chain (HMC) model by extending the memory lengths of both the Markov chain process and the data-driven densities arising in the model. The new model is able to learn more complex noise structures, with respect to the configuration of several previous states and observations. Model parameter estimation is performed with an extension of the general Iterative Conditional Estimation (ICE) method that takes the memories into account, which makes the classification algorithm unsupervised. The higher-order HMC model is then evaluated in the image segmentation context. A comparative study conducted on a simulated image is carried out according to the order of the chain. Experimental results on a Synthetic Aperture Radar (SAR) image show that the higher-order model can provide more homogeneous segmentations than the classical model, but at the cost of higher memory and computing time requirements.},
  url = {2008_DSP.pdf},
  keywords = {Unsupervised Image segmentation, Higher-order Hidden Markov Chain, Iterative Conditional Estimation, Maximal Posterior Mode},
  journal = {Digital Signal Processing},
  volume = {18},
  pages = {849-860},
};
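
One plausible way to write out the factorisation the abstract alludes to, with an assumed transition memory k and an assumed data-driven memory m; the paper's exact notation may differ, and boundary indices are implicitly truncated at 1.

    % Joint law of hidden states x_{1:N} and observations y_{1:N},
    % with order-k transitions and order-m data-driven densities.
    \[
    p(x_{1:N}, y_{1:N}) \,=\, p(x_{1:k})
      \prod_{n=k+1}^{N} p\!\left(x_n \mid x_{n-k:n-1}\right)
      \prod_{n=1}^{N} p\!\left(y_n \mid x_{n-m:n},\, y_{n-m:n-1}\right)
    \]

Setting k = m = 1 recovers the classical HMC factorisation.
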
  • pdf icon
50 J. Meessen. Interactive classification of visual surveillance scenes. PhD Thesis, Université catholique de Louvain. Louvain-la-Neuve, Belgium. June 2008.
Abstract: Efficient management of images and video sequences is a key challenge today, due to the massive accumulation of digital content we face. In particular, content-based classification is a critical problem, which forms the basis of numerous important applications such as annotation and retrieval. This is especially true for video surveillance. In this context, we present a novel approach for interactive retrieval of visual surveillance scenes. The innovation of this work lies in taking real-life constraints into account at all stages of the retrieval process: from content processing to machine learning and human-machine interface. The learning method is formalised as an iterative support vector machine (SVM) classification based on training examples that are progressively provided by human users. Specific challenges were raised by the inherent nature of surveillance video. We tackle them by using a multiple-instance framework as well as by counterbalancing the rarity of the target scenes appropriately. Particular attention is paid to the human users' load, which is reduced thanks to new strategies of relevance feedback, active learning and results display. We also propose a low-cost method for searching the most appropriate level of SVM regularisation, so as to improve the system performance. Lastly, we present a flexible system architecture inspired by remote image browsing and a graphical user interface allowing simultaneous visualization of the scenes selected by both the retrieval engine and the active learning. The system has been validated with real-life data and has shown excellent retrieval performance. (A toy active-learning loop follows this entry.)
Bibentry:
@PHDTHESIS{Meessen:2008,
  author = {J. Meessen},
  title = {Interactive classification of visual surveillance scenes},
  year = {2008},
  address = {Louvain-la-Neuve, Belgium},
  month = {June 10},
  note = {dpt:img*grp:mm|vs*prj:irma*lg:en},
  abstract = {Efficient management of images and video sequences is a key challenge today, due to the massive accumulation of digital content we face. In particular, content-based classification is a critical problem, which forms the basis of numerous important applications such as annotation and retrieval. This is especially true for video surveillance. In this context, we present a novel approach for interactive retrieval of visual surveillance scenes. The innovation of this work lies in taking real-life constraints into account at all stages of the retrieval process: from content processing to machine learning and human-machine interface. The learning method is formalised as an iterative support vector machine (SVM) classification based on training examples that are progressively provided by human users. Specific challenges were raised by the inherent nature of surveillance video. We tackle them by using a multiple-instance framework as well as by counterbalancing the rarity of the target scenes appropriately. Particular attention is paid to the human users' load, which is reduced thanks to new strategies of relevance feedback, active learning and results display. We also propose a low-cost method for searching the most appropriate level of SVM regularisation, so as to improve the system performance. Lastly, we present a flexible system architecture inspired by remote image browsing and a graphical user interface allowing simultaneous visualization of the scenes selected by both the retrieval engine and the active learning. The system has been validated with real-life data and has shown excellent retrieval performance.},
  url = {2008_PHD_MEESSEN.pdf},
  school = {Université catholique de Louvain},
  type = {PhD Thesis},
};
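
The relevance-feedback loop described above can be caricatured in a few lines: train an SVM on the labels gathered so far, then ask the user about the clips closest to the decision boundary. The data, the "user" oracle and all parameters below are simulated; this is a sketch of uncertainty sampling, not the thesis code.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 16))                # clip descriptors
    y_true = (X[:, 0] + X[:, 1] > 0).astype(int)  # simulated "user" oracle

    # Seed with a few labels from each class, as a user would provide.
    labeled = list(np.where(y_true == 0)[0][:5]) + list(np.where(y_true == 1)[0][:5])

    for _ in range(5):                            # relevance-feedback rounds
        clf = SVC(kernel="rbf", C=1.0).fit(X[labeled], y_true[labeled])
        margin = np.abs(clf.decision_function(X))
        margin[labeled] = np.inf                  # never re-ask labeled clips
        ask = np.argsort(margin)[:5]              # most ambiguous clips
        labeled.extend(ask.tolist())              # the user labels them

    print("clips labeled after 5 rounds:", len(labeled))
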

2007

  • pdf icon
49 F.-O. Devaux, J. Meessen, C. Parisot, J.-F. Delaigle, C. De Vleeschouwer and B. Macq. A Flexible Video Transmission System Based on JPEG 2000 Conditional Replenishment with Multiple References. IEEE Int. Conf. on Acoustic, Speech and Signal Processing (ICASSP) . Hawaii, USA. April 2007.
Abstract: The image compression standard JPEG 2000 offers high compression efficiency as well as great flexibility in the way it accesses the content in terms of spatial location, quality level, and resolution. This paper explores how transmission systems conveying video surveillance sequences can benefit from this flexibility. Rather than transmitting each frame independently, as is generally done in the literature for JPEG 2000 based systems, we adopt a conditional replenishment scheme to exploit the temporal correlation of the video sequence. As a first contribution, we propose a rate-distortion optimal strategy to select the most profitable packets to transmit. As a second contribution, we provide the client with two references, the previous reconstructed frame and an estimation of the current scene background, which improves the transmission system performance.

Keywords: ADAPTIVE DELIVERY, INTRA CODING, JPEG 2000, REPLENISHMENT, SEMANTIC BASED CODING.

Bibentry:
@INPROCEEDINGS{Devaux:2007,
  author = {F.-O. Devaux and J. Meessen and C. Parisot and J.-F. Delaigle and C. De Vleeschouwer and B. Macq},
  title = {A Flexible Video Transmission System Based on JPEG 2000 Conditional Replenishment with Multiple References},
  booktitle = {IEEE Int. Conf. on Acoustic, Speech and Signal Processing (ICASSP)},
  year = {2007},
  address = {Hawaii, USA},
  month = {April},
  note = {dpt:img*grp:vs*lg:en*prj:wcam},
  abstract = {The image compression standard JPEG 2000 offers high compression efficiency as well as great flexibility in the way it accesses the content in terms of spatial location, quality level, and resolution. This paper explores how transmission systems conveying video surveillance sequences can benefit from this flexibility. Rather than transmitting each frame independently, as is generally done in the literature for JPEG 2000 based systems, we adopt a conditional replenishment scheme to exploit the temporal correlation of the video sequence. As a first contribution, we propose a rate-distortion optimal strategy to select the most profitable packets to transmit. As a second contribution, we provide the client with two references, the previous reconstructed frame and an estimation of the current scene background, which improves the transmission system performance.},
  url = {2007_IEEE-ICASSP_WCAM.pdf},
  keywords = {JPEG 2000, Intra Coding, Replenishment, Adaptive Delivery, Semantic Based Coding},
};
  • pdf icon
48 X. Desurmont, J. Bruyelle, D. Ruiz, J. Meessen and B. Macq. Real-Time 3D Video Conference On Generic Hardware. Real-Time Image Processing 2007, part of the IS&T SPIE Symp. on Electronic Imaging . San Jose, CA USA. January 2007.
Abstract: Nowadays, video conferencing tends to be more and more advantageous because of the economic and ecological cost of transport. Several platforms exist. The goal of the TIFANIS immersive platform is to let users interact as if they were physically together. Unlike previous tele-immersion systems, TIFANIS uses generic hardware to achieve an economically realistic implementation. The basic functions of the system are to capture the scene, transmit it through digital networks to other partners, and then render it according to each partner's viewing characteristics. The image processing part should run in real time. We propose to analyze the whole system. It can be split into different services like central processing unit (CPU), graphical rendering, direct memory access (DMA), and communications through the network. Most of the processing is done by the CPU; it is composed of the 3D reconstruction and the detection and tracking of faces from the video stream. However, the processing needs to be parallelized in several threads that have as few dependencies as possible. In this paper, we present these issues and the way we deal with them.

Keywords: 3D, DUAL CORE, REAL-TIME, VIDEO CONFERENCE.

Bibentry:
@INPROCEEDINGS{Desurmont:2007:b,
  author = {X. Desurmont and J. Bruyelle and D. Ruiz and J. Meessen and B. Macq},
  title = {Real-Time 3D Video Conference On Generic Hardware},
  booktitle = {Real-Time Image Processing 2007, part of the IS&T SPIE Symp. on Electronic Imaging},
  year = {2007},
  address = {San Jose, CA USA},
  month = {January 28-30},
  note = {dpt:img*grp:mm*lg:en*prj:tifanis},
  abstract = {Nowadays, video conferencing tends to be more and more advantageous because of the economic and ecological cost of transport. Several platforms exist. The goal of the TIFANIS immersive platform is to let users interact as if they were physically together. Unlike previous tele-immersion systems, TIFANIS uses generic hardware to achieve an economically realistic implementation. The basic functions of the system are to capture the scene, transmit it through digital networks to other partners, and then render it according to each partner's viewing characteristics. The image processing part should run in real time. We propose to analyze the whole system. It can be split into different services like central processing unit (CPU), graphical rendering, direct memory access (DMA), and communications through the network. Most of the processing is done by the CPU; it is composed of the 3D reconstruction and the detection and tracking of faces from the video stream. However, the processing needs to be parallelized in several threads that have as few dependencies as possible. In this paper, we present these issues and the way we deal with them.},
  url = {2007_SPIE-EI_TIFANIS.pdf},
  keywords = {real-time, 3D, video conference, dual core},
};
  • pdf icon
47 X. Desurmont, J.-B. Hayet, C. Machy, J.-F. Delaigle and B. Macq. On the performance evaluation of tracking systems using multiple pan-tilt-zoom cameras. Videometrics IX, part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA USA. January 2007.
Abstract: Object tracking from multiple Pan-Tilt-Zoom (PTZ) cameras is an important task. This paper deals with the evaluation of the results of such a system. This performance evaluation is conducted by first considering the characterization of the PTZ parameters and then the trajectories themselves. The camera parameters will be evaluated with the homography errors; the trajectories will be evaluated according to the location and mis-identification errors. (A small homography-error sketch follows this entry.)

Keywords: METRICS, MULTI-VIEW, PERFORMANCE EVALUATION, PTZ-CAMERAS, TRACKING.

Bibentry:
@INPROCEEDINGS{Desurmont:2007:a,
  author = {X. Desurmont and J.-B. Hayet and C. Machy and J.-F. Delaigle and B. Macq},
  title = {On the performance evaluation of tracking systems using multiple pan-tilt-zoom cameras},
  booktitle = {Videometrics IX, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2007},
  address = {San Jose, CA USA},
  month = {January 28-30},
  note = {dpt:img*grp:mm*lg:en*prj:trictrac},
  abstract = {Object tracking from multiple Pan-Tilt-Zoom (PTZ) cameras is an important task. This paper deals with the evaluation of the results of such a system. This performance evaluation is conducted by first considering the characterization of the PTZ parameters and then the trajectories themselves. The camera parameters will be evaluated with the homography errors; the trajectories will be evaluated according to the location and mis-identification errors.},
  url = {2007_SPIE-EI_TRICTRAC.pdf},
  keywords = {performance evaluation, tracking, multi-view, ptz-cameras, metrics},
};
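
A small sketch of the first criterion, the homography error: the mean reprojection error of an estimated homography H against matched point pairs. The exact metric used in the paper may differ; this is the standard form.

    import numpy as np

    def homography_error(H, pts_src, pts_dst):
        # Mean Euclidean error of H applied to pts_src vs pts_dst (Nx2 arrays).
        n = len(pts_src)
        hom = np.hstack([pts_src, np.ones((n, 1))])  # to homogeneous coordinates
        proj = hom @ H.T
        proj = proj[:, :2] / proj[:, 2:3]            # back to Cartesian
        return float(np.mean(np.linalg.norm(proj - pts_dst, axis=1)))

    H = np.eye(3)
    pts = np.array([[0.0, 0.0], [10.0, 5.0], [3.0, 7.0]])
    print(homography_error(H, pts, pts + 0.5))       # about 0.71 pixels
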
  • pdf icon
46 H. Dupuis. En toute sécurité. Athena. May 2007.
Abstract: The European Serket project aims to give security and surveillance systems more intelligence, helping operators make the right decisions. Wallonia is well represented in this programme, notably through the engineers of the Faculté polytechnique de Mons, the spin-off Acic and the Multitel research centre.
Bibentry:
@MISC{Dupuis:2007,
  author = {H. Dupuis},
  title = {En toute sécurité},
  year = {2007},
  month = {May},
  note = {dpt:img*grp:vs*lg:fr*prj:serket},
  abstract = {The European Serket project aims to give security and surveillance systems more intelligence, helping operators make the right decisions. Wallonia is well represented in this programme, notably through the engineers of the Faculté polytechnique de Mons, the spin-off Acic and the Multitel research centre.},
  url = {2007_ATHENA_SERKET.pdf},
  number = {231},
  howpublished = {Athena},
};
  • pdf icon
45 D. Ruiz, B. Maison, J. Bruyelle, X. Desurmont and B. Macq. A point-based tele-immersion system: from acquisition to stereoscopic display. Proc. of Stereoscopic Displays and Virtual Reality Systems XIV, part of the IS&T SPIE Symp. on Electronic Imaging . San Jose, CA USA. January 2007.
Abstract: We present a point-based reconstruction and transmission pipeline for a collaborative tele-immersion system. Two or more users in different locations collaborate with each other in a shared, simulated environment as if they were in the same physical room. Each user perceives point-based models of distant users along with collaborative data like molecule models. Disparity maps, computed by a commercial stereo solution, are filtered and transformed into clouds of 3D points. The clouds are compressed and transmitted over the network to distant users. On the other side, the clouds are decompressed and incorporated into the 3D scene. The viewpoint used to display the 3D scene depends on the position of the user's head. Collaborative data is manipulated through natural hand gestures. We analyse the performance of the system in terms of computation time, latency and photo-realistic quality of the reconstructed models. (A standard disparity-to-points sketch follows this entry.)

Keywords: HAND GESTURES, POINT-BASED, STEREO, STEREOSCOPIC, TELE-IMMERSION, VISION-BASED.

Bibentry:
@INPROCEEDINGS{Ruiz:2007,
  author = {D. Ruiz and B. Maison and J. Bruyelle and X. Desurmont and B. Macq},
  title = {A point-based tele-immersion system: from acquisition to stereoscopic display},
  booktitle = {Proc. of Stereoscopic Displays and Virtual Reality Systems XIV, part of the IS&T SPIE Symp. on Electronic Imaging},
  year = {2007},
  address = {San Jose, CA USA},
  month = {January 28-30},
  note = {dpt:img*grp:mm*lg:en*prj:tifanis},
  abstract = {We present a point-based reconstruction and transmission pipeline for a collaborative tele-immersion system. Two or more users in different locations collaborate with each other in a shared, simulated environment as if they were in the same physical room. Each user perceives point-based models of distant users along with collaborative data like molecule models. Disparity maps, computed by a commercial stereo solution, are filtered and transformed into clouds of 3D points. The clouds are compressed and transmitted over the network to distant users. On the other side, the clouds are decompressed and incorporated into the 3D scene. The viewpoint used to display the 3D scene depends on the position of the user's head. Collaborative data is manipulated through natural hand gestures. We analyse the performance of the system in terms of computation time, latency and photo-realistic quality of the reconstructed models.},
  url = {2007_SPIE-EI_TIFANIS-2.pdf},
  keywords = {tele-immersion, point-based, stereo, stereoscopic, vision-based, hand gestures},
};
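
The disparity-to-cloud step follows the standard stereo relation Z = f·B/d with pinhole back-projection for X and Y. This generic sketch (not the TIFANIS code) assumes a focal length f in pixels, a baseline B in metres and a principal point (cx, cy).

    import numpy as np

    def disparity_to_points(disp, f, B, cx, cy):
        h, w = disp.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = disp > 0                       # skip unmatched pixels
        Z = f * B / disp[valid]                # depth from disparity
        X = (u[valid] - cx) * Z / f            # back-project to camera frame
        Y = (v[valid] - cy) * Z / f
        return np.stack([X, Y, Z], axis=1)     # N x 3 point cloud

    disp = np.full((4, 4), 8.0)                # toy constant-disparity map
    print(disparity_to_points(disp, f=500.0, B=0.12, cx=2.0, cy=2.0)[:2])
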
  • pdf icon
44 C. Marchessoux, X. Desurmont, F. Bremond, D. Makris, S. Boughorbel, R. Koeleman, W. Favoreel, C. Machy and E. Jaspers. Performance evaluation of multimedia analysis for surveillance applications. British Machine Vision Association Symp. . London, Great Britain. December 2007.
Abstract: The general goal of the European CANTATA project (Content Aware Networked systems Towards Advanced and Tailored Assistance) is to energize the European industry with respect to the development of multimedia content-aware systems. Three main application domains are studied via three different scenarios in video surveillance, home multimedia and medical applications. Performance evaluation of such systems is a major challenge since many different methods exist and the criteria for evaluation are highly subjective. By combining results and state-of-the-art knowledge from different research communities, the CANTATA project avoids re-development of existing methods or tools and only develops extensions where necessary to propose a single standard validation framework. This paper gives an overview of the state of the art in performance evaluation and proposed datasets (e.g. ETISEO, PETS, iLIDS, TRICTRAC, OVVV, level crossing, Traficon...).
Bibentry:
@INPROCEEDINGS{Marchessoux:2007,
  author = {C. Marchessoux and X. Desurmont and F. Bremond and D. Makris and S. Boughorbel and R. Koeleman and W. Favoreel and C. Machy and E. Jaspers},
  title = {Performance evaluation of multimedia analysis for surveillance applications},
  booktitle = {British Machine Vision Association Symp.},
  year = {2007},
  address = {London, Great Britain},
  month = {December 12},
  note = {dpt:img*grp:vs*lg:en*prj:cantata},
  abstract = {The general goal of the European CANTATA project (Content Aware Networked systems Towards Advanced and Tailored Assistance) is to energize the European industry with respect to the development of multimedia content-aware systems. Three main application domains are studied via three different scenarios in video surveillance, home multimedia and medical applications. Performance evaluation of such systems is a major challenge since many different methods exist and the criteria for evaluation are highly subjective. By combining results and state-of-the-art knowledge from different research communities, the CANTATA project avoids re-development of existing methods or tools and only develops extensions where necessary to propose a single standard validation framework. This paper gives an overview of the state of the art in performance evaluation and proposed datasets (e.g. ETISEO, PETS, iLIDS, TRICTRAC, OVVV, level crossing, Traficon...).},
  url = {2007_MBVA_CANTATA.pdf},
};
  • pdf icon
43 C. Machy, X. Desurmont, J.-F. Delaigle and A. Bastide. Introduction of CCTV at Level Crossings With automatic detection of potentially Dangerous situations. 2nd Selcat Workshop . Marrakech, Morroco. November 2007.
Abstract: While rail remains the safest form of land transport, reports from a road user's point of view are not so conclusive. Level-crossing accidents contribute as much as 50 percent of all fatalities caused by railway operations. Through this paper, we aim at introducing a new approach based on video monitoring in order to improve safety at level crossings. The idea is to automatically detect obstacles, such as stopped vehicles, on the level crossing when a train is approaching. We provide an overview of the methodology and present results of this algorithm on real sequences. (A toy detection sketch follows this entry.)
Bibentry:
@INPROCEEDINGS{Machy:2007,
  author = {C. Machy and X. Desurmont and J.-F. Delaigle and A. Bastide},
  title = {Introduction of CCTV at Level Crossings With automatic detection of potentially Dangerous situations},
  booktitle = {2nd Selcat Workshop},
  year = {2007},
  address = {Marrakech, Morroco},
  month = {November 22-23},
  note = {dpt:img*lg:en*prj:selcat},
  abstract = {While rail remains the safest form of land transport, reports from a road user's point of view are not so conclusive. Level-crossing accidents contribute as much as 50 percent of all fatalities caused by railway operations. Through this paper, we aim at introducing a new approach based on video monitoring in order to improve safety at level crossings. The idea is to automatically detect obstacles, such as stopped vehicles, on the level crossing when a train is approaching. We provide an overview of the methodology and present results of this algorithm on real sequences.},
  url = {2007_SELCAT.pdf},
};
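
A toy version of the detection idea, assuming OpenCV's stock MOG2 background subtractor and an invented alarm rule (the paper's actual algorithm is not reproduced here): flag danger when a sizeable foreground blob sits in the crossing zone while a train is announced.

    import numpy as np
    import cv2

    bs = cv2.createBackgroundSubtractorMOG2(history=200)

    def dangerous(frame, lc_mask, train_approaching, min_area=1500):
        fg = bs.apply(frame)                   # per-pixel foreground mask
        fg = cv2.medianBlur(fg, 5)             # suppress salt noise
        blob = int(np.count_nonzero((fg > 0) & (lc_mask > 0)))
        return train_approaching and blob > min_area

    frame = np.zeros((240, 320, 3), np.uint8)  # stand-in video frame
    zone = np.ones((240, 320), np.uint8)       # level-crossing zone mask
    print(dangerous(frame, zone, train_approaching=True))
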
  • pdf icon
42 N. Lazarevic, C. Machy, L. Khoudour and E.M. El Koursi. An intelligent Level crossing: technical solutions for improved safety and security. XVIIth Int. Scientific Conf. on Transport . Sofia, Bulgaria. November 2007.
Abstract: Level crossings have been identified as a particularly weak point in the rail transport infrastructure, seriously affecting the safety of both transport operators and users. In this paper we propose an Intelligent Level-Crossing model with the intention of setting a standard for future research in the area of level-crossing safety. Furthermore, we provide an overview and discuss advantages and drawbacks of existing and new technologies most likely to improve the safety of road and rail users at level crossings.

Keywords: LEVEL-CROSSING SAFETY, OBSTACLE DETECTION, SENSING.

Bibentry:
@INPROCEEDINGS{Lazarevic:2007,
  author = {N. Lazarevic and C. Machy and L. Khoudour and E.M. El Koursi},
  title = {An intelligent Level crossing: technical solutions for improved safety and security},
  booktitle = {XVIIth Int. Scientific Conf. on Transport},
  year = {2007},
  address = {Sofia, Bulgaria},
  month = {November 16-17},
  note = {dpt:img*lg:en*prj:selcat},
  abstract = {Level crossings have been identified as a particularly weak point in the rail transport infrastructure, seriously affecting the safety of both transport operators and users. In this paper we propose an Intelligent Level-Crossing model with the intention of setting a standard for future research in the area of level-crossing safety. Furthermore, we provide an overview and discuss advantages and drawbacks of existing and new technologies most likely to improve the safety of road and rail users at level crossings.},
  url = {2007_ISCT_SELCAT.pdf},
  keywords = {level-crossing safety, sensing, obstacle detection},
};
  • pdf icon
41 J. Meessen, X. Desurmont, C. De Vleeschouwer and J.-F. Delaigle. User-Centric Retrieval of Visual Surveillance Content. Int. Conf. on Semantic and Digital Media Technologies (SAMT) . Genova, Italy. December 2007.
Abstract: An interactive retrieval method adapted to surveillance video is presented. The approach is formulated as an iterative SVM classification and builds upon the two major specificities of the surveillance context, namely the multiple-instance nature of the data and the reduced number of training examples the user can provide at each round. The latter issue is solved thanks to a new adaptive active learning strategy as well as an intuitive graphical user interface. The system has been validated on both synthetic and real datasets.

Keywords: ACTIVE LEARNING, GUI, MULTIPLE-INSTANCE, RELEVANCE FEEDBACK, SURVEILLANCE VIDEO RETRIEVAL.

Bibentry:
@INPROCEEDINGS{Meessen:2007:a,
  author = {J. Meessen and X. Desurmont and C. De Vleeschouwer and J.-F. Delaigle},
  title = {User-Centric Retrieval of Visual Surveillance Content},
  booktitle = {Int. Conf. on Semantic and Digital Media Technologies (SAMT)},
  year = {2007},
  address = {Genova, Italy},
  month = {December 5-7},
  note = {dpt:img*grp:mm|vs*lg:en*prj:irma},
  abstract = {An interactive retrieval method adapted to surveillance video is presented. The approach is formulated as an iterative SVM classification and builds upon the two major specificities of the surveillance context, namely the multiple-instance nature of the data and the reduced number of training examples the user can provide at each round. The latter issue is solved thanks to a new adaptive active learning strategy as well as an intuitive graphical user interface. The system has been validated on both synthetic and real datasets.},
  url = {2007_SAMT_IRMA.pdf},
  keywords = {Surveillance video retrieval, multiple-instance, relevance feedback, active learning, GUI.},
};
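As an illustration of the iterative loop this abstract describes, the minimal sketch below pairs an SVM with uncertainty-based active learning: at each round the user is asked about the frame closest to the decision boundary. The features, labels and querying rule are invented stand-ins, not the paper's multiple-instance formulation or its adaptive strategy.

    # Minimal sketch: SVM relevance feedback with uncertainty sampling.
    # Features and labels are synthetic stand-ins (assumes scikit-learn).
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 16))            # frame-level feature vectors
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hidden "relevant" ground truth

    # A few initial user labels, one per class so the SVM can be trained:
    labeled = list(np.flatnonzero(y == 1)[:2]) + list(np.flatnonzero(y == 0)[:2])
    for rnd in range(5):
        clf = SVC(kernel="rbf").fit(X[labeled], y[labeled])
        unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
        margins = np.abs(clf.decision_function(X[unlabeled]))
        query = int(unlabeled[np.argmin(margins)])   # most uncertain frame
        labeled.append(query)                        # user feedback, simulated by y
        print(f"round {rnd}: queried frame {query}, {len(labeled)} labels")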
  • pdf icon
40 B. Lienard, A. Hubaux, C. Carincotte, X. Desurmont and B. Barrie. On the Use of Real-Time Agents in Distributed Video Analysis Systems. Real-Time Image Processing, part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA USA. January 2007.
Abstract: Today's technologies in video analysis use state-of-the-art systems and formalisms like ontologies and data warehousing to handle the huge amounts of data generated, from low-level to high-level descriptors. In the IST CARETAKER project we develop a multi-dimensional database with distributed features to add a centric data view of the scene shared between all the sensors of a network. We propose to enhance the possibilities of this kind of system by delegating the intelligence to many other entities, known as "agents": specialized, lightweight applications able to move across the network and work on dedicated sets of data related to their core domain. In other words, we can reduce or enhance the complexity of the analysis by adding feature-specific agents or not, and processing is limited to the data it actually concerns. This article explains how to design and develop an agent-oriented system that can be used by a video analysis data warehouse. We also describe how this methodology can distribute the intelligence over the system, and how the system can be extended to obtain a self-reasoning architecture using cooperative agents. We will demonstrate this approach.

Keywords: AGENTS, MIDDLEWARE, REAL-TIME, VIDEO ANALYSIS.

Bibentry:
@INPROCEEDINGS{Lienard:2007,
  author = {B. Lienard and A. Hubaux and C. Carincotte and X. Desurmont and B. Barrie},
  title = {On the Use of Real-Time Agents in Distributed Video Analysis Systems},
  booktitle = {Real-Time Image Processing, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2007},
  address = {San Jose, CA USA},
  month = {January 28-30},
  note = {dpt:img*grp:vs*lg:en*prj:caretaker},
  abstract = {Today's technologies in video analysis use state-of-the-art systems and formalisms like ontologies and data warehousing to handle the huge amounts of data generated, from low-level to high-level descriptors. In the IST CARETAKER project we develop a multi-dimensional database with distributed features to add a centric data view of the scene shared between all the sensors of a network. We propose to enhance the possibilities of this kind of system by delegating the intelligence to many other entities, known as "agents": specialized, lightweight applications able to move across the network and work on dedicated sets of data related to their core domain. In other words, we can reduce or enhance the complexity of the analysis by adding feature-specific agents or not, and processing is limited to the data it actually concerns. This article explains how to design and develop an agent-oriented system that can be used by a video analysis data warehouse. We also describe how this methodology can distribute the intelligence over the system, and how the system can be extended to obtain a self-reasoning architecture using cooperative agents. We will demonstrate this approach.},
  url = {2007_SPIE-EI_CARETAKER.pdf},
  keywords = {Real-time, Video analysis, Agents, Middleware},
};
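The agent idea can be pictured with a toy dispatcher in which each specialised agent only processes the records of its own domain, so adding or removing agents tunes the analysis complexity. The record schema and agent names below are invented for illustration and do not reflect the CARETAKER data model.

    # Toy sketch: specialised agents each consume only their own data domain.
    from dataclasses import dataclass

    @dataclass
    class Record:
        domain: str    # e.g. "tracking" or "event"
        payload: dict

    class AgentBus:
        """Routes each record only to the agents registered for its domain."""
        def __init__(self):
            self._agents = {}

        def register(self, domain, agent):
            self._agents.setdefault(domain, []).append(agent)

        def dispatch(self, record):
            for agent in self._agents.get(record.domain, []):
                agent(record)

    bus = AgentBus()
    bus.register("tracking", lambda r: print("tracking agent:", r.payload))
    bus.register("event", lambda r: print("event agent:", r.payload))
    bus.dispatch(Record("tracking", {"id": 7, "pos": (12, 34)}))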
  • pdf icon
39 J. Meessen, X. Desurmont, J.-F. Delaigle, C. De Vleeschouwer and B. Macq. Progressive Learning for Interactive Surveillance Scenes Retrieval. 7th Int. Workshop on Visual Surveillance (CVPR-VS) . Minneapolis, USA. June 2007.
Abstract: This paper tackles the challenge of interactively retrieving visual scenes within surveillance sequences acquired with a fixed camera. Contrary to today's solutions, we assume that no a priori knowledge is available, so that the system must progressively learn the target scenes thanks to interactive labelling of a few frames by the user. The proposed method is based on very low-cost feature extraction and integrates relevance feedback, multiple-instance SVM classification and active learning. Each of these three steps runs iteratively over the session and takes advantage of the progressively increasing training set. Repeatable experiments on both simulated and real data demonstrate the efficiency of the approach and show how it allows reaching high retrieval performance.
Bibentry:
@INPROCEEDINGS{Meessen:2007:b,
  author = {J. Meessen and X. Desurmont and J.-F. Delaigle and C. De Vleeschouwer and B. Macq},
  title = {Progressive Learning for Interactive Surveillance Scenes Retrieval},
  booktitle = {7th Int. Workshop on Visual Surveillance (CVPR-VS)},
  year = {2007},
  address = {Minneapolis, USA},
  month = {June 22},
  note = {dpt:img*grp:mm|vs*lg:en*prj:irma},
  abstract = {This paper tackles the challenge of interactively retrieving visual scenes within surveillance sequences acquired with a fixed camera. Contrary to today's solutions, we assume that no a priori knowledge is available, so that the system must progressively learn the target scenes thanks to interactive labelling of a few frames by the user. The proposed method is based on very low-cost feature extraction and integrates relevance feedback, multiple-instance SVM classification and active learning. Each of these three steps runs iteratively over the session and takes advantage of the progressively increasing training set. Repeatable experiments on both simulated and real data demonstrate the efficiency of the approach and show how it allows reaching high retrieval performance.},
  url = {2007_IEEE-CVPR-VS_IRMA.pdf},
};
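The multiple-instance view used here treats a frame as a "bag" of region-level feature vectors that is relevant as soon as one region is. A minimal sketch with a placeholder linear scoring function (not the paper's trained MI-SVM):

    # Sketch: score a bag (one frame) by its most positive instance (region).
    import numpy as np

    def bag_score(instances, w, b):
        """instances: (n_regions, n_features) array; returns max-pooled score."""
        return float(np.max(instances @ w + b))

    rng = np.random.default_rng(1)
    w, b = rng.normal(size=8), 0.0            # placeholder linear model
    frame_bag = rng.normal(size=(5, 8))       # 5 segmented regions in one frame
    print("relevant" if bag_score(frame_bag, w, b) > 0 else "not relevant")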
  • pdf icon
38 C. Simon, J. Meessen, D. Tzovaras and C. De Vleeschouwer. Using decision trees for knowledge-assisted topologically structured data analysis. Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS) . Santorini, Greece. June 2007.
Abstract: Supervised learning of an ensemble of randomized trees is considered to recognize classes of events in topologically structured data (e.g. images or time series). We are primarily interested in classification problems that are characterized by severe scarcity of the training samples. The main idea of our paper consists in favoring the selection of attributes that are known to efficiently discriminate the minority class in those nodes of the tree that are close to the leaves and where classes are represented by a small number of training examples. In practice, the knowledge about the ability of an attribute to discriminate the classes represented in a particular node is either provided by an expert or inferred from a pre-analysis of the entire initial training set. The experimental validation of our approach considers sign language and human behavior recognition. It reveals that the proposed knowledge-assisted tree induction mechanism efficiently compensates for the shortage of training samples, and significantly improves the tree classifier accuracy in such scenarios.
Bibentry:
@INPROCEEDINGS{Simon:2007,
  author = {C. Simon and J. Meessen and D. Tzovaras and C. De Vleeschouwer},
  title = {Using decision trees for knowledge-assisted topologically structured data analysis},
  booktitle = {Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)},
  year = {2007},
  address = {Santorini, Greece},
  month = {June 6-8},
  note = {dpt:img*grp:mm*lg:en*prj:arcade},
  abstract = {Supervised learning of an ensemble of randomized trees is considered to recognize classes of events in topologically structured data (e.g. images or time series). We are primarily interested in classification problems that are characterized by severe scarcity of the training samples. The main idea of our paper consists in favoring the selection of attributes that are known to efficiently discriminate the minority class in those nodes of the tree that are close to the leaves and where classes are represented by a small number of training examples. In practice, the knowledge about the ability of an attribute to discriminate the classes represented in a particular node is either provided by an expert or inferred from a pre-analysis of the entire initial training set. The experimental validation of our approach considers sign language and human behavior recognition. It reveals that the proposed knowledge-assisted tree induction mechanism efficiently compensates for the shortage of training samples, and significantly improves the tree classifier accuracy in such scenarios.},
  url = {2007_WIAMIS_ARCADE.pdf},
};
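One way to picture the knowledge-assisted selection is a split score that blends the information gain measured in the node with an expert prior, trusting the prior more as the node's sample count shrinks. The blending rule and numbers below are illustrative assumptions, not the paper's exact mechanism.

    # Sketch: blend data-driven gain with an expert prior in small nodes.
    def split_score(gain, prior, n_node, n0=50):
        """gain: information gain estimated from the node's samples.
        prior: expert belief (0..1) that the attribute separates the minority class.
        n_node: number of training samples reaching this node."""
        alpha = n_node / (n_node + n0)    # few samples -> rely on the prior
        return alpha * gain + (1 - alpha) * prior

    # Near the root the measured gain dominates; near the leaves the prior does:
    print(split_score(gain=0.30, prior=0.9, n_node=1000))   # ~0.33
    print(split_score(gain=0.30, prior=0.9, n_node=5))      # ~0.85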
  • pdf icon
37 G. Jeney, C. Lamy-Bergot, X. Desurmont, R.L. da Silva, R.A. Garcia-Sanchidrian, M. Bonte, M. Berbineau, M. Csapodi, O. Cantineau and N. Malouch. Communications Challenges in the Celtic-BOSS Project. Next Generation Teletraffic and Wired-Wireless Advanced Networking, Lecture Notes In Computer Science , Vol. 4712 : p. 431-442. August 2007.
Abstract: The BOSS project aims at developing an innovative and bandwidth-efficient communication system to transmit large-data-rate communications between public transport vehicles and the wayside, answering the increasing need of public transport operators for new and/or enhanced on-board functionality and services, such as passenger security and operational services like remote diagnostics or predictive maintenance. As a matter of fact, security issues, traditionally covered in stations by means of video surveillance, are clearly lacking on board trains, due to the absence of efficient transmission means from the train to a supervising control centre. Similarly, diagnostic and maintenance issues are generally handled when the train arrives in stations or during maintenance stops, which prevents proactive actions from being carried out. The aim of the project is to circumvent these limitations and offer a system-level solution. This article focuses on the communication system challenges.
Bibentry:
@ARTICLE{Jeney:2007,
  author = {G. Jeney and C. Lamy-Bergot and X. Desurmont and R.L. da Silva and R.A. Garcia-Sanchidrian and M. Bonte and M. Berbineau and M. Csapodi and O. Cantineau and N. Malouch},
  title = {Communications Challenges in the Celtic-BOSS Project},
  year = {2007},
  month = {August},
  note = {dpt:img*lg:en*prj:boss},
  abstract = {The BOSS project aims at developing an innovative and bandwidth-efficient communication system to transmit large-data-rate communications between public transport vehicles and the wayside, answering the increasing need of public transport operators for new and/or enhanced on-board functionality and services, such as passenger security and operational services like remote diagnostics or predictive maintenance. As a matter of fact, security issues, traditionally covered in stations by means of video surveillance, are clearly lacking on board trains, due to the absence of efficient transmission means from the train to a supervising control centre. Similarly, diagnostic and maintenance issues are generally handled when the train arrives in stations or during maintenance stops, which prevents proactive actions from being carried out. The aim of the project is to circumvent these limitations and offer a system-level solution. This article focuses on the communication system challenges.},
  url = {2007_LNCS_BOSS.pdf},
  journal = {Next Generation Teletraffic and Wired-Wireless Advanced Networking, Lecture Notes In Computer Science},
  volume = {4712},
  pages = {431-442},
};

2006

  • pdf icon
36 X. Desurmont, A. Bastide, J. Czyz, C. Parisot, J.-F. Delaigle and B. Macq. Chapter 5: A General-Purpose System for Distributed Surveillance and Communication, in Intelligent Distributed Video Surveillance Systems. S.A Velastin and P Remagnino Eds., Institution of Electrical Engineers, ISBN: 0-86341-504-0 . London. April 2006.
Abstract: Video security is becoming more and more important today. CCTV (closed-circuit television), after broadcast television, is currently migrating from analogue to digital. At the same time, electronics has rapidly progressed in miniaturizing components and standardization initiatives have become popular in the IT world. Due to these innovations, it is now possible to deploy CCTV on site easily and rapidly, for permanent or temporary use. Examples of challenging surveillance applications are monitoring metro stations, detection of loitering or abandoned objects, etc. This chapter describes a practical implementation of a distributed surveillance system with emphasis on video transmission issues (acquisition, visualisation) and the image processing necessary for useful event detection. The requirements for these systems are to be easy to use, robust and flexible. Our goal is to obtain efficiently implemented systems that can meet these strong industrial requirements. A computer-cluster-based approach with network connections is the innovative solution proposed. The main advantage of this approach is its flexibility. Since mobile objects are important in video surveillance, these systems include image analysis tools such as segmentation and object tracking. First we present the typical requirements of such a system besides the typical robustness of the analysis (e.g. low false alarm rate and low missed detection rate). We consider issues like the facility to deploy and administer network-connected real-time multi-camera systems, with reusable, modular and generic technologies. Then we analyze how to cope with the need to integrate a solution with state-of-the-art technologies. As an answer, we propose a global system architecture and describe its main features to explain each underlying module. To illustrate the applicability of the proposed system architecture in real case studies, we develop some deployment scenarios for indoor and outdoor applications.
Bibentry:
@INBOOK{Desurmont:2006:e,
  author = {X. Desurmont and A. Bastide and J. Czyz and C. Parisot and J.-F. Delaigle and B. Macq},
  title = {Chapter 5: A General-Purpose System for Distributed Surveillance and Communication, in Intelligent Distributed Video Surveillance Systems},
  year = {2006},
  address = {London},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {Video security is becoming more and more important today. CCTV (closed-circuit television), after broadcast television, is currently migrating from analogue to digital. At the same time, electronics has rapidly progressed in miniaturizing components and standardization initiatives have become popular in the IT world. Due to these innovations, it is now possible to deploy CCTV on site easily and rapidly, for permanent or temporary use. Examples of challenging surveillance applications are monitoring metro stations, detection of loitering or abandoned objects, etc. This chapter describes a practical implementation of a distributed surveillance system with emphasis on video transmission issues (acquisition, visualisation) and the image processing necessary for useful event detection. The requirements for these systems are to be easy to use, robust and flexible. Our goal is to obtain efficiently implemented systems that can meet these strong industrial requirements. A computer-cluster-based approach with network connections is the innovative solution proposed. The main advantage of this approach is its flexibility. Since mobile objects are important in video surveillance, these systems include image analysis tools such as segmentation and object tracking. First we present the typical requirements of such a system besides the typical robustness of the analysis (e.g. low false alarm rate and low missed detection rate). We consider issues like the facility to deploy and administer network-connected real-time multi-camera systems, with reusable, modular and generic technologies. Then we analyze how to cope with the need to integrate a solution with state-of-the-art technologies. As an answer, we propose a global system architecture and describe its main features to explain each underlying module. To illustrate the applicability of the proposed system architecture in real case studies, we develop some deployment scenarios for indoor and outdoor applications.},
  url = {2006_BOOK_IEE.pdf},
  publisher = {S.A Velastin and P Remagnino Eds., Institution of Electrical Engineers, ISBN: 0-86341-504-0},
  editor = {S.A Velastin and P Remagnino Eds.},
  chapter = {Chapter 5: A General-Purpose System for Distributed Surveillance and Communication},
};
  • pdf icon
35 X. Desurmont, I. Ponte, J. Meessen and J.-F. Delaigle. Nonintrusive viewpoint tracking for 3D perception in smart video conference. Three-Dimensional Image Capture and Applications VI, part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA USA. January 2006.
Abstract: The globalisation of people's interactions in the industrial world and the ecological cost of transport make videoconferencing an interesting solution for collaborative work. However, the lack of immersive perception makes videoconferencing unappealing. The TIFANIS tele-immersion system was conceived to let users interact as if they were physically together. In this paper, we focus on an important feature of the immersive system: the automatic tracking of the user's point of view in order to render correctly, in his display, the scene from the other site. Viewpoint information has to be computed in a very short time and the detection system should be non-intrusive, otherwise it would become cumbersome for the user, i.e. he would lose the feeling of "being there". The viewpoint detection system consists of several modules. First, an analysis module identifies and follows regions of interest (ROIs) where faces are detected. We will show the cooperative approach between spatial detection and temporal tracking. Secondly, an eye detector finds the position of the eyes within faces. Then, the 3D positions of the eyes are deduced using stereoscopic images from a binocular camera. Finally, the 3D scene is rendered in real time according to the new point of view.

Keywords: FACE DETECTION, TELEIMMERSION, VIDEOCONFERENCE, VIEWPOINT TRACKING.

Bibentry:
@INPROCEEDINGS{Desurmont:2006:d,
  author = {X. Desurmont and I. Ponte and J. Meessen and J.-F. Delaigle},
  title = {Nonintrusive viewpoint tracking for 3D perception in smart video conference},
  booktitle = {Three-Dimensional Image Capture and Applications VI, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2006},
  address = {San Jose, CA USA},
  month = {January 16-19},
  note = {dpt:img*lg:en*prj:tifanis},
  abstract = {The globalisation of people's interactions in the industrial world and the ecological cost of transport make videoconferencing an interesting solution for collaborative work. However, the lack of immersive perception makes videoconferencing unappealing. The TIFANIS tele-immersion system was conceived to let users interact as if they were physically together. In this paper, we focus on an important feature of the immersive system: the automatic tracking of the user's point of view in order to render correctly, in his display, the scene from the other site. Viewpoint information has to be computed in a very short time and the detection system should be non-intrusive, otherwise it would become cumbersome for the user, i.e. he would lose the feeling of "being there". The viewpoint detection system consists of several modules. First, an analysis module identifies and follows regions of interest (ROIs) where faces are detected. We will show the cooperative approach between spatial detection and temporal tracking. Secondly, an eye detector finds the position of the eyes within faces. Then, the 3D positions of the eyes are deduced using stereoscopic images from a binocular camera. Finally, the 3D scene is rendered in real time according to the new point of view.},
  url = {2006_SPIE-EI_TIFANIS.pdf},
  keywords = {viewpoint tracking, videoconference, face detection, teleimmersion.},
};
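Deducing the 3D eye positions from the binocular camera follows standard rectified-stereo triangulation (depth Z = f·B/d for disparity d); the camera parameters below are placeholders, not TIFANIS calibration values.

    # Rectified-stereo triangulation of a point seen in both images.
    def triangulate(x_left, x_right, y, f, baseline, cx, cy):
        """Pixel coordinates in the left/right images -> 3D point (metres)."""
        d = x_left - x_right              # disparity in pixels
        Z = f * baseline / d              # depth
        X = (x_left - cx) * Z / f
        Y = (y - cy) * Z / f
        return X, Y, Z

    # An eye detected at x=660 (left image) and x=620 (right image), y=380:
    print(triangulate(660, 620, 380, f=800.0, baseline=0.12, cx=640, cy=360))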
  • pdf icon
34 X. Desurmont, J.-B. Hayet, J.-F. Delaigle, J. Piater and B. Macq. TRICTRAC Video Dataset: Public HDTV Synthetic Soccer Video Sequences With Ground Truth. Workshop on Computer Vision Based Analysis in Sport Environments (CVBASE) . April 2006.
Abstract: Object tracking in video sequences is an important task in many applications such as video surveillance, traffic monitoring, marketing and sport analysis. In order to enhance these technologies, an objective performance evaluation is needed. This evaluation requires testing the system with a given dataset and comparing the output with the ground truth. One of the contributions of the TRICTRAC project is to supply the video processing community with synthetic, high-definition video content of Pan-Tilt-Zoom (PTZ) cameras with 3D ground truth, including the parameters of the cameras and the mobile objects. This paper presents this novel dataset.
Bibentry:
@INPROCEEDINGS{Desurmont:2006:c,
  author = {X. Desurmont and J.-B. Hayet and J.-F. Delaigle and J. Piater and B. Macq},
  title = {TRICTRAC Video Dataset: Public HDTV Synthetic Soccer Video Sequences With Ground Truth},
  booktitle = {Workshop on Computer Vision Based Analysis in Sport Environments (CVBASE)},
  year = {2006},
  note = {dpt:img*grp:mm*lg:en*prj:trictrac},
  abstract = {Object tracking in video sequences is an important task in many applications such as video surveillance, traffic monitoring, marketing and sport analysis. In order to enhance these technologies, an objective performance evaluation is needed. This evaluation requires testing the system with a given dataset and comparing the output with the ground truth. One of the contributions of the TRICTRAC project is to supply the video processing community with synthetic, high-definition video content of Pan-Tilt-Zoom (PTZ) cameras with 3D ground truth, including the parameters of the cameras and the mobile objects. This paper presents this novel dataset.},
  url = {2006_CVBASE_TRICTRAC.pdf},
};
  • pdf icon
33 C. Carincotte, X. Desurmont, B. Ravera, F. Bremond, J. Orwell, S.A. Velastin, J.-M. Odobez, B. Corbucci, J. Palo and J. Cernocky. Toward generic intelligent knowledge extraction from video and audio: the EU-funded CARETAKER project. Imaging for Crime Detection and Prevention (ICDP), part of The Institution of Engineering and Technology Conf. on Crime and Security . London, UK. June 2006.
Abstract: The CARETAKER project, a 30-month project that has just kicked off, aims at studying, developing and assessing multimedia knowledge-based content analysis, knowledge extraction components, and metadata management sub-systems in the context of automated situation awareness, diagnosis and decision support. More precisely, CARETAKER will focus on the extraction of structured knowledge from large multimedia collections recorded over networks of cameras and microphones deployed in real sites. The produced audio-visual streams, in addition to surveillance and safety issues, could represent a useful source of information, if stored and automatically analyzed, for urban/environmental planning, resource optimization, disabled/elderly person monitoring...

Keywords: CONTENT ANALYSIS/RETRIEVAL, KNOWLEDGE EXTRACTION, MASSIVE RECORDING.

Bibentry:
@INPROCEEDINGS{Carincotte:2006,
  author = {C. Carincotte and X. Desurmont and B. Ravera and F. Bremond and J. Orwell and S.A. Velastin and J.-M. Odobez and B. Corbucci and J. Palo and J. Cernocky},
  title = {Toward generic intelligent knowledge extraction from video and audio: the EU-funded CARETAKER project},
  booktitle = {Imaging for Crime Detection and Prevention (ICDP), part of The Institution of Engineering and Technology Conf. on Crime and Security},
  year = {2006},
  address = {London, UK},
  month = {June 13-14},
  note = {dpt:img*grp:vs*lg:en*prj:caretaker},
  abstract = {The CARETAKER project, a 30-month project that has just kicked off, aims at studying, developing and assessing multimedia knowledge-based content analysis, knowledge extraction components, and metadata management sub-systems in the context of automated situation awareness, diagnosis and decision support. More precisely, CARETAKER will focus on the extraction of structured knowledge from large multimedia collections recorded over networks of cameras and microphones deployed in real sites. The produced audio-visual streams, in addition to surveillance and safety issues, could represent a useful source of information, if stored and automatically analyzed, for urban/environmental planning, resource optimization, disabled/elderly person monitoring...},
  url = {2006_ICDP_CARETAKER.pdf},
  keywords = {Content analysis/retrieval, Knowledge extraction, Massive Recording.},
};
  • pdf icon
32 J. Meessen, L.-Q. Xu and B. Macq. Content Browsing and Semantic Context Viewing through JPEG 2000 Based Scalable Summary. IEE Proc. Vision, Image & Signal Processing , Vol. 153(3) : p. 274-283. June 2006.
Abstract: This paper presents a novel method and software platform for remote and interactive browsing of a summary of long video sequences as well as revealing the semantic links between shots and scenes in their temporal context. The solution is based on interactive navigation in a scalable mega image resulting from a JPEG 2000 coded key-frame-based video summary. Each key-frame could represent an automatically detected shot, event, or scene, which is then properly annotated using some semi-automatic tools or learning methods. The presented system is compliant with the new JPEG 2000 part 9 "JPIP - JPEG 2000 Interactivity, API and Protocols," which lends itself to working under varying transmission channel conditions such as GPRS or 3G wireless networks. While keeping the advantages of a single 2D video summary, like the limited storage cost, the flexibility offered by JPEG 2000 allows the application to highlight interactively key-frames corresponding to the desired content first within a low quality and low-resolution version of the full video summary. It then offers fine grain scalability for a user to navigate and zoom in to particular scenes or events represented by the key-frames. This possibility of visualising key-frames of interest and playing back the corresponding video shots within the context of the whole sequence (e.g., an episode of a media file) enables the user to understand the temporal relations between semantically related events / actions / physical settings, providing a new way to present and search for contents in video sequences.
Bibentry:
@ARTICLE{Meessen:2006:b,
  author = {J. Meessen and L.-Q. Xu and B. Macq},
  title = {Content Browsing and Semantic Context Viewing through JPEG 2000 Based Scalable Summary},
  journal = {IEE Proc. Vision, Image \& Signal Processing},
  year = {2006},
  month = {June},
  note = {dpt:img*grp:mm*lg:en},
  abstract = {This paper presents a novel method and software platform for remote and interactive browsing of a summary of long video sequences as well as revealing the semantic links between shots and scenes in their temporal context. The solution is based on interactive navigation in a scalable mega image resulting from a JPEG 2000 coded key-frame-based video summary. Each key-frame could represent an automatically detected shot, event, or scene, which is then properly annotated using some semi-automatic tools or learning methods. The presented system is compliant with the new JPEG 2000 part 9 "JPIP - JPEG 2000 Interactivity, API and Protocols," which lends itself to working under varying transmission channel conditions such as GPRS or 3G wireless networks. While keeping the advantages of a single 2D video summary, like the limited storage cost, the flexibility offered by JPEG 2000 allows the application to highlight interactively key-frames corresponding to the desired content first within a low quality and low-resolution version of the full video summary. It then offers fine grain scalability for a user to navigate and zoom in to particular scenes or events represented by the key-frames. This possibility of visualising key-frames of interest and playing back the corresponding video shots within the context of the whole sequence (e.g., an episode of a media file) enables the user to understand the temporal relations between semantically related events / actions / physical settings, providing a new way to present and search for contents in video sequences.},
  url = {2006_IEE-VIS.pdf},
  volume = {153},
  pages = {274-283},
  number = {3},
};
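The summary construction can be sketched as tiling key-frames into one mega image and encoding it with several quality layers. The snippet below uses Pillow's OpenJPEG-backed JPEG 2000 writer as a stand-in and leaves the JPIP serving side out entirely; the tile size and layer rates are arbitrary choices.

    # Sketch: tile key-frames into a mega image, save as layered JPEG 2000.
    # Requires Pillow built with OpenJPEG support; all parameters illustrative.
    from PIL import Image

    def build_summary(keyframes, cols=10, thumb=(192, 108)):
        rows = (len(keyframes) + cols - 1) // cols
        mosaic = Image.new("RGB", (cols * thumb[0], rows * thumb[1]))
        for i, frame in enumerate(keyframes):
            tile = frame.resize(thumb)
            mosaic.paste(tile, ((i % cols) * thumb[0], (i // cols) * thumb[1]))
        return mosaic

    # Stand-ins for automatically detected key-frames:
    frames = [Image.new("RGB", (1280, 720), (i * 9 % 255, 64, 128))
              for i in range(25)]
    summary = build_summary(frames)
    summary.save("summary.jp2", quality_mode="rates",
                 quality_layers=[80, 40, 20])   # coarse-to-fine quality layers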
  • pdf icon
31 J. Meessen, M. Coulanges, X. Desurmont and J.-F. Delaigle. Content-Based Retrieval of Video Surveillance Scenes. Multimedia Content Representation, Classification and Security, Lecture Notes in Computer Science , Vol. 4105 : p. 785-792. September 2006.
Abstract: A novel method for content-based retrieval of surveillance video data is presented. The study starts from the realistic assumption that the automatic feature extraction is kept simple, i.e. only segmentation and low-cost filtering operations have been applied. The solution is based on a new and generic dissimilarity measure for discriminating video surveillance scenes. This weighted compound measure can be interactively adapted during a session in order to capture the user's subjectivity. Building on this, a key-frame selection and a content-based retrieval system have been developed and tested on several actual surveillance sequences. Experiments have shown that the proposed method is efficient and robust to segmentation errors.
Bibentry:
@ARTICLE{Meessen:2006:a,
  author = {J. Meessen and M. Coulanges and X. Desurmont and J.-F. Delaigle},
  title = {Content-Based Retrieval of Video Surveillance Scenes},
  year = {2006},
  month = {September},
  note = {dpt:img*grp:mm|vs*lg:en*prj:irma},
  abstract = {A novel method for content-based retrieval of surveillance video data is presented. The study starts from the realistic assumption that the automatic feature extraction is kept simple, i.e. only segmentation and low-cost filtering operations have been applied. The solution is based on a new and generic dissimilarity measure for discriminating video surveillance scenes. This weighted compound measure can be interactively adapted during a session in order to capture the user's subjectivity. Building on this, a key-frame selection and a content-based retrieval system have been developed and tested on several actual surveillance sequences. Experiments have shown that the proposed method is efficient and robust to segmentation errors.},
  url = {2006_MRCS-LNCS_IRMA.pdf},
  journal = {Multimedia Content Representation, Classification and Security, Lecture Notes in Computer Science},
  volume = {4105},
  pages = {785-792},
  editor = {Springer Berlin, Heidelberg},
};
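A weighted compound dissimilarity of the kind described can be sketched as a user-tunable weighted sum of per-feature distances; the features, component distances and weights below are illustrative, not the paper's exact measure.

    # Sketch: compound scene dissimilarity with user-adjustable weights.
    import numpy as np

    def compound_dissimilarity(a, b, w):
        d_pos = np.hypot(*np.subtract(a["centroid"], b["centroid"]))
        d_size = abs(a["area"] - b["area"]) / max(a["area"], b["area"])
        d_hist = 0.5 * np.abs(a["hist"] - b["hist"]).sum()  # L1 on normalised hists
        return w["pos"] * d_pos + w["size"] * d_size + w["hist"] * d_hist

    scene_a = {"centroid": (10, 20), "area": 900.0, "hist": np.full(8, 1 / 8)}
    scene_b = {"centroid": (14, 17), "area": 700.0, "hist": np.full(8, 1 / 8)}
    weights = {"pos": 0.05, "size": 1.0, "hist": 1.0}  # adapted via feedback
    print(compound_dissimilarity(scene_a, scene_b, weights))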
  • pdf icon
30 X. Desurmont, R. Sebbe, F. Martin, C. Machy and J.-F. Delaigle. Performance Evaluation of Frequent Events Detection Systems. IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (PETS) . New York, USA. June 2006.
Abstract: In recent years, the demand for video analysis applications such as video surveillance and marketing has been growing rapidly. A number of solutions exist, but they need to be evaluated. This point is very important for two reasons: it proves the objective quality of the system to industry, and it makes it possible to highlight improvements during research and thus to better understand how the system works in order to improve it adequately. This paper describes a new algorithm that can evaluate a class of detection systems in the case of frequent events, for example people detection in a corridor or cars on motorways. To do so, we introduce an automatic re-alignment between results and ground truth using dynamic programming. The second part of the paper describes, as an example, a practical implementation of a stand-alone system that counts people in a shopping center. Finally, we evaluate the performance of this system.

Keywords: ALIGNMENT, COUNTING PEOPLE, DYNAMIC PROGRAMMING, EVENT DETECTION, PERFORMANCE EVALUATION.

Bibentry:
@INPROCEEDINGS{Desurmont:2006:b,
  author = {X. Desurmont and R. Sebbe and F. Martin and C. Machy and J.-F. Delaigle},
  title = {Performance Evaluation of Frequent Events Detection Systems},
  booktitle = {IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (PETS)},
  year = {2006},
  address = {New York, USA},
  month = {June 18},
  note = {dpt:img*grp:vs*lg:en*prj:serket},
  abstract = {In recent years, the demand for video analysis applications such as video surveillance and marketing has been growing rapidly. A number of solutions exist, but they need to be evaluated. This point is very important for two reasons: it proves the objective quality of the system to industry, and it makes it possible to highlight improvements during research and thus to better understand how the system works in order to improve it adequately. This paper describes a new algorithm that can evaluate a class of detection systems in the case of frequent events, for example people detection in a corridor or cars on motorways. To do so, we introduce an automatic re-alignment between results and ground truth using dynamic programming. The second part of the paper describes, as an example, a practical implementation of a stand-alone system that counts people in a shopping center. Finally, we evaluate the performance of this system.},
  url = {2006_IEEE-PETS_SERKET.pdf},
  keywords = {Performance evaluation, event detection, counting people, alignment, dynamic programming.},
};
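The dynamic-programming re-alignment between detected events and ground truth is, in essence, an edit-distance alignment over two time-stamp sequences. A minimal sketch with illustrative costs and match tolerance:

    # Sketch: DP alignment of detected event times against ground-truth times.
    def align(det, gt, tol=1.0, c_fa=1.0, c_miss=1.0):
        n, m = len(det), len(gt)
        D = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            D[i][0] = i * c_fa            # unmatched detections = false alarms
        for j in range(1, m + 1):
            D[0][j] = j * c_miss          # unmatched ground truth = misses
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                pair = 0.0 if abs(det[i - 1] - gt[j - 1]) <= tol else 2.0
                D[i][j] = min(D[i - 1][j - 1] + pair,   # pair the two events
                              D[i - 1][j] + c_fa,       # detection unmatched
                              D[i][j - 1] + c_miss)     # ground truth missed
        return D[n][m]

    # Three detections against four ground-truth events (one miss expected):
    print(align(det=[1.0, 4.2, 9.0], gt=[1.1, 4.0, 6.5, 9.2]))   # -> 1.0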
  • pdf icon
29 X. Desurmont, C. Machy, C. Mancas-Thillou, D. Severin and J.-F. Delaigle. Effects of Parameters Variations in Particle Filter Tracking. IEEE Int. Conf. on Image Processing (ICIP) . Atlanta, GA USA. October 2006.
Abstract: Many implementations of visual tracking have been proposed over the years. The lack of a standard evaluation process has prevented fair comparison between them. In this paper, we simply propose to evaluate different particle filter methods in people tracking applications. We introduce an objective metric and give results according to different parameter variations. Finally, based on our evaluations, we propose a new particle filter configuration that outperforms other current implementations.

Keywords: PARTICLE FILTER, PERFORMANCE EVALUATION, TRACKING.

Bibentry:
@INPROCEEDINGS{Desurmont:2006:a,
  author = {X. Desurmont and C. Machy and C. Mancas-Thillou and D. Severin and J.-F. Delaigle},
  title = {Effects of Parameters Variations in Particle Filter Tracking},
  booktitle = {IEEE Int. Conf. on Image Processing (ICIP)},
  year = {2006},
  address = {Atlanta, GA USA},
  month = {October},
  note = {dpt:img*lg:en*prj:candela},
  abstract = {Many implementations of visual tracking have been proposed over the years. The lack of a standard evaluation process has prevented fair comparison between them. In this paper, we simply propose to evaluate different particle filter methods in people tracking applications. We introduce an objective metric and give results according to different parameter variations. Finally, based on our evaluations, we propose a new particle filter configuration that outperforms other current implementations.},
  url = {2006_IEEE-ICIP_CANDELA.pdf},
  keywords = {particle filter, tracking, performance evaluation.},
};
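For reference, one predict / weight / resample cycle of a bootstrap particle filter on a 1-D position, the kind of loop whose parameters (particle count, noise levels) such an evaluation varies; all values are illustrative.

    # One bootstrap particle filter step: predict, weight, resample.
    import numpy as np

    rng = np.random.default_rng(2)
    N, motion_std, obs_std = 200, 1.0, 2.0     # parameters under study
    particles = rng.normal(0.0, 5.0, size=N)   # current belief about position
    z = 3.0                                    # new (noisy) measurement

    particles += rng.normal(0.0, motion_std, size=N)       # predict
    w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)    # weight by likelihood
    w /= w.sum()
    particles = rng.choice(particles, size=N, p=w)         # resample
    print("state estimate:", particles.mean())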
  • pdf icon
28 R. Enficiaud, B. Lienard, N. Allezard, R. Sebbe, S. Beucher, X. Desurmont, P. Sayd and J.-F. Delaigle. Clovis - A generic framework for general purpose visual surveillance applications. IEEE Int. Workshop on Visual Surveillance (VS), held in conjunction with European Conf. on Computer Vision (ECCV) . Graz, Austria. May 2006.
Abstract: Today's video-surveillance software is often based upon monolithic software running on PCs or embedded systems, also called intelligent sensors; but no real interaction exists between the elements of a network dedicated to a video-surveillance scenario. The new framework, named "Clovis" (which stands for `Composant LOgiciel pour la VIdeo-Surveillance' - Software Component for Video Surveillance), proposes, for the video-analysis world, a new approach to develop and deploy modular software for and in those sensors on a scalable network. Through three sample applications, we present in this article how the underlying framework can be used to easily develop software in a stand-alone manner or using distributed computing to enhance video analysis.
Bibentry:
@INPROCEEDINGS{Enficiaud:2006,
  author = {R. Enficiaud and B. Lienard and N. Allezard and R. Sebbe and S. Beucher and X. Desurmont and P. Sayd and J.-F. Delaigle},
  title = {Clovis - A generic framework for general purpose visual surveillance applications},
  booktitle = {IEEE Int. Workshop on Visual Surveillance (VS), held in conjunction with European Conf. on Computer Vision (ECCV)},
  year = {2006},
  address = {Graz, Austria},
  month = {May 7-13},
  note = {dpt:img*grp:vs*lg:en*prj:clovis},
  abstract = {Today's video-surveillance software is often based upon monolithic software running on PCs or embedded systems, also called intelligent sensors; but no real interaction exists between the elements of a network dedicated to a video-surveillance scenario. The new framework, named "Clovis" (which stands for `Composant LOgiciel pour la VIdeo-Surveillance' - Software Component for Video Surveillance), proposes, for the video-analysis world, a new approach to develop and deploy modular software for and in those sensors on a scalable network. Through three sample applications, we present in this article how the underlying framework can be used to easily develop software in a stand-alone manner or using distributed computing to enhance video analysis.},
  url = {2006_IEEE-VS_CLOVIS.pdf},
};
  • pdf icon
27 B. Lienard, X. Desurmont, B. Barrie and J.-F. Delaigle. Real-time high-level video understanding using data warehouse. Real-Time Image Processing III (Invited Paper), part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA USA. January 2006.
Abstract: High-level video content analysis such as video surveillance is often limited by the computational aspects of automatic image understanding, i.e. it requires huge computing resources for reasoning processes like categorization, and huge amounts of data to represent knowledge of objects, scenarios and other models. This article explains how to design and develop a "near real-time adaptive image datamart", used first as a decision-support system for vision algorithms, and then as a mass storage system. Using the RDF specification as the storage format for vision algorithms' meta-data, we can optimise the data warehouse concepts for video analysis, add processes able to adapt the current model, and pre-process data to speed up queries. In this way, when new data is sent from a sensor to the data warehouse for long-term storage, using remote procedure calls embedded in object-oriented interfaces to simplify queries, it is processed and the in-memory data model is updated. After some processing, possible interpretations of this data can be returned to the sensor. To demonstrate this new approach, we present typical scenarios applied to this architecture, such as people tracking and event detection in a multi-camera network. Finally, we show how this system becomes a high-semantic data container for external data mining.

Keywords: CONTEXT, DATA WAREHOUSE, DATAMART, IMAGE PROCESSING, KNOWLEDGE, RDF, REAL-TIME, UNDERSTANDING.

Bibentry:
@INPROCEEDINGS{Lienard:2006,
  author = {B. Lienard and X. Desurmont and B. Barrie and J.-F. Delaigle},
  title = {Real-time high-level video understanding using data warehouse},
  booktitle = {Real-Time Image Processing III (Invited Paper), part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2006},
  address = {San Jose, CA USA},
  month = {January 16-19},
  note = {dpt:img*lg:en*prj:clovis},
  abstract = {High-level video content analysis such as video surveillance is often limited by the computational aspects of automatic image understanding, i.e. it requires huge computing resources for reasoning processes like categorization, and huge amounts of data to represent knowledge of objects, scenarios and other models. This article explains how to design and develop a "near real-time adaptive image datamart", used first as a decision-support system for vision algorithms, and then as a mass storage system. Using the RDF specification as the storage format for vision algorithms' meta-data, we can optimise the data warehouse concepts for video analysis, add processes able to adapt the current model, and pre-process data to speed up queries. In this way, when new data is sent from a sensor to the data warehouse for long-term storage, using remote procedure calls embedded in object-oriented interfaces to simplify queries, it is processed and the in-memory data model is updated. After some processing, possible interpretations of this data can be returned to the sensor. To demonstrate this new approach, we present typical scenarios applied to this architecture, such as people tracking and event detection in a multi-camera network. Finally, we show how this system becomes a high-semantic data container for external data mining.},
  url = {2006_SPIE-EI_CLOVIS.pdf},
  keywords = {data warehouse, datamart, image processing, context, real-time, RDF, understanding, knowledge},
};
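Pushing vision meta-data into the warehouse as RDF triples can be sketched with rdflib; the namespace and predicate names below are invented, as the paper does not publish its schema.

    # Sketch: detection meta-data as RDF triples, queried back with SPARQL.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    VS = Namespace("http://example.org/vs#")   # hypothetical vocabulary
    g = Graph()
    obs = URIRef("http://example.org/obs/42")
    g.add((obs, VS.camera, Literal("cam-03")))
    g.add((obs, VS.objectClass, Literal("person")))
    g.add((obs, VS.timestamp,
           Literal("2006-01-16T10:31:02", datatype=XSD.dateTime)))

    # A later reasoning process can query the accumulated triples:
    q = 'SELECT ?o WHERE { ?o <http://example.org/vs#objectClass> "person" }'
    for row in g.query(q):
        print("person observation:", row.o)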

2005

  • pdf icon
26 X. Desurmont, B. Lienard, J. Meessen and J.-F. Delaigle. Real-time optimizations for integrated smart network camera. Conf. on Real-Time Imaging IX, part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA USA. January 2005.
Abstract: We present an integrated real-time smart network camera. This system is composed of an image sensor, an embedded PC-based electronic card for image processing, and some network capabilities. The application detects events of interest in visual scenes, highlights alarms and computes statistics. The system also produces meta-data information that can be shared with other cameras in a network. We describe the requirements of such a system and then show how its design is optimized to process and compress video in real time. Indeed, typical video-surveillance algorithms such as background differencing, tracking and event detection must be highly optimized and simplified to be used on this hardware. To achieve a good match between hardware and software in this lightweight embedded system, the software management is written on top of the Java-based middleware specification established by the OSGi alliance. We can easily integrate software and hardware in complex environments thanks to the Java Real-Time specification for the virtual machine and some network- and service-oriented Java specifications (like RMI and Jini). Finally, we report some outcomes and typical case studies for such a camera, like counter-flow detection.

Keywords: CONTEXT AWARE, EMBEDDED SYSTEM, IMAGE PROCESSING, REAL-TIME, SMART NETWORK CAMERA.

Bibentry:
@INPROCEEDINGS{Desurmont:2005:g,
  author = {X. Desurmont and B. Lienard and J. Meessen and J.-F. Delaigle},
  title = {Real-time optimizations for integrated smart network camera},
  booktitle = {Conf. on Real-Time Imaging IX, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2005},
  address = {San Jose, CA USA},
  month = {January},
  note = {dpt:img*grp:vs*lg:en*prj:clovis},
  abstract = {We present an integrated real-time smart network camera. This system is composed of an image sensor, an embedded PC-based electronic card for image processing, and some network capabilities. The application detects events of interest in visual scenes, highlights alarms and computes statistics. The system also produces meta-data information that can be shared with other cameras in a network. We describe the requirements of such a system and then show how its design is optimized to process and compress video in real time. Indeed, typical video-surveillance algorithms such as background differencing, tracking and event detection must be highly optimized and simplified to be used on this hardware. To achieve a good match between hardware and software in this lightweight embedded system, the software management is written on top of the Java-based middleware specification established by the OSGi alliance. We can easily integrate software and hardware in complex environments thanks to the Java Real-Time specification for the virtual machine and some network- and service-oriented Java specifications (like RMI and Jini). Finally, we report some outcomes and typical case studies for such a camera, like counter-flow detection.},
  url = {2005_SPIE-EI_CLOVIS.pdf},
  keywords = {real-time, image processing, embedded system, smart network camera, context aware.},
};
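The counter-flow case study mentioned at the end reduces to testing a track's mean displacement against the permitted flow direction; a sketch, with an invented threshold:

    # Sketch: flag a track whose mean motion opposes the allowed direction.
    import numpy as np

    def is_counter_flow(track_xy, allowed_dir, cos_thresh=-0.5):
        """track_xy: sequence of (x, y) centroids; allowed_dir: unit vector."""
        v = np.diff(np.asarray(track_xy, dtype=float), axis=0).mean(axis=0)
        if np.linalg.norm(v) == 0:
            return False                  # stationary object, not counter-flow
        cos = v @ allowed_dir / np.linalg.norm(v)
        return cos < cos_thresh           # moving against the allowed flow

    track = [(100, 50), (90, 50), (78, 51), (65, 52)]    # drifting in -x
    print(is_counter_flow(track, allowed_dir=np.array([1.0, 0.0])))   # True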
  • pdf icon
25 X. Desurmont, R. Wijnhoven, E. Jaspert, O. Caignard, M. Barais, W. Favoreel and J.-F. Delaigle. Performance evaluation of real-time video content analysis systems in the CANDELA project. Conf. on Real-Time Imaging IX, part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA USA. January 2005.
Abstract: The CANDELA project aims at realizing a system for real-time image processing in traffic and surveillance applications. The system performs segmentation, labels the extracted blobs and tracks their movements in the scene. Performance evaluation of such a system is a major challenge, since no standard methods exist and the criteria for evaluation are highly subjective. This paper proposes a performance evaluation approach for video content analysis (VCA) systems and identifies the research areas involved. For these areas we give an overview of the state of the art in performance evaluation and introduce a classification into different semantic levels. The proposed evaluation approach compares the results of the VCA algorithm with a ground-truth (GT) counterpart, which contains the desired results. Both the VCA results and the ground truth comprise description files formatted in MPEG-7. The evaluation is required to provide an objective performance measure and a means to choose between competing methods. In addition, it enables algorithm developers to measure the progress of their work at the different levels of the design process. From these requirements and the state-of-the-art overview, we conclude that standardization is highly desirable, for which many research topics still need to be addressed.

Keywords: COMPUTER VISION, MPEG-7, PERFORMANCE EVALUATION, REAL-TIME PROCESSING, VIDEO CONTENT ANALYSIS.

Bibentry:
@INPROCEEDINGS{Desurmont:2005:f,
  author = {X. Desurmont and R. Wijnhoven and E. Jaspert and O. Caignard and M. Barais and W. Favoreel and J.-F. Delaigle},
  title = {Performance evaluation of real-time video content analysis systems in the CANDELA project},
  booktitle = {Conf. on Real-Time Imaging IX, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2005},
  address = {San Jose, CA USA},
  month = {January},
  note = {dpt:img*grp:vs*lg:en*prj:candela},
  abstract = {The CANDELA project aims at realizing a system for real-time image processing in traffic and surveillance applications. The system performs segmentation, labels the extracted blobs and tracks their movements in the scene. Performance evaluation of such a system is a major challenge, since no standard methods exist and the criteria for evaluation are highly subjective. This paper proposes a performance evaluation approach for video content analysis (VCA) systems and identifies the research areas involved. For these areas we give an overview of the state of the art in performance evaluation and introduce a classification into different semantic levels. The proposed evaluation approach compares the results of the VCA algorithm with a ground-truth (GT) counterpart, which contains the desired results. Both the VCA results and the ground truth comprise description files formatted in MPEG-7. The evaluation is required to provide an objective performance measure and a means to choose between competing methods. In addition, it enables algorithm developers to measure the progress of their work at the different levels of the design process. From these requirements and the state-of-the-art overview, we conclude that standardization is highly desirable, for which many research topics still need to be addressed.},
  url = {2005_SPIE-EI_CANDELA.pdf},
  keywords = {real-time processing, computer vision, performance evaluation, MPEG-7, video content analysis.},
};
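Object-level comparison of VCA output against ground truth is commonly scored through bounding-box overlap; the sketch below uses an IoU threshold of 0.5, a widespread convention rather than a value taken from the paper.

    # Sketch: precision/recall of detected boxes against ground-truth boxes.
    def iou(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        ix = max(0, min(ax1, bx1) - max(ax0, bx0))
        iy = max(0, min(ay1, by1) - max(ay0, by0))
        inter = ix * iy
        union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
        return inter / union if union else 0.0

    def score(dets, gts, thr=0.5):
        matched, tp = set(), 0
        for d in dets:
            best = max(range(len(gts)), key=lambda k: iou(d, gts[k]), default=None)
            if best is not None and best not in matched and iou(d, gts[best]) >= thr:
                matched.add(best)
                tp += 1
        precision = tp / len(dets) if dets else 0.0
        recall = tp / len(gts) if gts else 0.0
        return precision, recall

    print(score(dets=[(10, 10, 50, 80), (200, 40, 240, 90)],
                gts=[(12, 12, 48, 78)]))    # -> (0.5, 1.0)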
  • pdf icon
24 J. Meessen, C. Parisot, C. Le Barz, D. Nicholson and J.-F. Delaigle. Smart Encoding for Wireless Video Surveillance. Conf. on Image and Video Communications and Processing, part of the IS&T SPIE Symposium on Electronic Imaging . San Jose, CA USA. January 2005.
Abstract: In this paper, we present an integrated system for smart encoding in video surveillance. This system, developed within the European IST WCAM project, aims at defining an optimized JPEG 2000 codestream organization directly based on the semantic content of the video surveillance analysis module. The proposed system produces a fully compliant Motion JPEG 2000 stream that contains region-of-interest data (typically mobile objects) in a separate layer from regions of less interest (e.g. static background). First, the system performs a real-time unsupervised segmentation of mobiles in each frame of the video. The smart encoding module then uses these region-of-interest maps to construct a Motion JPEG 2000 codestream that allows an optimized rendering of the video surveillance stream in low-bandwidth wireless applications, allocating more quality to mobiles than to the background. Our integrated system improves the coding representation of the video content without data overhead. It can also be used in applications requiring selective scrambling of regions of interest, as well as for any other application dealing with regions of interest.
Bibentry:
@INPROCEEDINGS{Meessen:2005:e,
  author = {J. Meessen and C. Parisot and C. Le Barz and D. Nicholson and J.-F. Delaigle},
  title = {Smart Encoding for Wireless Video Surveillance},
  booktitle = {Conf. on Image and Video Communications and Processing, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2005},
  address = {San Jose, CA USA},
  month = {January},
  note = {dpt:img*grp:vs*lg:en*prj:wcam},
  abstract = {In this paper, we present an integrated system for smart encoding in video surveillance. This system, developed within the European IST WCAM project, aims at defining an optimized JPEG 2000 codestream organization directly based on the semantic content of the video surveillance analysis module. The proposed system produces a fully compliant Motion JPEG 2000 stream that contains region-of-interest data (typically mobile objects) in a separate layer from regions of less interest (e.g. static background). First, the system performs a real-time unsupervised segmentation of mobiles in each frame of the video. The smart encoding module then uses these region-of-interest maps to construct a Motion JPEG 2000 codestream that allows an optimized rendering of the video surveillance stream in low-bandwidth wireless applications, allocating more quality to mobiles than to the background. Our integrated system improves the coding representation of the video content without data overhead. It can also be used in applications requiring selective scrambling of regions of interest, as well as for any other application dealing with regions of interest.},
  url = {2005_SPIE-EI_WCAM.pdf},
};
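The snippet below only emulates the visual effect of ROI-priority coding by compositing a lightly compressed ROI over a heavily compressed background; the real system instead reorders quality layers within a single compliant codestream. Rates and geometry are illustrative, and Pillow must be built with OpenJPEG support.

    # Sketch: emulate ROI-priority rendering with two compression rates.
    from PIL import Image

    frame = Image.new("RGB", (640, 480), (90, 120, 90))   # stand-in camera frame
    roi_box = (200, 150, 360, 380)                        # detected mobile object

    frame.save("bg.jp2", quality_mode="rates", quality_layers=[100])  # coarse
    frame.crop(roi_box).save("roi.jp2", quality_mode="rates",
                             quality_layers=[5])          # near-lossless ROI

    composite = Image.open("bg.jp2").convert("RGB")
    composite.paste(Image.open("roi.jp2").convert("RGB"), roi_box[:2])
    composite.save("preview.png")    # what a low-bandwidth client would see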
  • pdf icon
23 I. Martinez-Ponte, X. Desurmont, J. Meessen and J.-F. Delaigle. Robust human face hiding ensuring privacy. 6th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS) . Montreux, Switzerland. April 2005.
Abstract: Nowadays, video surveillance of people must ensure privacy. In this paper, we propose a seamless solution to that problem by masking faces in video sequences, which keeps people anonymous. The system consists of two modules. First, an analysis module identifies and follows regions of interest (ROIs) where faces are detected. Second, the JPEG 2000 encoding module compresses the frames, keeping the ROIs in a separate data layer so that the correct rendering of human faces can be restricted. The analysis module combines two complementary methods: face detection, to locate faces in the image, and tracking, to follow them seamlessly over time. The fusion of these two methods increases robustness: once a face has been detected in a frame, tracking may locate it in the consecutive frames, even when a face detection algorithm would not. In addition, the detection of faces prevents tracking from losing its targets. The encoding module downshifts the JPEG 2000 data corresponding to the identified ROIs to the lowest quality layer of the codestream. When the transmission bandwidth is limited, the human faces are then decoded with a lower visual quality, even to the point of invisibility when required. The proposed solution has been tested on different types of sequences. The results are presented in the paper.
Bibentry:
@INPROCEEDINGS{Martinez-Ponte:2005,
  author = {I. Martinez-Ponte and X. Desurmont and J. Meessen and J.-F. Delaigle},
  title = {Robust human face hiding ensuring privacy},
  booktitle = {6th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)},
  year = {2005},
  address = {Montreux, Switzerland},
  month = {April 13-15},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {Nowadays, video surveillance of people must ensure privacy. In this paper, we propose a seamless solution to that problem by masking faces in video sequences, which keeps people anonymous. The system consists of two modules. First, an analysis module identifies and follows regions of interest (ROIs) where faces are detected. Second, the JPEG 2000 encoding module compresses the frames, keeping the ROIs in a separate data layer so that the correct rendering of human faces can be restricted. The analysis module combines two complementary methods: face detection, to locate faces in the image, and tracking, to follow them seamlessly over time. The fusion of these two methods increases robustness: once a face has been detected in a frame, tracking may locate it in the consecutive frames, even when a face detection algorithm would not. In addition, the detection of faces prevents tracking from losing its targets. The encoding module downshifts the JPEG 2000 data corresponding to the identified ROIs to the lowest quality layer of the codestream. When the transmission bandwidth is limited, the human faces are then decoded with a lower visual quality, even to the point of invisibility when required. The proposed solution has been tested on different types of sequences. The results are presented in the paper.},
  url = {2005_WIAMIS.pdf},
};
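The detection/tracking fusion can be sketched as follows: every confirmed detection re-seeds the tracker, and the tracker's prediction bridges the frames where the detector misses. The data structures and the trivial motion model are invented stand-ins.

    # Sketch: fuse per-frame face detections with a tracker's predictions.
    def fuse(detections, predict):
        """detections: per-frame face box or None when the detector misses.
        predict: function extrapolating the previous box to the current frame."""
        rois, last = [], None
        for det in detections:
            if det is not None:
                last = det               # a detection re-seeds the tracker
            elif last is not None:
                last = predict(last)     # the tracker bridges the gap
            rois.append(last)            # ROI whose data layer gets downshifted
        return rois

    boxes = [(100, 80, 40, 40), None, None, (112, 82, 40, 40)]
    print(fuse(boxes, predict=lambda b: (b[0] + 4, b[1], b[2], b[3])))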
  • pdf icon
22 J.-L. Leonard. Multitel, champion de la recherche européenne. Athena December 2005.
Abstract: For its Communicating European Research conference (CER 2005), held on 14 and 15 November at the Heysel in Brussels, the European Commission selected research laboratories from among the best performing in Europe. The objective: to give them greater visibility in the eyes of policy makers and the media. Among the lucky few was the Walloon centre Multitel, a model in this respect, since it takes part in no fewer than twenty European research projects and is preparing its contribution to twenty more. No less.
Bibentry:
@MISC{Leonard:2005,
  author = {J.-L. Leonard},
  title = {Multitel, champion de la recherche européenne},
  year = {2005},
  month = {December},
  note = {dpt:img*grp:vs*lg:fr},
  abstract = {Pour sa conférence Communicating European Research (Cer 2005) qui s'est tenue les 14 et 15 novembre derniers au Heysel, à Bruxelles, la Commission européenne a sélectionné des laboratoires de recherche parmi les plus performants d'Europe. Objectif: leur donner une visibilité accrue aux yeux des décideurs politiques et des médias. Parmi les heureux élus, figurait le centre wallon Multitel, un modèle en la matière, puisqu'il participe à la bagatelle de vingt projets de recherche européens et qu'il prépare sa contribution à vingt autres. Pas moins},
  url = {2005_ATHENA.pdf},
  number = {216},
  howpublished = {Athena},
};
21 D. Nicholson and J. Meessen. Technologies for multimedia and video surveillance convergence. Conf. on Image and Video Communications and Processing, part of the IS&T SPIE Symposium on Electronic Imaging. San Jose, CA, USA. January 2005.
Abstract: In this paper, we present an integrated system for video surveillance developed within the European IST WCAM project, using only standard multimedia and networking tools. The advantage of such a system, besides cost reduction and interoperability, is that it benefits from the fast technological evolution of video encoding and distribution tools.
Bibentry:
@INPROCEEDINGS{Nicholson:2005,
  author = {D. Nicholson and J. Meessen},
  title = {Technologies for multimedia and video surveillance convergence},
  booktitle = {Conf. on Image and Video Communications and Processing, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2005},
  address = {San Jose, CA USA},
  month = {January},
  note = {dpt:img*grp:vs*lg:en*prj:wcam},
  abstract = {In this paper, we present an integrated system for video surveillance developed within the European IST WCAM project, using only standard multimedia and networking tools. The advantage of such a system, besides cost reduction and interoperability, is that it benefits from the fast technological evolution of video encoding and distribution tools.},
  url = {2005_SPIE-EI_WCAM-2.pdf},
};
20 J. Meessen, J.-F. Delaigle, L.-Q. Xu and B. Macq. JPEG 2000 Based Scalable Summary for Understanding Long Video Surveillance Sequences. Image and Video Communications and Processing, part of the IS&T SPIE Symposium on Electronic Imaging. San Jose, CA, USA. January 2005.
Abstract: This paper presents a new method for remote and interactive browsing of long video surveillance sequences. The solution is based on interactive navigation in JPEG 2000 coded mega-images. We assume that the video "key-frames" are available through automatic detection of scene changes or abnormal behaviors. These key-frames are concatenated in raster-scan order, forming a very large 2D image, which is then compressed with JPEG 2000 to produce a scalable video summary of the sequence. We then exploit a mega-image navigation platform, designed in full compliance with JPEG 2000 part 9 "JPIP", to search and visualize desirable content, based on client requests. The flexibility offered by JPEG 2000 allows highlighting key-frames corresponding to the required content within a low-quality and low-resolution version of the whole summary. Such fine-grain scalability is a unique feature of the proposed JPEG 2000 video summaries. The possibility to visualize key-frames of interest and play back the corresponding video shots within the context of the whole sequence enables the user to understand the temporal relations between semantically similar events. It is therefore particularly suited to analyzing complex incidents consisting of many successive events spread over a long period.

Keywords: CONTEXTUAL UNDERSTANDING, JPEG 2000, JPIP, SCALABLE SUMMARY, VIDEO BROWSING, VIDEO SURVEILLANCE.

Bibentry:
@INPROCEEDINGS{Meessen:2005:d,
  author = {J. Meessen and J.-F. Delaigle and L.-Q. Xu and B. Macq},
  title = {JPEG 2000 Based Scalable Summary for Understanding Long Video Surveillance Sequences},
  booktitle = {Image and Video Communications and Processing, part of the IS&T SPIE Symposium on Electronic Imaging},
  year = {2005},
  address = {San Jose, CA USA},
  month = {January},
  note = {dpt:img*grp:vs*lg:en*prj:schema},
  abstract = {This paper presents a new method for remote and interactive browsing of long video surveillance sequences. The solution is based on interactive navigation in JPEG 2000 coded mega-images. We assume that the video "key-frames" are available through automatic detection of scene changes or abnormal behaviors. These key-frames are concatenated in raster-scan order, forming a very large 2D image, which is then compressed with JPEG 2000 to produce a scalable video summary of the sequence. We then exploit a mega-image navigation platform, designed in full compliance with JPEG 2000 part 9 "JPIP", to search and visualize desirable content, based on client requests. The flexibility offered by JPEG 2000 allows highlighting key-frames corresponding to the required content within a low-quality and low-resolution version of the whole summary. Such fine-grain scalability is a unique feature of the proposed JPEG 2000 video summaries. The possibility to visualize key-frames of interest and play back the corresponding video shots within the context of the whole sequence enables the user to understand the temporal relations between semantically similar events. It is therefore particularly suited to analyzing complex incidents consisting of many successive events spread over a long period.},
  url = {2005_SPIE_SCHEMA.pdf},
  keywords = {Video browsing, contextual understanding, scalable summary, video surveillance, JPEG 2000, JPIP.},
};
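The raster-scan concatenation step lends itself to a short illustration. The sketch below assumes equally sized key-frames and the glymur binding to OpenJPEG; build_summary and the resolution/layer settings are illustrative, not the paper's code:

import numpy as np
import glymur

def build_summary(keyframes, cols):
    """Tile key-frames in raster-scan order into one mega-image and
    compress it with JPEG 2000."""
    h, w, c = keyframes[0].shape
    rows = -(-len(keyframes) // cols)              # ceiling division
    mosaic = np.zeros((rows * h, cols * w, c), dtype=np.uint8)
    for i, kf in enumerate(keyframes):
        r, col = divmod(i, cols)                   # raster-scan placement
        mosaic[r * h:(r + 1) * h, col * w:(col + 1) * w] = kf
    # Several resolutions and quality layers give the scalability that
    # JPIP-based browsing exploits.
    glymur.Jp2k("summary.jp2", data=mosaic, numres=6, cratios=[80, 40, 20])
    return mosaic

A JPIP server can then deliver any window of summary.jp2 at the quality and resolution the client asks for, which is the interaction pattern the abstract describes.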
19 J. Meessen, C. Parisot, X. Desurmont and J.-F. Delaigle. Réduction du débit et de la complexité de décodage par l'analyse de scène pour la télésurveillance basée sur Motion JPEG 2000. 20e Colloque GRETSI 2005 sur le traitement du signal et des images. Louvain-la-Neuve, Belgium. September 2005.
Abstract: In this paper, we propose an object-based video coding/transmission system using the Motion JPEG 2000 standard [7] for the efficient storage and delivery of surveillance video over low-bitrate channels. The method offers the same advantages as current region-of-interest coding techniques while considerably improving the average bitrate/quality ratio of the delivered video when the cameras are static. We propose to deliver the video as two Motion JPEG 2000 streams: one containing a regularly refreshed estimate of the background, the other containing only the moving objects of the scene. This method provides better decoded video quality, approaching that of an MPEG-4 coder, as well as reduced complexity on the client side compared with standard Motion JPEG 2000 encoding, at a negligible storage overhead. The surveillance streams stored on the server are fully compliant with the Motion JPEG 2000 standard.
Bibentry:
@INPROCEEDINGS{Meessen:2005:a,
  author = {J. Meessen and C. Parisot and X. Desurmont and J.-F. Delaigle},
  title = {Réduction du débit et de la complexité de décodage par l'analyse de scène pour la télésurveillance basée sur Motion JPEG 2000},
  booktitle = {20e Colloque GRETSI 2005 sur le traitement du signal et des images},
  year = {2005},
  address = {Louvain-la-Neuve, Belgium},
  month = {September},
  note = {dpt:img*grp:mm|vs*lg:fr*prj:wcam},
  abstract = {Dans ce papier, nous proposons un système de codage/transmission vidéo orienté objet utilisant le standard Motion JPEG 2000 7 pour le stockage et la diffusion efficaces de vidéo surveillance sur les canaux bas débit. La méthode présente les mêmes avantages que les techniques actuelles de codage de régions d'intérêt tout en améliorant considérablement le rapport débit/qualité moyen de la vidéo transmise, lorsque les caméras sont statiques. Nous proposons ici de transmettre le flux vidéo en transmettant d'une part un flux Motion JPEG 2000 contenant une estimation du fond rafraichie régulièrement et, d'autre part, un second flux Motion JPEG 2000 ne contenant que les objets mobiles de la scène. Cette méthode fournit une meilleure qualité de la vidéo décodée, qui se rapproche de celle obtenue avec un codeur MPEG-4, mais aussi une complexité réduite pour le client par rapport à un encodage Motion JPEG2000 standard, avec un surcout de stockage négligeable. Les flux de surveillance stockés sur le serveur sont complètement conformes au standard Motion JPEG 2000.},
  url = {2005_GRETSI_WCAM.pdf},
};
18 H. Aiache, V. Conan, G. Guibe, J. Leguay, J.M. Barcelo, L. Cerda, J. Garcia, R. Knopp, N. Nikaein, X. Gonzalez, A. Zeini, O. Apilo, A. Boukalov, J. Karvo, H. Koskinen, L.R. Bergonzi, C. Le Martret, J. Concejo Diaz, J. Meessen, C. Blondia, P. Decleyn, E. Van de Velde and M. Voorhaen. WIDENS: Advanced Wireless Ad-Hoc Networks for Public Safety. IST Mobile & Wireless Communications Summit (IST Summit). Dresden, Germany. June 2005.
Abstract: This paper provides an overview of the on-going European Project called WIreless DEployable Network System (WIDENS) which aims at defining a rapidly deployable communication system for public safety or emergency services. In this context, users expect a highly reliable communication system that can support real time applications to allow teams to collaborate in an efficient way. They also want the system to work in a spontaneous fashion and with no pre-installed infrastructures. To fit all the requirements, WIDENS takes advantage of the technology of wireless ad hoc networks to establish high data rate communication links on the fly. In this paper we describe the overall architecture of the WIDENS network and highlight the design of its major components.

Keywords: AD-HOC NETWORKS, PUBLIC SAFETY.

Bibentry:
@INPROCEEDINGS{Aiache:2005,
  author = {H. Aiache and V. Conan and G. Guibe and J. Leguay and J.M. Barcelo and L. Cerda and J. Garcia and R. Knopp and N. Nikaein and X. Gonzalez and A. Zeini and O. Apilo and A. Boukalov and J. Karvo and H. Koskinen and L.R. Bergonzi and C. Le Martret and J. Concejo Diaz and J. Meessen and C. Blondia and P. Decleyn and E. Van de Velde and M. Voorhaen},
  title = {WIDENS: Advanced Wireless Ad-Hoc Networks for Public Safety},
  booktitle = {IST Mobile & Wireless Communications Summit (IST Summit)},
  year = {2005},
  address = {Dresden, Germany},
  month = {June},
  note = {dpt:img*grp:mm|vs*lg:en*prj:widens},
  abstract = {This paper provides an overview of the on-going European Project called WIreless DEployable Network System (WIDENS) which aims at defining a rapidly deployable communication system for public safety or emergency services. In this context, users expect a highly reliable communication system that can support real time applications to allow teams to collaborate in an efficient way. They also want the system to work in a spontaneous fashion and with no pre-installed infrastructures. To fit all the requirements, WIDENS takes advantage of the technology of wireless ad hoc networks to establish high data rate communication links on the fly. In this paper we describe the overall architecture of the WIDENS network and highlight the design of its major components.},
  url = {2005_IST-SUMMIT_WIDENS.pdf},
  keywords = {Ad-hoc networks, Public Safety.},
};
17 J. Meessen, C. Parisot, D. Agrafiotis, C. Le Barz, Y. Sadourny, J.-F. Delaigle, D. Bull and D. Nicholson. WCAM: Content-Based Coding and Securing of Motion JPEG 2000 and H.264 for Wireless Surveillance Video Transmission. Proc. of European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT), p. 329-334. London, U.K. December 2005.
Abstract: This paper presents the integrated method of the IST WCAM project for content-based Motion JPEG 2000 and H.264 video coding. The idea is to link statistical segmentation with the video coders so as to guarantee high visual quality for semantically relevant objects while meeting the wireless bandwidth constraints. In the case of Motion JPEG 2000 coding, the segmentation results are also used for selective encryption. WCAM's contributions to this challenging problem are presented, along with ongoing work and current results. Possible extensions are discussed.

Keywords: MOTION JPEG 2000, MPEG-4 AVC / H.264, REGION OF INTEREST, SEGMENTATION, SELECTIVE ENCRYPTION.

Bibentry:
@INPROCEEDINGS{Meessen:2005:b,
  author = {J. Meessen and C. Parisot and D. Agrafiotis and C. Le Barz and Y. Sadourny and J.-F. Delaigle and D. Bull and D. Nicholson},
  title = {WCAM: Content-Based Coding and Securing of Motion JPEG 2000 and H.264 for Wireless Surveillance Video Transmission},
  booktitle = {Proc. of European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT)},
  year = {2005},
  address = {London, U.K.},
  month = {December},
  note = {dpt:img*grp:mm|vs*lg:en*prj:wcam},
  abstract = {This paper presents the integrated method of the IST WCAM project for content-based Motion JPEG 2000 and H.264 video coding. The idea is to link statistical segmentation with the video coders so as to guarantee high visual quality for semantically relevant objects while meeting the wireless bandwidth constraints. In the case of Motion JPEG 2000 coding, the segmentation results are also used for selective encryption. WCAM's contributions to this challenging problem are presented, along with ongoing work and current results. Possible extensions are discussed.},
  url = {2005_EWIMT_WCAM.pdf},
  keywords = {Segmentation, Region of interest, Motion JPEG 2000, MPEG-4 AVC / H.264, Selective encryption},
  pages = {329-334},
};
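The selective-encryption idea can be pictured with a hedged sketch: only the code-stream segments that cover the ROI are enciphered, so the remainder stays decodable. AES-CTR and the segment/flag representation are our assumptions, not necessarily the project's scheme:

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_roi_segments(segments, roi_flags, key):
    """segments: list of byte strings cut from the code-stream;
    roi_flags: parallel list of booleans (True = segment covers the ROI)."""
    nonce = os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    protected = []
    for seg, is_roi in zip(segments, roi_flags):
        # ROI segments are enciphered; background segments pass through.
        protected.append(enc.update(seg) if is_roi else seg)
    return nonce, protected

Receivers without the key still decode the background normally, while the semantically relevant regions remain unreadable.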
16 X. Desurmont, J. Meessen, C. Parisot and J.-F. Delaigle. A step to cognitive vision systems for common videosurveillance. 6th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS). Montreux, Switzerland. April 2005.
Abstract: Automated videosurveillance for homeland security is a very hot topic today. Commercial systems are emerging, but configuration setup time and installation cost are still major issues. These systems usually detect unexpected or unauthorised events in a visual scene (e.g. unattended object, human motion) and then trigger alarms or record forensic video data and metadata in tamperproof databases. Currently, the configuration phase requires the intervention of experts for encoding precisely how the system has to work. In our approach, we propose new functional capabilities for intelligent videosurveillance systems to learn events of interest from a base of representative examples. Such cognitive vision systems are based on basic capabilities like object detection, characterisation and tracking for low-level (image processing) and high-level (semantic) event recognition. We show how this semantic domain is useful to generalise the recognition. With this goal in mind, a performance evaluation of the system is performed and validated on test sequences, including CAVIAR.
Bibentry:
@INPROCEEDINGS{Desurmont:2005:e,
  author = {X. Desurmont and J. Meessen and C. Parisot and J.-F. Delaigle},
  title = {A step to cognitive vision systems for common videosurveillance},
  booktitle = {6th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)},
  year = {2005},
  address = {Montreux, Switzerland},
  month = {April 13-15},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {Automated videosurveillance for homeland security is a very hot topic today. Commercial systems are emerging, but configuration setup time and installation cost are still major issues. These systems usually detect unexpected or unauthorised events in a visual scene (e.g. unattended object, human motion) and then trigger alarms or record forensic video data and metadata in tamperproof databases. Currently, the configuration phase requires the intervention of experts for encoding precisely how the system has to work. In our approach, we propose new functional capabilities for intelligent videosurveillance systems to learn events of interest from a base of representative examples. Such cognitive vision systems are based on basic capabilities like object detection, characterisation and tracking for low-level (image processing) and high-level (semantic) event recognition. We show how this semantic domain is useful to generalise the recognition. With this goal in mind, a performance evaluation of the system is performed and validated on test sequences, including CAVIAR.},
  url = {2005_WIAMIS-2.pdf},
};
15 J. Meessen, C. Parisot, X. Desurmont and J.-F. Delaigle. Scene Analysis for Reducing Motion JPEG 2000 Video Surveillance Delivery Bandwidth and Complexity. IEEE Int. Conf. on Image Processing (ICIP). Genova, Italy. September 2005.
Abstract: In this paper, we propose a new object-based video coding/transmission system using the emerging Motion JPEG 2000 standard for the efficient storage and delivery of video surveillance over low-bandwidth channels. Some recent papers deal with JPEG 2000 coding/transmission based on the Region Of Interest (ROI) feature and the multi-layer capability provided by this coding system [2][3]. Those approaches allow delivering higher quality for mobile objects (or ROIs) than for the background when the bandwidth is too narrow for sufficient video quality. The method proposed here provides the same features while significantly improving the average bitrate/quality ratio of the delivered video when cameras are static. We transmit only the ROIs of each frame, as well as an automatic estimation of the background at a lower frame rate, in two separate Motion JPEG 2000 streams. The frames are then reconstructed at the client side without the need for other external data. Our method provides both better video quality and reduced client CPU usage, with negligible storage overhead. Video surveillance streams stored on the server are fully compliant with existing Motion JPEG 2000 decoders.

Keywords: SEGMENTATION, OBJECT-BASED CODING, MOTION JPEG 2000, VIDEO SURVEILLANCE.

Bibentry:
@INPROCEEDINGS{Meessen:2005:c,
  author = {J. Meessen and C. Parisot and X. Desurmont and J.-F. Delaigle},
  title = {Scene Analysis for Reducing Motion JPEG 2000 Video Surveillance Delivery Bandwidth and Complexity},
  booktitle = {IEEE Int. Conf. on Image Processing (ICIP)},
  year = {2005},
  address = {Genova, Italy},
  month = {September},
  note = {dpt:img*grp:vs*lg:en*prj:wcam},
  abstract = {In this paper, we propose a new object-based video coding/transmission system using the emerging Motion JPEG 2000 standard for the efficient storage and delivery of video surveillance over low-bandwidth channels. Some recent papers deal with JPEG 2000 coding/transmission based on the Region Of Interest (ROI) feature and the multi-layer capability provided by this coding system [2][3]. Those approaches allow delivering higher quality for mobile objects (or ROIs) than for the background when the bandwidth is too narrow for sufficient video quality. The method proposed here provides the same features while significantly improving the average bitrate/quality ratio of the delivered video when cameras are static. We transmit only the ROIs of each frame, as well as an automatic estimation of the background at a lower frame rate, in two separate Motion JPEG 2000 streams. The frames are then reconstructed at the client side without the need for other external data. Our method provides both better video quality and reduced client CPU usage, with negligible storage overhead. Video surveillance streams stored on the server are fully compliant with existing Motion JPEG 2000 decoders.},
  url = {2005_IEEE-ICIP_WCAM.pdf},
  keywords = {segmentation; object-based coding; Motion JPEG 2000; video surveillance.},
};
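Client-side reconstruction then amounts to simple compositing: keep the most recently decoded background and paste the decoded object regions onto it for each frame. A minimal sketch, with an illustrative array layout and names of our own choosing:

def reconstruct_frame(background, object_patches):
    """background: HxWx3 uint8 NumPy array refreshed at a low frame rate.
    object_patches: list of (x, y, patch) decoded for the current frame."""
    frame = background.copy()
    for x, y, patch in object_patches:
        h, w = patch.shape[:2]
        frame[y:y + h, x:x + w] = patch        # overlay the moving objects
    return frame

Because the background changes slowly on static cameras, refreshing it rarely costs little quality while saving most of the bitrate.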
14 X. Desurmont, J. Meessen, A. Bastide, C. Parisot and J.-F. Delaigle. Design of vision technology for automatic monitoring of unexpected events. 5th Int. Conf. on Methods and Techniques in Behavioral Research. Wageningen, The Netherlands. August-September 2005.
Abstract: Due to the emergence of digital standards and systems, it is now possible to deploy vision technology on site easily and rapidly, for permanent or temporary use, for automatic monitoring of unexpected events. Examples of challenging applications [1,2] are surveillance, traffic monitoring, marketing, etc. These techniques could also be useful for instrumental recording of human or animal behavior for research purposes. This talk describes a practical implementation of a distributed video system, with emphasis on video hardware issues like acquisition and the image processing necessary for useful event detection. The requirements for these systems are to be easy to use, robust and flexible. Our goal is to obtain efficiently implemented systems that can meet these strong industrial requirements. A computer-cluster-based approach with network connections is the innovative solution proposed. The main advantage of this approach is its flexibility. Since mobile objects are important in video surveillance, these systems include image analysis tools such as segmentation and object tracking. First, we present the typical requirements of such a system. We consider issues like the facility to deploy network-connected real-time multi-cameras, with reusable, modular and generic technologies. Then we analyze how to cope with the need to integrate a solution with state-of-the-art technologies. As an answer, we propose a global system architecture and describe its main features to explain each underlying module. To illustrate the applicability of the proposed system architecture in real case studies, we show some deployment scenarios for indoor or outdoor applications.

Keywords: BEHAVIOR, COMPUTER VISION, MULTI-CAMERA, REAL-TIME, TRACKING.

Bibentry:
@INPROCEEDINGS{Desurmont:2005:b,
  author = {X. Desurmont and J. Meessen and A. Bastide and C. Parisot and J.-F. Delaigle},
  title = {Design of vision technology for automatic monitoring of unexpected events},
  booktitle = {5th Int. Conf. on Methods and Techniques in Behavioral Research},
  year = {2005},
  address = {Wageningen, The Netherlands},
  month = {August 30 - September 2},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {Due to the emergence of digital standards and systems, it is now possible to deploy vision technology on site easily and rapidly, for permanent or temporary use, for automatic monitoring of unexpected events. Examples of challenging applications [1,2] are surveillance, traffic monitoring, marketing, etc. These techniques could also be useful for instrumental recording of human or animal behavior for research purposes. This talk describes a practical implementation of a distributed video system, with emphasis on video hardware issues like acquisition and the image processing necessary for useful event detection. The requirements for these systems are to be easy to use, robust and flexible. Our goal is to obtain efficiently implemented systems that can meet these strong industrial requirements. A computer-cluster-based approach with network connections is the innovative solution proposed. The main advantage of this approach is its flexibility. Since mobile objects are important in video surveillance, these systems include image analysis tools such as segmentation and object tracking. First, we present the typical requirements of such a system. We consider issues like the facility to deploy network-connected real-time multi-cameras, with reusable, modular and generic technologies. Then we analyze how to cope with the need to integrate a solution with state-of-the-art technologies. As an answer, we propose a global system architecture and describe its main features to explain each underlying module. To illustrate the applicability of the proposed system architecture in real case studies, we show some deployment scenarios for indoor or outdoor applications.},
  url = {2005_MTBR.pdf},
  keywords = {Multi-camera, real-time, computer vision, tracking, behavior},
};
13 X. Desurmont, A. Bastide, C. Chaudy, C. Parisot, J.-F. Delaigle and B. Macq. Image Analysis Architectures and Techniques for Intelligent Surveillance Systems. IEE Proc.-Vis. Image & Signal Process., Special Issue on Intelligent Distributed Surveillance Systems, Vol. 152(2), p. 224-231. April 2005.
Abstract: Video security is becoming more and more important today, as the number of installed cameras can attest. There are many challenging commercial applications to monitor people or vehicle traffic. The work reported here has both research and commercial motivations. Our goals are first to obtain an efficient intelligent system that can meet strong industrial surveillance system requirements and therefore be real-time, distributed, generic and robust. Our second goal is to have a development platform that allows researchers to conceive and easily test new vision algorithms thanks to its modularity and easy set-up. This paper focuses on the image analysis modules. It considers the different kinds of inputs and algorithm models, as well as delay and the need for generality.

Keywords: COMPUTER VISION, DISTRIBUTED ARCHITECTURE, INTELLIGENT VISUAL SURVEILLANCE, MULTI-CAMERA, REAL-TIME, TRACKING.

Bibentry:
@ARTICLE{Desurmont:2005:d,
  author = {X. Desurmont and A. Bastide and C. Chaudy and C. Parisot and J.-F. Delaigle and B. Macq},
  title = {Image Analysis Architectures and Techniques for Intelligent Surveillance Systems},
  year = {2005},
  month = {April},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {Video security is becoming more and more important today, as the number of installed cameras can attest. There are many challenging commercial applications to monitor people or vehicle traffic. The work reported here has both research and commercial motivations. Our goals are first to obtain an efficient intelligent system that can meet strong industrial surveillance system requirements and therefore be real-time, distributed, generic and robust. Our second goal is to have a development platform that allows researchers to conceive and easily test new vision algorithms thanks to its modularity and easy set-up. This paper focuses on the image analysis modules. It considers the different kinds of inputs and algorithm models, as well as delay and the need for generality.},
  url = {2005_IEE-VIS.pdf},
  keywords = {multi-camera, distributed architecture, real-time, computer vision, intelligent visual surveillance, tracking.},
  journal = {Special Issue on Intelligent Distributed Surveillance Systems on the IEE Proc.-Vis. Image & Signal Process.},
  volume = {152},
  pages = {224-231},
  number = {2},
};
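The modularity claim can be illustrated with a toy skeleton: every analysis stage implements one interface and stages are chained over the frame stream, so an algorithm can be swapped without touching the rest of the system. Class and method names are ours, not the paper's:

class AnalysisModule:
    """Common interface: consume a frame plus metadata, return both."""
    def process(self, frame, metadata):
        raise NotImplementedError

class MotionDetector(AnalysisModule):
    """Crude frame differencing, standing in for a real detector."""
    def __init__(self, threshold=25):
        self.threshold = threshold
        self.previous = None
    def process(self, frame, metadata):   # frame: NumPy uint8 array
        if self.previous is not None:
            diff = abs(frame.astype(int) - self.previous.astype(int))
            metadata["motion"] = float((diff > self.threshold).mean())
        self.previous = frame
        return frame, metadata

class Pipeline:
    def __init__(self, modules):
        self.modules = modules
    def run(self, frames):
        for frame in frames:
            metadata = {}
            for module in self.modules:
                frame, metadata = module.process(frame, metadata)
            yield metadata

Tracking, classification or event-recognition stages would slot in behind the detector in the same way.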
12 X. Desurmont, C. Chaudy, C. Parisot, J.-F. Delaigle and B. Macq. Techniques de vision par ordinateur pour la video surveillance. ORASIS. Fournol, France. May 2005.
Abstract: Video security is becoming increasingly important nowadays, as the number of installed systems attests. There are many challenging commercial applications for monitoring road traffic or even people's safety. The work presented here stems from both research and industrial motivations. Our goals are, first, to meet the strong constraints of industrial surveillance systems, namely real-time operation, distribution, genericity and robustness, but also to have a development platform that allows researchers to imagine and easily test new algorithms thanks to its modularity and easy configuration. This paper focuses on the image analysis modules. We consider the different types of inputs, the processing stages and the algorithm models.

Keywords: COMPUTER VISION, DISTRIBUTED ARCHITECTURE, INTELLIGENT VIDEO SURVEILLANCE, MULTI-CAMERA, REAL-TIME, TRACKING.

Bibentry:
@INPROCEEDINGS{Desurmont:2005:c,
  author = {X. Desurmont and C. Chaudy and C. Parisot and J.-F. Delaigle and B. Macq},
  title = {Techniques de vision par ordinateur pour la video surveillance},
  booktitle = {ORASIS},
  year = {2005},
  address = {Fournol, France},
  month = {May 24-27},
  note = {dpt:img*grp:vs*lg:fr},
  abstract = {La sécurité vidéo devient de plus en plus importante de nos jours ainsi que le nombre de systèmes installés en atteste. Il y a de nombreux défis d'applications commerciales pour surveiller le trafic routier ou même la sécurité des personnes. Le travail présenté est issu de motivations tant au niveau recherche qu'au niveau industriel. En effet, nos buts sont en premier lieu de répondre aux contraintes fortes des systèmes de surveillance industrielle, c'est à dire le fonctionnement temps réel, être distribués, génériques et robustes, mais c'est aussi d'avoir une plate-forme de développement qui permet aux chercheurs d'imaginer et de tester facilement de nouveaux algorithmes grâce à une modularité et un paramétrage facile. Ce papier sera axé sur les modules d'analyses d'images. Nous considérerons les différents types d'entrées, les étapes du traitement et les modèles d'algorithmes.},
  url = {2005_ORASIS.pdf},
  keywords = {multi-caméra, architecture distribuée, temps-réel, vision par ordinateur, suivi, vidéosurveillance intelligente.},
};
11 E. Jaspers, R. Wijnhoven, R. Albers, X. Desurmont, M. Barais, J. Hamaide and B. Lienard. CANDELA - Storage, Analysis and Retrieval of Video Content in Distributed Systems: Real-time Video Surveillance and Retrieval. Proc. of Int. Conf. on Multimedia and Expo (ICME). Amsterdam, The Netherlands. July 2005.
Abstract: Although many different types of technologies for information systems have evolved over the last decades (such as databases, video systems, the Internet and mobile telecommunication), the integration of these technologies is just in its infancy and has the potential to introduce "intelligent" systems. This paper describes the novelties of video content analysis in a surveillance system, demonstrating the benefits for fast retrieval in huge video databases.
Bibentry:
@INPROCEEDINGS{Jaspers:2005:b,
  author = {E. Jaspers and R. Wijnhoven and R. Albers and X. Desurmont and M. Barais and J. Hamaide and B. Lienard},
  title = {CANDELA - Storage, Analysis and Retrieval of Video Content in Distributed Systems: Real-time Video Surveillance and Retrieval},
  booktitle = {Proc. of Int. Conf. on Multimedia and Expo (ICME)},
  year = {2005},
  address = {Amsterdam, The Netherlands},
  month = {July 6-8},
  note = {dpt:img*grp:vs*lg:en*prj:candela},
  abstract = {Although many different types of technologies for information systems have evolved over the last decades (such as databases, video systems, the Internet and mobile telecommunication), the integration of these technologies is just in its infancy and has the potential to introduce "intelligent" systems. This paper describes the novelties of video content analysis in a surveillance system, demonstrating the benefits for fast retrieval in huge video databases.},
  url = {2005_ICME_CANDELA.pdf},
};
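The fast-retrieval benefit rests on indexing compact content descriptions rather than raw video. A toy sketch of that idea, with an illustrative schema that is ours, not CANDELA's:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events
              (camera TEXT, t_start REAL, t_end REAL, label TEXT)""")

def index_event(camera, t_start, t_end, label):
    """Store one detected event; the footage itself lives elsewhere."""
    db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
               (camera, t_start, t_end, label))

def find_events(label, camera=None):
    """Answer queries from the index instead of scanning the video."""
    sql, args = "SELECT * FROM events WHERE label = ?", [label]
    if camera is not None:
        sql += " AND camera = ?"
        args.append(camera)
    return db.execute(sql, args).fetchall()

A query such as find_events("abandoned_object", camera="cam3") then returns the time intervals to play back, however long the recordings are.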
10 E. Jaspers, R. Wijnhoven, R. Albers, J. Nesvadba, J. Lukkien, A. Sinitsyn, J. Nisula, X. Desurmont, P. Pieterila, J. Palo and R. Truyen. CANDELA - Storage, Analysis and Retrieval of Video Content in Distributed Systems. Proc. of Int. Workshop on Adaptive Multimedia Retrieval (AMR). Glasgow, U.K. July 2005.
Abstract: Although many different types of technologies for information systems have evolved over the last decades (such as databases, video systems, the Internet and mobile telecommunication), the integration of these technologies is just in its infancy and has the potential to introduce "intelligent" systems. The CANDELA project, which is part of the European ITEA program, focuses on the integration of video content analysis in combination with networked delivery and storage technologies. To unleash the full potential of such integration, adaptive video-content analysis and retrieval techniques are being explored by developing several pilot applications...
Bibentry:
@INPROCEEDINGS{Jaspers:2005:a,
  author = {E. Jaspers and R. Wijnhoven and R. Albers and J. Nesvadba and J. Lukkien and A. Sinitsyn and J. Nisula and X. Desurmont and P. Pieterila and J. Palo and R. Truyen},
  title = {CANDELA - Storage, Analysis and Retrieval of Video Content in Distributed Systems},
  booktitle = {Proc. of Int. Workshop on Adaptive Multimedia Retrieval (AMR)},
  year = {2005},
  address = {Glasgow, U.K.},
  month = {July 28-29},
  note = {dpt:img*grp:mm|vs*lg:en*prj:candela},
  abstract = {Although many different types of technologies for information systems have evolved over the last decades (such as databases, video systems, the Internet and mobile telecommunication), the integration of these technologies is just in its infancy and has the potential to introduce "intelligent" systems. The CANDELA project, which is part of the European ITEA program, focuses on the integration of video content analysis in combination with networked delivery and storage technologies. To unleash the full potential of such integration, adaptive video-content analysis and retrieval techniques are being explored by developing several pilot applications...},
  url = {2005_AMR_CANDELA.pdf},
};

2004

9 C. Chaudy, J.-F. Delaigle and B. Macq. Analyse du trafic routier par caméra intelligente: MvTraffic. PROMOPTICA. April 2004.
Abstract: Today, road traffic monitoring is essential for numerous concerns like mobility, safety and environmental issues. We present MvTraffic, an autonomous system for road traffic analysis based on video image processing. This system is able to measure the volume and attributes of traffic in real time, in daytime and nighttime conditions. MvTraffic uses a set of complementary techniques to extract all the vehicle trajectories from the image flow. Its limited requirements allow the system to be integrated into an embedded processing unit.
Bibentry:
@INPROCEEDINGS{Chaudy:2004,
  author = {C. Chaudy and J.-F. Delaigle and B. Macq},
  title = {Analyse du trafic routier par caméra intelligente: MvTraffic},
  booktitle = {PROMOPTICA},
  year = {2004},
  note = {dpt:img*grp:vs*lg:fr},
  abstract = {Today, road traffic monitoring is essential for numerous concerns like mobility, safety and environmental issues. We present MvTraffic, an autonomous system for road traffic analysis based on video image processing. This system is able to measure the volume and attributes of traffic in real time, in daytime and nighttime conditions. MvTraffic uses a set of complementary techniques to extract all the vehicle trajectories from the image flow. Its limited requirements allow the system to be integrated into an embedded processing unit.},
  url = {2004_PROMOPTICA.pdf},
};
8 X. Desurmont, J.-F. Delaigle and B. Macq. Characterisation of geometric distortions attacks in robust Watermarking. Proc. of Security, Steganography, and Watermarking of Multimedia Contents VI, part of the IS&T SPIE Symposium on Electronic Imaging. April 2004.
Abstract: Robust image watermarking algorithms have been proposed, among others, as methods for discouraging illicit copying and distribution of copyright material. With robustness to pixel modifications in mind, many watermarking designers use techniques coming from the communications domain, such as spread spectrum, to embed hidden information, be it in the spatial or in the transform domain. Most of the attacks designed to make watermarking algorithms inefficient degrade images through geometric distortions. One solution to counter them is to add synchronization information. In this paper, we present an analysis of this type of distortion and propose a metric to estimate the distortion undergone by an image. This metric is content independent and invariant to global translation, rotation and scaling, which can be considered non-meaningful transformations. To demonstrate the relevance of this metric, we compare some of its results with the subjective degradation of the image produced by the Stirmark software.

Keywords: ATTACK, GEOMETRIC DISTORTION, HVS, MOS, WATERMARKING.

Bibentry:
@INPROCEEDINGS{Desurmont:2004:d,
  author = {X. Desurmont and J.-F. Delaigle and B. Macq},
  title = {Characterisation of geometric distortions attacks in robust Watermarking},
  booktitle = {Proc. of Security, Steganography, and Watermarking of Multimedia Contents VI, part of the IS&T SPIE Symp. of Electronic imaging},
  year = {2004},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {Robust image watermarking algorithms have been proposed, among others, as methods for discouraging illicit copying and distribution of copyright material. With robustness to pixel modifications in mind, many watermarking designers use techniques coming from the communications domain, such as spread spectrum, to embed hidden information, be it in the spatial or in the transform domain. Most of the attacks designed to make watermarking algorithms inefficient degrade images through geometric distortions. One solution to counter them is to add synchronization information. In this paper, we present an analysis of this type of distortion and propose a metric to estimate the distortion undergone by an image. This metric is content independent and invariant to global translation, rotation and scaling, which can be considered non-meaningful transformations. To demonstrate the relevance of this metric, we compare some of its results with the subjective degradation of the image produced by the Stirmark software.},
  url = {2004_SPIE-EI.pdf},
  keywords = {watermarking, attack, geometric distortion, HVS, MOS.},
};
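One way to realise a metric that is invariant to global translation, rotation and scaling is to fit the best global similarity transform between matched points of the original and attacked images and score only the residual deformation. The sketch below is our reading of the idea, not the authors' exact formulation:

import numpy as np

def residual_distortion(src, dst):
    """src, dst: Nx2 arrays of matched point positions (original vs.
    attacked image). Returns the RMS deformation left once the best
    global similarity transform is removed."""
    zs = src[:, 0] + 1j * src[:, 1]            # points as complex numbers
    zd = dst[:, 0] + 1j * dst[:, 1]
    # Model dst ~ a*src + b with a = s*exp(i*theta): a single complex
    # least-squares fit captures scale, rotation and translation at once.
    A = np.column_stack([zs, np.ones_like(zs)])
    (a, b), *_ = np.linalg.lstsq(A, zd, rcond=None)
    residual = zd - (a * zs + b)               # what similarity cannot explain
    return float(np.sqrt(np.mean(np.abs(residual) ** 2)))

A pure rotation, scaling or shift of the grid scores zero, while Stirmark-style local bending yields a nonzero residual.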
7 P. Merkus, X. Desurmont, E. Jaspers, R. Wijnhoven, O. Caignart, J.-F. Delaigle and W. Favoreel. CANDELA - Integrated Storage, Analysis and Distribution of Video Content for Intelligent Information Systems. European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT). London, U.K. November 2004.
Abstract: The introduction of digital video has led to a wide range of new video applications, including storage for information systems. Even though interactivity enables browsing and instant playback for such systems, the high information density and the large amounts of data result in cumbersome searching to find the information of interest. To solve this problem, the CANDELA project explores the combination of video content analysis, storage and retrieval for distributed systems. The concept of generating high-level content descriptions opens up a wide range of new applications. In this paper we elaborate on some parts, using the surveillance application as a pilot.
Bibentry:
@INPROCEEDINGS{Merkus:2004,
  author = {P. Merkus and X. Desurmont and E. Jaspers and R. Wijnhoven and O. Caignart and J.-F. Delaigle and W. Favoreel},
  title = {CANDELA - Integrated Storage, Analysis and Distribution of Video Content for Intelligent Information Systems},
  booktitle = {European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies (EWIMT)},
  year = {2004},
  address = {London, U.K.},
  month = {November 25-26},
  note = {dpt:img*grp:vs*lg:en*prj:candela},
  abstract = {The introduction of digital video has led to a wide range of new video applications, including storage for information systems. Even though interactivity enables browsing and instant playback for such systems, the high information density and the large amounts of data result in cumbersome searching to find the information of interest. To solve this problem, the CANDELA project explores the combination of video content analysis, storage and retrieval for distributed systems. The concept of generating high-level content descriptions opens up a wide range of new applications. In this paper we elaborate on some parts, using the surveillance application as a pilot.},
  url = {2004_EWIMT_CANDELA.pdf},
};
6 X. Desurmont, C. Chaudy, A. Bastide, J.-F. Delaigle and B. Macq. A seamless modular image analysis architecture for surveillance systems. IEE Intelligent Distributed Surveillance Systems (IDSS). London, U.K. February 2004.
Abstract: Video security is becoming more and more important today, as the number of installed cameras can attest. There are many challenging commercial applications to monitor people or vehicle traffic. The work reported here has both research and commercial motivations. Our goals are first to obtain an efficient intelligent system that can meet strong industrial surveillance system requirements and therefore be real-time, distributed, generic and robust. Our second goal is to have a development platform that allows researchers to imagine and easily test new vision algorithms thanks to its modularity and easy set-up. Previous papers [4,7] dealt with the core architecture for handling such problems as heterogeneous inputs, encoding, distribution, and storage. This paper focuses more precisely on the image analysis modules. We consider the different kinds of inputs and algorithm models, as well as the optimisation of memory, delay and genericity needs.
Bibentry:
@INPROCEEDINGS{Desurmont:2004:a,
  author = {X. Desurmont and C. Chaudy and A. Bastide and J.-F. Delaigle and B. Macq},
  title = {A seamless modular image analysis architecture for surveillance systems},
  booktitle = {IEE Intelligent Distributed Surveillance Systems (IDSS)},
  year = {2004},
  address = {London, U.K.},
  month = {February 23},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {Video security is becoming more and more important today, as the number of installed cameras can attest. There are many challenging commercial applications to monitor people or vehicle traffic. The work reported here has both research and commercial motivations. Our goals are first to obtain an efficient intelligent system that can meet strong industrial surveillance system requirements and therefore be real-time, distributed, generic and robust. Our second goal is to have a development platform that allows researchers to imagine and easily test new vision algorithms thanks to its modularity and easy set-up. Previous papers [4,7] dealt with the core architecture for handling such problems as heterogeneous inputs, encoding, distribution, and storage. This paper focuses more precisely on the image analysis modules. We consider the different kinds of inputs and algorithm models, as well as the optimisation of memory, delay and genericity needs.},
  url = {2004_IEE-IDSS.pdf},
};
5 X. Desurmont, A. Bastide and J.-F. Delaigle. Intelligent video storage of visual evidences on site in fast deployment. Visual Information Processing XIII, part of the SPIE Defense & Security Symp. April 2004.
Abstract: In this article we present a generic, flexible and robust approach for an intelligent real-time video-surveillance system. A previous version of the system was presented in [1]. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes, highlighting alarms and computing statistics. The proposed system is a multi-camera platform able to handle different standards of video inputs (composite, IP, IEEE1394) and which can compress (MPEG4), store and display them. This platform also integrates advanced video analysis tools, such as motion detection, segmentation, tracking and interpretation. The design of the architecture is optimized to play back, display, and process video flows in an efficient way for video-surveillance applications. The implementation is distributed on a scalable computer cluster based on Linux and an IP network. It relies on POSIX threads for multitasking scheduling. Data flows are transmitted between the different modules using multicast technology, under the control of a TCP-based command network (e.g. for bandwidth occupation control). We report some results and show the potential use of such a flexible system in third-generation video surveillance systems. We illustrate the interest of the system in a real case study: indoor surveillance.

Keywords: COMPUTER VISION, DISTRIBUTED ARCHITECTURE, INTELLIGENT VISUAL SURVEILLANCE, REAL-TIME.

Bibentry:
@INPROCEEDINGS{Desurmont:2004:b,
  author = {X. Desurmont and A. Bastide and J.-F. Delaigle},
  title = {Intelligent video storage of visual evidences on site in fast deployment},
  booktitle = {Visual Information Processing XIII, part of the SPIE Defense & Security Symp.},
  year = {2004},
  month = {April},
  note = {dpt:img*grp:vs*lg:en*prj:candela},
  abstract = {In this article we present a generic, flexible and robust approach for an intelligent real-time video-surveillance system. A previous version of the system was presented in [1]. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes, highlighting alarms and computing statistics. The proposed system is a multi-camera platform able to handle different standards of video inputs (composite, IP, IEEE1394) and which can compress (MPEG4), store and display them. This platform also integrates advanced video analysis tools, such as motion detection, segmentation, tracking and interpretation. The design of the architecture is optimized to play back, display, and process video flows in an efficient way for video-surveillance applications. The implementation is distributed on a scalable computer cluster based on Linux and an IP network. It relies on POSIX threads for multitasking scheduling. Data flows are transmitted between the different modules using multicast technology, under the control of a TCP-based command network (e.g. for bandwidth occupation control). We report some results and show the potential use of such a flexible system in third-generation video surveillance systems. We illustrate the interest of the system in a real case study: indoor surveillance.},
  url = {2004_SPIE-DS.pdf},
  keywords = {real-time, intelligent visual surveillance, computer vision, distributed architecture},
};
4 X. Desurmont, J.-F. Delaigle, A. Bastide and B. Macq. A generic flexible and robust approach for intelligent real-time video-surveillance systems. Proc. of Real-Time Imaging VIII, part of the IS&T SPIE Symposium on Electronic Imaging. April 2004.
Abstract: In this article we present a generic, flexible and robust approach for an intelligent real-time video-surveillance system. A previous version of the system was presented in [1]. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes, highlighting alarms and computing statistics. The proposed system is a multi-camera platform able to handle different standards of video inputs (composite, IP, IEEE1394) and which can compress (MPEG4), store and display them. This platform also integrates advanced video analysis tools, such as motion detection, segmentation, tracking and interpretation. The design of the architecture is optimized to play back, display, and process video flows in an efficient way for video-surveillance applications. The implementation is distributed on a scalable computer cluster based on Linux and an IP network. It relies on POSIX threads for multitasking scheduling. Data flows are transmitted between the different modules using multicast technology, under the control of a TCP-based command network (e.g. for bandwidth occupation control). We report some results and show the potential use of such a flexible system in third-generation video surveillance systems. We illustrate the interest of the system in a real case study: indoor surveillance.

Keywords: COMPUTER VISION, DISTRIBUTED ARCHITECTURE, INTELLIGENT VISUAL SURVEILLANCE, REAL-TIME.

Bibentry:
@INPROCEEDINGS{Desurmont:2004:c,
  author = {X. Desurmont and J.-F. Delaigle and A. Bastide and B. Macq},
  title = {A generic flexible and robust approach for intelligent real-time video-surveillance systems},
  booktitle = {Proc. of Real-Time Imaging VIII, IS&T SPIE Symp. of Electronic imaging},
  year = {2004},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {In this article we present a generic, flexible and robust approach for an intelligent real-time video-surveillance system. A previous version of the system was presented in [1]. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes, highlighting alarms and computing statistics. The proposed system is a multi-camera platform able to handle different standards of video inputs (composite, IP, IEEE1394) and which can compress (MPEG4), store and display them. This platform also integrates advanced video analysis tools, such as motion detection, segmentation, tracking and interpretation. The design of the architecture is optimized to play back, display, and process video flows in an efficient way for video-surveillance applications. The implementation is distributed on a scalable computer cluster based on Linux and an IP network. It relies on POSIX threads for multitasking scheduling. Data flows are transmitted between the different modules using multicast technology, under the control of a TCP-based command network (e.g. for bandwidth occupation control). We report some results and show the potential use of such a flexible system in third-generation video surveillance systems. We illustrate the interest of the system in a real case study: indoor surveillance.},
  url = {2004_SPIE-EI_2.pdf},
  keywords = {real-time, intelligent visual surveillance, computer vision, distributed architecture.},
};
3 J. Meessen, C. Parisot, C. Le Barz, J.-F. Delaigle and D. Nicholson. IST WCAM Project: Smart and Secure Video Coding Based on Content Detection. Proc. of European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT), p. 359-366. London, U.K. November 2004.
Abstract: This paper proposes an integrated solution for smart delivery of video surveillance data. The system is developed within the IST project WCAM "Wireless Cameras and Audio-Visual Seamless Networking", which is presented as well [1]. The main feature of our system is that it includes smart video coding based on automatic scene analysis and understanding. Specifically, the segmentation results are used for encoding regions of interest (ROI) in Motion JPEG 2000, guaranteeing good quality for the semantically relevant objects while keeping a low average data rate. An evaluation of different strategies for segmentation-based JPEG 2000 ROI encoding is presented. We propose a ROI coding strategy optimising the overall quality of the frames while keeping the average data rate low enough for wireless video transmission.
Bibentry:
@INPROCEEDINGS{Meessen:2004,
  author = {J. Meessen and C. Parisot and C. Le Barz and J.-F. Delaigle and D. Nicholson},
  title = {IST WCAM Project: Smart and Secure Video Coding Based on Content Detection},
  booktitle = {Proc. of European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (EWIMT)},
  year = {2004},
  address = {London, U.K.},
  month = {November},
  note = {dpt:img*grp:vs*lg:en*prj:wcam},
  abstract = {This paper proposes an integrated solution for smart delivery of video surveillance data. The system is developed within the IST project WCAM "Wireless Cameras and Audio-Visual Seamless Networking", which is presented as well [1]. The main feature of our system is that it includes smart video coding based on automatic scene analysis and understanding. Specifically, the segmentation results are used for encoding regions of interest (ROI) in Motion JPEG 2000, guaranteeing good quality for the semantically relevant objects while keeping a low average data rate. An evaluation of different strategies for segmentation-based JPEG 2000 ROI encoding is presented. We propose a ROI coding strategy optimising the overall quality of the frames while keeping the average data rate low enough for wireless video transmission.},
  url = {2004_EWIMT_WCAM.pdf},
  pages = {359-366},
};
2 X. Desurmont, A. Bastide, J.-F. Delaigle and B. Macq. A Seamless Modular Approach for Real-Time Video Analysis for Surveillance. 5th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS). Lisboa, Portugal. April 2004.
Abstract: In this paper we present a generic, flexible and robust approach for an intelligent real-time visual surveillance system. The proposed architecture integrates advanced video analysis tools, such as segmentation, tracking and event detection. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes. The approach is new because it is a seamless collaboration of many processing techniques, from image space to dynamic scene space. We thoroughly describe an important section of the vision system and prove its intrinsic modularity. The system is then demonstrated in an antiterrorist scenario of automatic detection of events in public facilities, and a performance evaluation of the system is performed.
Bibentry:
@INPROCEEDINGS{Desurmont:2004:e,
  author = {X. Desurmont and A. Bastide and J.-F. Delaigle and B. Macq},
  title = {A Seamless Modular Approach for Real-Time Video Analysis for Surveillance},
  booktitle = {5th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS)},
  year = {2004},
  address = {Lisboa, Portugal},
  month = {April 21-23},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {In this paper we present a generic, flexible and robust approach for an intelligent real-time visual surveillance system. The proposed architecture integrates advanced video analysis tools, such as segmentation, tracking and event detection. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes. The approach is new because it is a seamless collaboration of many processing techniques, from image space to dynamic scene space. We thoroughly describe an important section of the vision system and prove its intrinsic modularity. The system is then demonstrated in an antiterrorist scenario of automatic detection of events in public facilities, and a performance evaluation of the system is performed.},
  url = {2004_WIAMIS.pdf},
};

2003

1 B. Georis, X. Desurmont, D. Demaret, S. Redureau, J.-F. Delaigle and B. Macq. IP-Distributed Computer-Aided Video-Surveillance System. Proc. of IEE Intelligent Distributed Surveillance Systems. February 2003.
Abstract: In this article we present a generic, flexible and robust approach for an intelligent real-time videosurveillance system. The proposed system is a multi-camera platform that is able to handle different standards of video inputs (composite, IP, IEEE1394). The system implementation is distributed over a scalable computer cluster based on Linux and an IP network. Data flows are transmitted between the different modules using multicast technology, video flows are compressed with the MPEG4 standard, and the flow control is realized through a TCP-based command network (e.g. for bandwidth occupation control). The design of the architecture is optimized to display, compress, store and play back data and video flows in an efficient way. The platform also integrates advanced video analysis tools, such as motion detection, segmentation, tracking and neural network modules. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes and storing them with appropriate descriptions. This indexation process allows one to rapidly browse through huge amounts of stored surveillance data and play back only interesting sequences. We report some preliminary results and show the potential use of such a flexible system in third-generation video surveillance systems. We illustrate the interest of the system in a real case study: the surveillance of a reception desk.

Keywords: COMPUTER VISION, DISTRIBUTED ARCHITECTURE, INTELLIGENT STORAGE, INTELLIGENT VISUAL SURVEILLANCE, MULTI-CAMERA, MULTI-THREADING, MULTICAST, REAL-TIME.

Bibentry:
@INPROCEEDINGS{Georis:2003,
  author = {B. Georis and X. Desurmont and D. Demaret and S. Redureau and J.-F. Delaigle and B. Macq},
  title = {IP-Distributed Computer-Aided Video-Surveillance System},
  booktitle = {Proc. of IEE Intelligent distributed surveillance systems},
  year = {2003},
  month = {February 26},
  note = {dpt:img*grp:vs*lg:en},
  abstract = {In this article we present a generic, flexible and robust approach for an intelligent real-time videosurveillance system. The proposed system is a multi-camera platform that is able to handle different standards of video inputs (composite, IP, IEEE1394). The system implementation is distributed over a scalable computer cluster based on Linux and an IP network. Data flows are transmitted between the different modules using multicast technology, video flows are compressed with the MPEG4 standard, and the flow control is realized through a TCP-based command network (e.g. for bandwidth occupation control). The design of the architecture is optimized to display, compress, store and play back data and video flows in an efficient way. The platform also integrates advanced video analysis tools, such as motion detection, segmentation, tracking and neural network modules. The goal of these advanced tools is to provide help to operators by detecting events of interest in visual scenes and storing them with appropriate descriptions. This indexation process allows one to rapidly browse through huge amounts of stored surveillance data and play back only interesting sequences. We report some preliminary results and show the potential use of such a flexible system in third-generation video surveillance systems. We illustrate the interest of the system in a real case study: the surveillance of a reception desk.},
  url = {2003_IEE-IDSS.pdf},
  keywords = {multi-camera, distributed architecture, real-time, multi-threading, multicast, computer vision, intelligent visual surveillance, intelligent storage},
};
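The transport scheme the abstract describes (multicast data flows plus a TCP command channel) can be sketched with standard sockets; the group address and port below are placeholders, not values from the paper:

import socket
import struct

MCAST_GROUP, MCAST_PORT = "239.1.1.1", 5004    # illustrative values

def open_sender():
    """UDP socket that pushes compressed video or metadata to the group."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    return lambda payload: s.sendto(payload, (MCAST_GROUP, MCAST_PORT))

def open_receiver():
    """UDP socket joined to the group: any module on the cluster can
    subscribe to a flow without extra load on the producer."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", MCAST_PORT))
    mreq = struct.pack("4sl", socket.inet_aton(MCAST_GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return s

Control traffic (start/stop, bandwidth caps) would run over an ordinary TCP connection alongside these flows.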