Project Proposal (PDF) - Oxford Brookes University

FP7-ICT-2011-9 STREP proposal 18/01/12 v1 [Dynact]

Work package description

Work package number: 3
Start date or starting event: Month 1
Work package title: Dynamical discriminative modeling
Activity type: RTD (footnote 14)
Participant number: 1
Participant short name: OBU
Person-months per participant: 39

Objectives

The goal of this WP is to develop a novel framework of discriminative models able to explicitly capture the spatio-temporal structure of human motions. This involves:
– identifying the most discriminative parts of an action or activity via Multiple Instance Learning (MIL) from a weakly labeled training set;
– learning and classifying pictorial structure models representing constellations of elementary, most discriminative parts.

Description of work

The work can be articulated into the following tasks:

Task 3.1 – Multiple instance learning of discriminative action parts.
Let an action-part be defined as a Bag of Features model bounded in a space-time cube. The task here is to learn models for a set of most discriminative action parts, given a training set of video sequences in which an action class is assigned to each video clip, assuming that one action occurs in each clip. This is a weakly labeled scenario, where it is known that a positive example of the action exists within the clip, but the exact location of the action is unknown. The learning task can then be cast in a Multiple Instance Learning (MIL) framework, in which the training set consists of a set of “bags” (the training sequences), containing a number of BoF models (in our case SVM classifiers learnt for each sub-volume of the spatio-temporal sequence), and the corresponding ground truth class labels.

Task 3.2 – Learning and classifying structured discriminative models of actions.
Once the most discriminative action parts are learnt by applying MIL to the “bags” corresponding to the given training spatio-temporal sequences, we can construct tree-like ensembles of action parts which can be later used for both localizing and efficiently classifying complex activities. A cost function can be defined as a function of both the appearance models of the individual parts and of the relative positions between pairs of action parts, whose maximization yields the best action configuration. The problem can be efficiently solved by dynamic programming.

Deliverables

D3.1 A scientific report on the Multiple Instance Learning framework for the detection of the most discriminative action parts (month 16).
D3.2 The efficient algorithmic implementation of the MIL framework as a software prototype (month 18).
D3.3 A scientific report detailing the approach based on the optimization of constellations of elementary actions for the localization and classification of complex activities (month 24).
D3.4 The efficient implementation of our novel framework for “structural” discriminative action modelling, localization and classification as a software prototype (month 24).

14 Please indicate one activity per work package: RTD = Research and technological development; DEM = Demonstration; MGT = Management of the consortium.

Proposal Part B: page [30] of [67]

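As an illustration of the dynamic programming step in Task 3.2, the sketch below picks one location per action part in a tree of parts so that the sum of appearance scores minus a deformation penalty is maximal. It is a minimal sketch under stated assumptions, not the project's implementation: the function names, the (x, y, t) location tuples, the quadratic deformation penalty, and the `weight` parameter are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int, int]  # hypothetical (x, y, t) cell of a space-time sub-volume


def deformation(parent_loc: Location, child_loc: Location, offset: Location) -> int:
    """Squared distance between the child and its ideal position (parent + offset)."""
    return sum((c - (p + o)) ** 2 for p, c, o in zip(parent_loc, child_loc, offset))


def best_configuration(
    children: Dict[str, List[str]],
    appearance: Dict[str, Dict[Location, float]],
    offsets: Dict[str, Location],
    root: str,
    weight: float = 1.0,
) -> Tuple[float, Dict[str, Location]]:
    """Maximise appearance scores minus weighted deformation over a tree of parts.

    children:   part -> list of its child parts (the tree structure)
    appearance: part -> {candidate location: appearance score}
    offsets:    child part -> ideal displacement from its parent (assumed known)
    Returns (best total score, {part: chosen location}).
    """
    best_child_loc: Dict[Tuple[str, Location], Location] = {}

    def upward(part: str) -> Dict[Location, float]:
        # score[loc] = best score of the subtree rooted at `part` placed at loc
        score = dict(appearance[part])
        for child in children.get(part, []):
            child_score = upward(child)
            for loc in score:
                # best child placement given that the parent sits at loc
                value, arg = max(
                    (s - weight * deformation(loc, cloc, offsets[child]), cloc)
                    for cloc, s in child_score.items()
                )
                score[loc] += value
                best_child_loc[(child, loc)] = arg
        return score

    root_score = upward(root)
    root_loc = max(root_score, key=root_score.get)
    chosen = {root: root_loc}

    def downward(part: str) -> None:
        # backtrack: read off the stored best placement of each child
        for child in children.get(part, []):
            chosen[child] = best_child_loc[(child, chosen[part])]
            downward(child)

    downward(root)
    return root_score[root_loc], chosen
```

With L candidate locations per part, this direct version costs on the order of L² operations per tree edge. Because each part is optimised given only its parent's location, the tree structure is what keeps the search from enumerating all joint configurations.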
Work package description

Work package number: 4
Start date or starting event: Month 1
Work package title: Data collection and feature extraction
Activity type: RTD (footnote 15)

Participant number:              1     4     5
Participant short name:          OBU   SUP   DYN
Person-months per participant:   18    18    6

Objectives

The goal of this WP is to develop the infrastructure for the validation of the methodological breakthroughs developed in WP1-3 on the scenarios we isolated in WP5. To this purpose it is necessary to:
– define and acquire novel test-beds in multiple modalities: synchronised video sequences from multiple cameras (stereo), monocular videos, and range camera sequences; we will focus on complex activities and multiple actions taking place at different locations in the same video;
– create and manage a data repository for dissemination among the partners;
– design and implement the selection and extraction of salient pieces of information (features) in all the three modalities; features are later fed to parameter estimation algorithms (imprecise EM, WPs 1,2) or discriminative models (WP3) to locate and classify actions in video sequences.
State-of-the-art feature selection algorithms are designed, in both 2D (e.g. dense trajectories) and 3D.

Description of work

Task 4.1 – Novel benchmark datasets gathering.
Video sequence test-beds depicting multiple actors performing different actions and gestures at different spatio-temporal locations, with or without missing data and occlusions, under different environmental conditions, in different scenarios (human computer interaction, smart room, surveillance, identity recognition) are collected in three different modalities: stereo, monocular video, and range cameras. A data repository is created to store and efficiently share the gathered test-bed data among all participants, and eventually the public via a secure web site, or an intranet section of the project website developed for wider dissemination purposes.

Task 4.2 – Discriminative feature selection and extraction algorithms are designed starting from state-of-the-art approaches, such as dense trajectories in spatio-temporal volumes. Similar automatic feature selection algorithms are specifically designed for range data (depth maps). In this case reliable silhouettes or rough 3D volumetric representations of the moving person/body-part can be extracted. Unsupervised techniques for tracking volumetric features in an embedding space are also explored. Active Appearance Models of realistic animations are built via integration of range data by genetic algorithms.

Deliverables

D4.1 A demonstrator including a corpus of video sequences acquired from multiple synchronised cameras, depicting multiple actors performing different actions/activities in different S/T locations in different scenarios (human computer interaction, smart room, surveillance, identity recognition) (month 12).
D4.2 A demonstrator including a corpus of range data sequences, also picturing different people performing different gestures or facial expressions at the same time, mostly in indoor scenarios (virtual animations, gaming) due to inherent limitations of range cameras (month 12).
D4.3 Shared data repository storing the gathered test-bed data in the three selected modalities, to be later made available to the wider academic community and the public (month 12).
D4.4 A library of routines for automatic robust discriminative feature extraction from monocular videos and synchronised 2D frames (month 15).
D4.5 A library of routines for automatic robust discriminative feature extraction from range data (sequences of depth maps) (month 18).

15 Please indicate one activity per work package: RTD = Research and technological development; DEM = Demonstration; MGT = Management of the consortium.

Proposal Part B: page [31] of [67]
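Task 4.2 notes that silhouettes can be extracted from range data. One very simple way this could be done, shown here purely as an illustrative sketch, is to keep the pixels of a depth map whose depth falls inside a working range. The proposal does not specify the extraction method; the function names, the millimetre units, the `near`/`far` thresholds, and the convention that 0 marks a missing measurement are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

DepthMap = List[List[int]]  # hypothetical: depth in millimetres, 0 = missing measurement


def silhouette_mask(depth: DepthMap, near: int, far: int) -> List[List[int]]:
    """Binary mask with 1 where near <= depth <= far, 0 elsewhere.

    With near > 0, missing measurements (depth 0) are excluded automatically.
    """
    return [[1 if near <= d <= far else 0 for d in row] for row in depth]


def bounding_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Tight (top, left, bottom, right) box around the mask, or None if it is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Toy 4x4 depth map: a subject about 1.2 m away in front of a wall at 3 m.
depth = [
    [0, 3000, 3000, 3000],
    [3000, 1200, 1250, 3000],
    [3000, 1180, 0, 3000],
    [3000, 3000, 3000, 3000],
]
mask = silhouette_mask(depth, near=1000, far=1500)
box = bounding_box(mask)
```

A fixed depth range only works when the subject is well separated from the background, so a real pipeline would likely need background modelling or tracking on top of this.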