Project Proposal (PDF) - Oxford Brookes University

FP7-ICT-2011-9 STREP proposal

18/01/12 v1 [Dynact]

Issues with generative modelling. As we also argued above, classical dynamical models can be too rigid to describe complex activities, or prone to overfitting the existing training examples. In probabilistic graphical models such as hidden Markov models (HMMs), a major cause of overfitting is that they need to estimate from the training data unique, “precise” probability distributions describing, say, the conditional probabilities in an MRF, or the transition probabilities between the states of a traditional or hierarchical HMM. To stay with the HMM example, this estimation problem can be tackled efficiently with the Expectation-Maximization algorithm (EM, [67]) or with the Viterbi algorithm [65]. When little training data are available, however, the resulting model depends quite strongly on the prior assumptions (probabilities) made about the behaviour of the dynamical model.
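To make the last point concrete, here is a minimal Python sketch (an illustration, not part of the proposed method) of how the transition probabilities of a Markov chain could be estimated from counts. It assumes the state sequence is fully observed; in an actual HMM the states are hidden and the counts would be replaced by the expected counts computed in the E-step of EM. The function name `estimate_transitions` and the toy states are hypothetical. With `prior_strength=0` it returns the maximum-likelihood estimate; a positive value adds a symmetric Dirichlet prior in the form of pseudo-counts.

```python
from collections import Counter


def estimate_transitions(states, sequence, prior_strength=0.0):
    """Transition matrix P(next | current) from an observed state sequence.

    prior_strength = 0: maximum-likelihood estimate (normalised counts).
    prior_strength > 0: posterior mean under a symmetric Dirichlet prior, i.e.
    prior_strength / len(states) pseudo-counts added to every transition.
    """
    k = len(states)
    counts = Counter(zip(sequence, sequence[1:]))  # (current, next) -> count
    matrix = {}
    for i in states:
        denom = sum(counts[(i, j)] for j in states) + prior_strength
        if denom == 0:
            # No data and no prior: the row is undetermined; use uniform as a fallback.
            matrix[i] = {j: 1.0 / k for j in states}
        else:
            matrix[i] = {j: (counts[(i, j)] + prior_strength / k) / denom for j in states}
    return matrix


states = ["walk", "run", "stand"]
sequence = ["walk", "walk", "run", "walk", "walk"]
ml = estimate_transitions(states, sequence)
smoothed = estimate_transitions(states, sequence, prior_strength=3.0)
print(ml["run"]["walk"], smoothed["run"]["walk"])  # 1.0 0.5
```

In this toy sequence the only transition observed out of `run` is `run → walk`, so the maximum-likelihood estimate sets P(walk | run) = 1, while a prior of strength 3 pulls it down to 0.5: with a single observation, the prior decides a large part of the answer.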

1.2.3 Contributions: Pushing the boundaries in generative and discriminative approaches<br />

From our brief discussion it follows that, on the one hand, discriminative approaches, though successful in limited experiments in controlled environments, need to be extended to include a description of the spatio-temporal structure of an action if they are to tackle issues such as action segmentation/localization, multi-agent activities, and the classification of more complex activities. On the other hand, generative graphical models have attractive features in terms of automatic segmentation, localization and extraction of plots from videos, but tend to overfit the available, limited training data. On top of that, more advanced techniques for the classification of generative models are necessary to cope with inherent variability and the presence of covariates.

With this project we propose to break new ground in all these respects, with significant impact on the real-world deployment of action recognition tools, by designing novel modelling techniques (both generative and discriminative) that incorporate the spatio-temporal structure of the data while allowing for the flexibility required by the generally limited amount of training information.

Introducing structure in discriminative models. In the field of discriminative modelling, we plan to build on recent progress in the use of part-based discriminative models for the detection of 2D objects in cluttered images. If we think of actions (and, even more so, of complex activities) as spatio-temporal “objects” composed of distinct but coordinated “parts” (elementary motions, simple actions), the idea of generalizing part-based models originally designed for 2D object detection to actions becomes natural and appealing. In particular, as is the case for objects, discriminative action parts can be learned in the framework of Multiple Instance Learning (MIL) [126, 127]. Consider a one-versus-all classification problem. In MIL, a discriminative model is learned starting from a bag of negative examples (of the wrong class) and a bag of examples some of which are positive (of the right class) and some negative (but we do not know which ones). In our case, think of all possible spatio-temporal sub-volumes of a video sequence in which we know that a positive example of a certain action category is present, but not where. An initial “positive” model is learned by assuming that all examples in the positive bag are indeed positive (i.e., all sub-volumes of the sequence contain the action at hand), while a negative model is learned from the examples in the negative bag (videos labelled with a different action category). These initial models are then updated iteratively until only the most discriminative examples in the initial positive bag are retained as positive. A flexible constellation of the most discriminative “action parts” can then be built from the training data to take into account the spatio-temporal structure of the activity at hand.
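The iterative MIL procedure described above can be sketched as follows. This is an illustrative assumption, not the project's learner: it follows the alternation used by mi-SVM (label every instance in a positive bag as positive, train, relabel, and require at least one positive per bag), but replaces the SVM with a nearest-centroid scorer so that the example stays dependency-free. Each positive bag holds feature vectors of the sub-volumes of one video known to contain the action; `train_mil` and its helpers are hypothetical names.

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(x, pos_centroid, neg_centroid):
    """Positive when x is closer to the positive model than to the negative one."""
    return sq_dist(x, neg_centroid) - sq_dist(x, pos_centroid)


def train_mil(positive_bags, negatives, max_iter=20):
    """Iterative MIL in the spirit of mi-SVM, with a nearest-centroid scorer.

    positive_bags: one list of instance vectors per video known to contain the action.
    negatives: instance vectors from videos labelled with other action categories.
    """
    neg_centroid = mean(negatives)                         # fixed negative model
    labels = [[True] * len(bag) for bag in positive_bags]  # start: all positive
    for _ in range(max_iter):
        positives = [x for bag, lab in zip(positive_bags, labels)
                     for x, is_pos in zip(bag, lab) if is_pos]
        pos_centroid = mean(positives)                     # current positive model
        new_labels = []
        for bag in positive_bags:
            scores = [score(x, pos_centroid, neg_centroid) for x in bag]
            lab = [s > 0 for s in scores]
            if not any(lab):  # each positive bag must keep at least one positive
                lab[scores.index(max(scores))] = True
            new_labels.append(lab)
        if new_labels == labels:                           # labels stable: stop
            break
        labels = new_labels
    return labels, pos_centroid, neg_centroid
```

For instance, with background features clustered around the origin as negatives and two positive bags that each mix background sub-volumes with one sub-volume near (3, 3), the loop stabilises after two passes and keeps only the sub-volumes near (3, 3) as positive. A real system would use a margin-based classifier and far richer features; the centroid scorer only keeps the alternation visible.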

Such an approach builds on the already significant results of discriminative models, but at the same time addresses several of the challenges we isolated in our analysis of the state of the art: 1 – complex activities can be learned and discriminated; 2 – localization in both space and time becomes an integral part of the recognition process; 3 – multi-agent action recognition becomes standard practice, as the presence of more than one action is assumed by default.
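How a flexible constellation of parts turns localization into a by-product of recognition can be illustrated with a temporal-only simplification of a deformable part model. In this hypothetical sketch each part has a per-frame appearance score (in practice these might come from part classifiers such as those learned with MIL) and an ideal offset from the action's start; a part may drift from that offset at a quadratic cost weighted by `deform_weight`. Scoring every start frame by brute force returns the best score together with the start frame and the part placements, i.e. a temporal localization. The function names, the quadratic cost and the parameters are assumptions; a spatio-temporal model would search over sub-volumes, and efficient part-based detectors typically replace this brute-force search with generalized distance transforms.

```python
def place_part(part_scores, anchor, deform_weight):
    """Best (score, frame) for one part whose ideal position is frame `anchor`."""
    best = None
    for tau, s in enumerate(part_scores):
        total = s - deform_weight * (tau - anchor) ** 2  # appearance minus deformation
        if best is None or total > best[0]:
            best = (total, tau)
    return best


def detect(part_score_maps, offsets, deform_weight=1.0):
    """Return (score, start_frame, part_frames) of the best-scoring constellation.

    part_score_maps[p][tau]: appearance score of part p at frame tau.
    offsets[p]: ideal offset of part p from the action's start frame.
    """
    n_frames = len(part_score_maps[0])
    best = None
    for t in range(n_frames):                  # every candidate start frame
        total, placement = 0.0, []
        for scores, offset in zip(part_score_maps, offsets):
            s, tau = place_part(scores, t + offset, deform_weight)
            total += s
            placement.append(tau)
        if best is None or total > best[0]:
            best = (total, t, placement)
    return best


part0 = [0, 6, 0, 0, 0, 0, 0]  # first part fires at frame 1
part1 = [0, 0, 0, 0, 0, 5, 0]  # second part fires at frame 5
print(detect([part0, part1], offsets=[0, 2]))  # (9.0, 2, [1, 5])
```

Here the observed gap between the parts (4 frames) is longer than the ideal one (2 frames), so the best start frame is 2, where the displacement cost is split evenly between the two parts rather than charged entirely to one of them.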

Move towards imprecise-probabilistic generative models. As for generative modelling, addressing the issues of inherent variability (which causes overfitting) and of the influence of covariate factors (which make rigid classification techniques inadequate) requires, on the one hand, moving beyond classical, “precise” graphical models and, on the other, developing a theory of classification for generative models that allows for robustness and flexibility. We have seen above that classical graphical models require the estimation of a number of probability distributions from the training data. If the training data form a small subset of the whole universe of examples (as is always the case), the constraint of having to determine single probabilities necessarily leads to overfitting.
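The overfitting argument can be made concrete on a single row of transition probabilities. A maximum-likelihood estimate from three observed transitions assigns probability exactly zero to a transition that was never seen. One standard way of replacing the single estimate by a set of distributions is Walley's imprecise Dirichlet model (IDM), sketched below purely as an illustration (it is an assumption here, not necessarily the model the project will adopt): with hyperparameter `s`, category k with count n_k out of N observations receives the lower and upper probabilities n_k/(N+s) and (n_k+s)/(N+s). The function names and toy counts are hypothetical.

```python
def ml_estimate(counts):
    """Maximum-likelihood ("precise") estimate: normalised counts."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def idm_bounds(counts, s=2.0):
    """Lower/upper probabilities under Walley's imprecise Dirichlet model.

    The bounds range over the Dirichlet posteriors obtained from all priors of
    strength s: n_k / (N + s) from below and (n_k + s) / (N + s) from above.
    """
    total = sum(counts.values())
    return {k: (c / (total + s), (c + s) / (total + s)) for k, c in counts.items()}


few = {"walk": 2, "run": 1, "stand": 0}       # three observed transitions
many = {"walk": 200, "run": 100, "stand": 0}  # same proportions, 100x the data

print(ml_estimate(few)["stand"])  # 0.0: an unseen transition is ruled out entirely
print(idm_bounds(few)["stand"])   # (0.0, 0.4): the data only bound it from above
print(idm_bounds(many)["stand"])  # upper bound shrinks to 2/302 (about 0.0066)
```

The width of every interval is s/(N+s), so the set of admissible distributions is wide when data are scarce and contracts towards a single distribution as the training set grows.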

By contrast, imprecise-probabilistic models replace such single (“precise”) probability distributions with whole closed convex sets of them, or “credal sets” [81]. Graphical models that handle credal sets, or “credal networks” [66], are a promising way of solving the overfitting problem, as they allow the actual evidence provided by a necessarily limited training set to determine only a set of linear constraints on the

Proposal Part B: page [10] of [67]
