Project Proposal (PDF) - Oxford Brookes University
FP7-ICT-2011-9 STREP proposal
18/01/12 v1 [Dynact]
Issues with generative modelling. As we argued above, classical dynamical models can be too
rigid to describe complex activities, or prone to overfitting the existing training examples. In probabilistic
graphical models, such as hidden Markov models (HMMs), a major cause of overfitting is the need
to estimate from the training data unique, “precise” probability distributions describing, say, the conditional
probabilities in a Markov random field (MRF), or the transition probabilities between the states of a traditional or a hierarchical
hidden Markov model. To remain with the HMM example, there are efficient ways of dealing with this
estimation problem, involving the Expectation-Maximization (EM) algorithm [67] and the Viterbi
algorithm [65]. When little training data is available, however, the resulting model will depend quite strongly on the
prior assumptions (probabilities) about the behaviour of the dynamical model.
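To make this dependence on the prior concrete, the following minimal Python sketch estimates transition probabilities from a single short state sequence. It assumes, purely for illustration, that the states are directly observed (in an actual HMM they are hidden and would be estimated with EM). The function name `transition_estimates`, the `pseudo_count` parameter and the toy sequence are hypothetical and not part of the proposal.

```python
from collections import Counter


def transition_estimates(states, n_states, pseudo_count):
    """Estimate a transition matrix from one observed state sequence.

    pseudo_count is a symmetric prior weight added to every transition
    count (0 gives the plain maximum-likelihood estimate).
    """
    counts = Counter(zip(states[:-1], states[1:]))
    matrix = []
    for i in range(n_states):
        row_total = (
            sum(counts[(i, j)] for j in range(n_states))
            + pseudo_count * n_states
        )
        if row_total == 0:
            # State i was never left and there is no prior: fall back to uniform.
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append(
                [(counts[(i, j)] + pseudo_count) / row_total
                 for j in range(n_states)]
            )
    return matrix


# Hypothetical toy data: six time steps over two states.
short_sequence = [0, 0, 1, 0, 0, 1]

mle = transition_estimates(short_sequence, 2, pseudo_count=0.0)
smoothed = transition_estimates(short_sequence, 2, pseudo_count=1.0)

print("maximum likelihood:", mle)
print("with prior weight 1:", smoothed)
```

Without any prior weight, the transition from state 1 to itself, which never occurs in the toy sequence, receives probability zero; with a pseudo-count of one per transition the same data give one third. With so few observations the estimate is driven largely by the prior.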
1.2.3 Contributions: Pushing the boundaries in generative and discriminative approaches<br />
From our brief discussion it follows that, on the one hand, discriminative approaches, though successful in
limited experiments in controlled environments, need to be extended to include a description of the spatio-temporal
structure of an action if they are to tackle issues such as action segmentation/localization, multi-agent
activities, and the classification of more complex activities. On the other hand, generative graphical
models have attractive features in terms of automatic segmentation, localization and extraction of plots from
videos, but suffer from a tendency to overfit the available, limited training data. On top of that, more
advanced techniques for the classification of generative models are necessary to cope with inherent
variability and the presence of covariates.
With this project we propose to break new ground in all these respects, with significant impact on the
real-world deployment of action recognition tools, by designing novel modelling techniques (both generative and
discriminative) able to incorporate the spatio-temporal structure of the data, while allowing for the
flexibility required by a generally limited amount of training information.
Introducing structure in discriminative models. In the field of discriminative modelling, we plan
to build on recent progress in the use of part-based discriminative models for the detection of 2D objects
in cluttered images. If we think of actions (and, even more so, of complex activities) as spatio-temporal
“objects”, composed of distinct but coordinated “parts” (elementary motions, simple actions), the notion of
generalizing part-based models originally designed for 2D object detection to actions becomes natural and
appealing. In particular, as is the case for objects, discriminative action parts can be learned in the
framework of Multiple Instance Learning (MIL) [126, 127]. Consider a one-versus-all classification
problem. In MIL, a discriminative model is learned starting from a bag of negative (of the wrong class)<br />
examples, and a bag of examples some of which are positive (of the right class) and some are negative (but<br />
we do not know which ones). Think, in our case, of all possible spatio-temporal sub-volumes in a video<br />
sequence within which, we know, a positive example of a certain action category is indeed present (but we<br />
do not know where). An initial “positive” model is learned by assuming that all examples in the positive bag<br />
are indeed positive (all sub-volumes of the sequence do contain the action at hand), while a negative one is<br />
learned from the examples in the negative bag (videos labelled with a different action category). Initial<br />
models are updated in an iterative process: eventually, only the most discriminative examples in the initial,<br />
positive bag are retained as positive. A flexible constellation of the most discriminative “action parts” can<br />
then be built from the training data to take into account the spatio-temporal structure of the activity at hand.<br />
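The iterative refinement described above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: each spatio-temporal sub-volume is taken to be already summarised by a small feature vector, and a nearest-centroid rule stands in for the discriminative model (the proposal does not prescribe a particular classifier). The function names and toy data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def score(x, pos_c, neg_c):
    """Positive when x lies closer to the positive centroid than to the negative one."""
    return math.dist(x, neg_c) - math.dist(x, pos_c)


def refine_positive_bag(positive_bag, negative_bag, max_iters=20):
    """Return indices of the instances in positive_bag retained as positive."""
    # Initial model: every instance in the positive bag is assumed positive.
    selected = list(range(len(positive_bag)))
    # The negative model is learned once from the negative bag.
    neg_c = centroid(negative_bag)
    for _ in range(max_iters):
        pos_c = centroid([positive_bag[i] for i in selected])
        scores = [score(x, pos_c, neg_c) for x in positive_bag]
        new_selected = [i for i, s in enumerate(scores) if s > 0]
        if not new_selected:
            # Always keep at least the single most discriminative instance.
            new_selected = [max(range(len(scores)), key=scores.__getitem__)]
        if new_selected == selected:
            break  # labels are stable: stop iterating
        selected = new_selected
    return selected


# Hypothetical 2D features: the first two "sub-volumes" of the positive
# bag look like background, the last two contain the action.
negative_bag = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
positive_bag = [(0.1, 0.1), (0.0, 0.2), (2.0, 2.1), (2.2, 1.9)]

retained = refine_positive_bag(positive_bag, negative_bag)
print("retained as positive:", retained)
```

On this toy data the two background-like instances are dropped after the first update and only the last two instances remain labelled positive. A realistic system would replace the centroid rule with a learned discriminative classifier and would operate on descriptors extracted from actual video sub-volumes.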
Such an approach builds on the already significant results of discriminative models, but at the
same time addresses several of the challenges we identified in our analysis of the state of the art: 1 – complex activities
can be learned and discriminated; 2 – localization in both space and time becomes an integral part of the
recognition process; 3 – multi-agent action recognition becomes standard practice, as the presence of
more than one action is assumed by default.
Move towards imprecise-probabilistic generative models. As for generative modelling,
addressing the issues of inherent variability (which causes data overfitting) and the influence of covariate
factors (which make rigid classification techniques inadequate) requires, on the one hand, moving beyond
classical, “precise” graphical models and, on the other, developing a theory of classification for generative models
which allows for robustness and flexibility. We have seen above that classical graphical models require
a number of probability distributions to be estimated from the training data. If the training data form a small subset
of the whole universe of examples (as is always the case), the constraint of having to determine single
probabilities necessarily leads to overfitting.
By contrast, imprecise-probabilistic models replace such single (“precise”) probability distributions with
whole closed convex sets of them, or “credal sets” [81]. Graphical models which handle credal sets, or
“credal networks” [66], are a promising way of solving the overfitting problem, as they allow the actual<br />
evidence provided by a necessarily limited training set to determine only a set of linear constraints on the<br />
Proposal Part B: page [10] of [67]