Outline Proposal - Oxford Brookes University

Request 69009 Page 6 of 11

To quantify the performance of each algorithm, we report the accuracy (Acc), mean average precision (mAP), and mean F1 score (mF1).

(2) Recognition performance achieved so far:

Figure 4. Left: preliminary localisation results on a Hollywood2 video [45]. The colour of each box (subvolume) indicates the positive rank score of it belonging to the action class (red = high). In actioncliptest00058, a woman gets out of her car roughly around the middle of the video, as indicated by the detected subvolumes. Right: performance of MIL discriminative modelling (Step 3) with Dense Trajectory Features on the most common datasets, compared to the traditional BoF baseline. Even with traditional features, learning the most discriminative action parts via MIL much improves performance on challenging testbeds.

Figure 5: performance of BoF global models with Fisher representation (Step 2) on the most common datasets, compared to the state of the art. Note how accuracy and average precision (recognition rate) improve dramatically w.r.t. previous approaches.

(3) Latency to recognize specific human activity (how many seconds after the occurrence of specific human activities can the activities be recognized by the algorithm): ( ~2 ) sec for recognition on the KTH dataset.

As features are computed from volumes, frames-per-second figures do not make much sense in our approach; in any case, the frame rate in all sequences is around 30 fps. Computing the classification scores for 60,000 testing video instances (each 1000-dimensional) on the KTH dataset takes 0.5 seconds on a standard laptop. This does not include feature computation and representation times, which can vary widely depending on the choice of features, representation, classification method, and PC hardware.
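The three measures named at the top of this page (Acc, mAP, mF1) can be illustrated with a minimal sketch. The proposal does not say how they were computed, so the following are assumptions made for illustration only: the predicted class of a clip is the class with the highest score, average precision is the non-interpolated variant, and mAP and mF1 are unweighted means over classes. All function names, class labels, and scores are hypothetical.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of clips whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_precision(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """Non-interpolated AP for one class (assumed variant):
    mean of precision@k taken at the rank k of every positive clip."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true: Sequence[str], scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute Acc, mAP and mF1 from per-class score lists (one score per clip)."""
    classes = sorted(scores)
    # Assumption: each clip is assigned to the class with the highest score.
    y_pred = [max(classes, key=lambda c: scores[c][i]) for i in range(len(y_true))]
    ap = [average_precision([t == c for t in y_true], scores[c]) for c in classes]
    f1 = [f1_score(y_true, y_pred, c) for c in classes]
    return {
        "Acc": accuracy(y_true, y_pred),
        "mAP": sum(ap) / len(ap),
        "mF1": sum(f1) / len(f1),
    }


if __name__ == "__main__":
    # Hypothetical toy data: four clips, two action classes.
    labels = ["walk", "run", "walk", "run"]
    class_scores = {"walk": [0.9, 0.2, 0.4, 0.6], "run": [0.1, 0.8, 0.5, 0.3]}
    print(evaluate(labels, class_scores))
```

Accuracy and mF1 depend on a hard decision per clip, whereas mAP depends only on how each class ranks the clips, which is why the measures can diverge on the same data.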
(4) Possibility to predict the occurrence of specific human activities (please select a relevant one): Possible

23611 Chagrin Blvd., Suite 320, Cleveland, OH 44122 • 216-295-4800 • www.ninesigma.com
Though no such tests have been conducted yet, an online version of the algorithm can be envisaged in which the recognition of elementary actions that can form part of a more complex activity is used to predict the likelihood of the latter before it actually takes place.

(5) Necessary resolution of target objects in the image for proper analysis/recognition: ( 360 ) x ( 240 ) pixel in most cases (see above datasets descriptions); however, subsampling is normally used to reduce computational time, so the videos actually processed have an even lower resolution.

(6) Illumination on target objects for proper analysis/recognition (qualitative description is all right if it is hard to quantify illumination level, e.g., can be used not only in bright but also in somewhat dark rooms)

As we have so far used state-of-the-art benchmarks rather than in-house datasets, it is hard to answer this particular question quantitatively. However, all the most recently tested datasets (Hollywood, Hollywood2, YouTube, HMDB51) contain videos characterized by widely varying degrees of illumination: indeed, a feature of our approach is to build models invariant to nuisance factors such as illumination. Please have a look at the relevant web pages for a closer inspection:
http://www.di.ens.fr/~laptev/actions/hollywood2/
http://www.cs.ucf.edu/~liujg/YouTube_Action_dataset.html
HMDB in particular (http://serre-lab.clps.brown.edu/resources/HMDB/) contains very dark as well as very bright sequences.
Development plan for establishing recognition technology for any of the following or similar activities (getting out of or falling off a bed / falling down / breathing / becoming feverish / having a fit or cough / having convulsions / experiencing pain / choking or difficulty swallowing)

Human activities already tackled (if recognition of any of the above human activities has already been realized, please indicate them, or briefly describe them):

The activities specified above have not been explicitly tackled to date, as we have so far focussed on the most common publicly available datasets. However, as our approach is inherently flexible (part-based) and learns to discriminate from a training set, we do not foresee difficulties in tackling a different set of action classes.

Development plan and challenges to be overcome if recognition of any of the abovementioned human activities is newly attempted:

Our PhD student Sapienza is currently searching for datasets focussed on the activities of interest to you. Should the search prove unsuccessful, we propose to collect our own testbed using both the traditional and the range cameras (e.g. the Kinect device) in our possession. Range cameras are particularly attractive for indoor scenarios, and we are currently pursuing two separate efforts on Kinect-based gesture/exercise recognition: one for clinical purposes (which led to a pending NIHR i4i grant application) and one for exercising (by an MSc student). Pose estimation algorithms for range data already exist, though more customized solutions can be studied. Once activities are mapped to series of human poses, the above techniques can be applied to recognize them.

Understanding of privacy issues in installing camera-based surveillance systems for monitoring people, or a network with organizations having such capability, if any