Outline Proposal - Oxford Brookes University
PROPOSAL FOR REQUEST #69009
Request Title: Video Analysis/Recognition of Human Activities
NineSigma Point of Contact: S. Akutagawa
Submission Date: 28 January 2013
Contact Information
• Name of organization: Oxford Brookes University
• Name of proposer(s): Dr Fabio Cuzzolin
• Address: Department of Computing and Communication Technologies, Wheatley Campus
• City, State, Zip: Oxford, OX33 1HX
• Country: United Kingdom
• Phone: +44 1865 484526
• Email: fabio.cuzzolin@brookes.ac.uk
• Direct Web Page Link: http://cms.brookes.ac.uk/staff/FabioCuzzolin/
• Additional Organization Information
• Size:
o University: Oxford Brookes University is a premier learning and teaching institution with an outstanding research record. We are widely acknowledged to be the UK's leading modern university, surpassing many older institutions in newspaper league tables. In 2009 the university had a total of some 18,000 students, of which 22% at postgraduate level (http://www.brookes.ac.uk/about/facts/statistics), and some 200 PhD students.
o The Department of Computing and Communication Technologies has some 32 faculty members and around 30 PhD students.
o The Computer Vision (http://cms.brookes.ac.uk/research/visiongroup) and Artificial Intelligence (http://cms.brookes.ac.uk/staff/FabioCuzzolin/index-ml.html) research groups are the strongest in the Department, with a combined total of 6 members of staff and almost 20 postdocs/PhD students.
• Years in operation: Oxford Brookes University began life as the Oxford School of Art in 1865, becoming a university in 1992.
• Annual sales: turnover of £170.2 million and an operating surplus of £15.3 million in 2011 (http://www.brookes.ac.uk/about/structure/annual_accounts/accounts1011.pdf)
• Contract/joint development with large companies, if sharable (name of the companies, type of relationship, etc.): The AI and Vision groups have continuing links, including KTPs, with companies such as Sony Entertainment, VICON, Microsoft Research Europe and Yotta, as well as newly developed KTPs with HedgeVantage (a financial consulting company), Webmart, and Magna International (the multinational car-component company). See our websites for more information.
• Other information (sponsors, awards, etc.): See our websites or call us for more information.

23611 Chagrin Blvd., Suite 320, Cleveland, OH 44122 • 216-295-4800 • www.ninesigma.com
Request format and graphics © Copyright 2013 NineSigma, Inc.
Submission Terms
By placing an "X" in the box below, I verify that I am submitting only non-confidential information. Further, I agree to notify NineSigma should this proposal result in a transaction with NineSigma's customer. (This effort is to ensure proper record keeping.)
I agree to NineSigma's submission terms: X
Please insert your text below each heading in the form below, expanding as needed. Additional guidelines for preparing your proposal are included on the last page of this document.
Title of Proposal: Locating and recognizing complex activities using part-based discriminative models
Proposed Technical Approach
Please include the following information by selecting options or giving a brief description, possibly by using graphs, figures, or drawings. Please append a copy of your paper pertaining to the proposed technology.
Category of proposed technology (please select a relevant one)
Technology developed/optimized for monitoring people, in particular the localization and recognition of actions and complex activities by multiple actors
Development stage (please select a relevant one)
Under verification and improvement in both the lab and the field
Overview of proposed technology
(1) Analysis/recognition algorithm (what kind of human activities/behaviors can be analyzed in what processing steps):
Action recognition is a hard problem for a number of reasons:
#1 Human motions possess a high degree of inherent variability, as quite distinct motions/gestures can carry the same meaning. As action models are typically learned from necessarily limited datasets (in terms of training videos and action classes), they have limited generalization power.
#2 Actions are subject to various nuisance or "covariate" factors, such as illumination, moving background, viewpoint, and many others (Figure 1). Tests have often been run in small, controlled environments, while few attempts have been made to progress towards recognition "in the wild".
#3 Detecting when and where an action takes place within a video is the first step in any action recognition framework; so far, however, the focus has largely been on the recognition of pre-segmented videos.
#4 The presence of multiple actors (e.g., different players sitting in front of a single console) greatly complicates both localization and recognition.
#5 A serious challenge arises when we move from simple, "atomic" actions to more sophisticated "activities": series of elementary actions connected in a meaningful way, common, for instance, in the smart-home scenario.
Figure 1: numerous nuisance factors (e.g. view, illumination, occlusions, multiple actors) make the activity recognition problem hard.
The most successful recent approaches, which mainly adopt kernel SVM classification of bags of local features (Figure 2), have reached their limits: only understanding the spatial and temporal structure of human activities can help us to successfully locate and recognize them in a robust and reliable way.
Figure 2: BoF methods build histograms of frequencies of local video features: as any spatiotemporal relationship is lost, meaningless videos with almost the same histograms can be incorrectly recognized.
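The BoF pipeline criticized above can be made concrete with a minimal sketch (illustrative only: the codebook, descriptor dimensionality and clip below are random stand-ins for a k-means vocabulary and real local descriptors). Each local descriptor is assigned to its nearest codeword and the clip is summarized by a normalized frequency histogram, discarding all spatiotemporal layout.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Quantize local descriptors to their nearest codeword and
    return an L1-normalised frequency histogram (the BoF vector)."""
    # Pairwise squared distances between descriptors and codewords
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(50, 64))      # 50 codewords, 64-dim descriptors
descriptors = rng.normal(size=(200, 64))  # local features from one clip
h = bof_histogram(descriptors, codebook)
```

Note how two clips with very different temporal orderings of the same local features would produce identical histograms, which is exactly the weakness Figure 2 points out.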
Inspired by the successes of similar approaches in 2D object detection [20], we propose to represent human activities as spatio-temporal "objects" composed of distinct, coordinated "parts" (elementary actions).
More specifically, instead of computing action descriptors on whole video clips (Figure 3, left), we compute them for collections of space-time action parts associated with video subvolumes (middle); multiple instance learning (MIL) is used to learn which subvolumes are particularly discriminative of the action (solid-line green cubes) and which are not (dotted-line cubes); finally (right), a human action is represented as a "star model" of elementary BoF action parts.
Figure 3: the proposed approach for learning and recognizing human activities as structured constellations of the most discriminative action parts.
Step 1: Prior to modeling actions, video streams have to be processed to extract salient "features", either frame by frame or from the entire spatio-temporal (S/T) volume which contains the action(s) of interest. A plethora of local video descriptors have been proposed for S/T volumes: Cuboid, 3D-SIFT, HoG-HoF, HOG3D, extended SURF. Dense Trajectory Features, a combination of HoG-HoF with optical flow vectors and motion boundary histograms, have been shown to outperform all the other approaches. An appealing alternative to traditional video is provided by "range" (Time-of-Flight) cameras: feature extraction from range images and fusion of range and video features will be integral parts of this project.
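To illustrate the kind of local motion descriptor involved, the toy sketch below builds a crude HoF-style histogram of optical-flow orientations weighted by magnitude. This is a deliberately simplified stand-in: real Dense Trajectory Features also include HoG, trajectory shape, and motion-boundary histograms, and the flow field here is random rather than estimated from actual video.

```python
import numpy as np

def flow_histogram(flow, n_bins=8):
    """Crude HoF-style descriptor: a histogram of optical-flow
    directions, weighted by flow magnitude (a toy stand-in for
    the HoG/HoF and motion-boundary descriptors used in practice)."""
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(u, v)
    ang = np.arctan2(v, u)                              # in [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    return hist / (hist.sum() + 1e-12)

rng = np.random.default_rng(1)
flow = rng.normal(size=(60, 80, 2))  # per-pixel (u, v) flow for one frame pair
h = flow_histogram(flow)
```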
Step 2: from the local features extracted from each video subvolume, a Fisher vector representation is calculated, so that each subvolume is encoded by a single Fisher vector.
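A simplified Fisher vector encoding can be sketched as follows. This computes only the gradients with respect to the GMM means; the full representation also stacks the variance gradients and typically applies power and L2 normalisation, and the GMM parameters below are random placeholders rather than a fitted vocabulary.

```python
import numpy as np

def fisher_vector_means(X, weights, means, sigmas):
    """Simplified Fisher vector: gradient of the log-likelihood
    w.r.t. the GMM means only, for a diagonal-covariance GMM."""
    N, D = X.shape
    K = len(weights)
    # Soft assignments gamma[t, k] (constant terms cancel in the softmax)
    log_p = np.stack([
        -0.5 * (((X - means[k]) / sigmas[k]) ** 2).sum(1)
        - np.log(sigmas[k]).sum() + np.log(weights[k])
        for k in range(K)], axis=1)
    log_p -= log_p.max(1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(1, keepdims=True)
    # Normalised gradient w.r.t. each mean, one D-dim block per component
    fv = [(gamma[:, k:k + 1] * (X - means[k]) / sigmas[k]).sum(0)
          / (N * np.sqrt(weights[k])) for k in range(K)]
    return np.concatenate(fv)   # length K * D

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))            # 100 local descriptors, 5-dim
w = np.array([0.5, 0.5])                 # component priors
mu = rng.normal(size=(2, 5))             # component means
sg = np.ones((2, 5))                     # component std deviations
fv = fisher_vector_means(X, w, mu, sg)
```

The resulting vector has fixed length K x D regardless of how many local features the subvolume contains, which is what makes it usable as a per-subvolume encoding in Step 3.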
Step 3: Multiple Instance Learning of the most discriminative subvolumes (i.e., those best characterizing an activity versus all the others). An initial "positive" model is learned by assuming that all examples in the positive bag (all the sub-volumes of the sequence) do contain the action at hand; a "negative" model is learned from the examples in the negative bags (videos labeled with a different action class). After an iterative process, only the most discriminative examples in each positive bag are retained. MIL reduces to a semi-convex optimisation problem, for which efficient heuristics exist [5]. The resulting model allows us to factor out the effect of common, shared context (similar background, common action elements).
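The iterative process described above can be caricatured in a few lines: alternately (a) fit a linear scorer treating the current positive instances against all negatives, and (b) keep only the top-scoring instance of each positive bag as positive. This is a toy mi-SVM-style alternation with a plain logistic scorer standing in for the SVM, on synthetic bags; the real system operates on Fisher vectors of subvolumes.

```python
import numpy as np

def mil_train(pos_bags, neg_instances, iters=10, lr=0.1, epochs=200):
    """Toy MIL alternation: (a) fit a linear scorer on the current
    positive instances vs. all negatives; (b) re-select the top-scoring
    instance of each positive bag as the new positive set."""
    D = neg_instances.shape[1]
    w, b = np.zeros(D), 0.0
    positives = np.vstack(pos_bags)           # start: every instance positive
    for _ in range(iters):
        X = np.vstack([positives, neg_instances])
        y = np.r_[np.ones(len(positives)), -np.ones(len(neg_instances))]
        for _ in range(epochs):               # logistic regression via GD
            m = y * (X @ w + b)
            g = -y / (1 + np.exp(m))          # d(loss)/d(score)
            w -= lr * (X.T @ g) / len(y)
            b -= lr * g.mean()
        # Keep only the most discriminative instance per positive bag
        positives = np.vstack([bag[np.argmax(bag @ w + b)] for bag in pos_bags])
    return w, b

rng = np.random.default_rng(3)
# Each positive bag: one "action" instance near +2 plus background noise
pos_bags = [np.vstack([rng.normal(2, 1, (1, 4)), rng.normal(-2, 1, (3, 4))])
            for _ in range(10)]
neg = rng.normal(-2, 1, (40, 4))
w, b = mil_train(pos_bags, neg)
```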
Step 4: Once the most discriminative action parts are learnt via MIL, we can construct tree-like ensembles of action parts (Figure 3, right) to use for both localization and classification of actions. Felzenszwalb and Huttenlocher have shown (in the object detection problem) that if the pictorial structure forms a star model, where each part is only connected to the root node, it is possible to compute the best match very efficiently by dynamic programming. Other approaches to building a constellation of discriminative parts have been proposed by Hoiem and Ramanan. The introduction of sparsity constraints in the latent SVM semi-convex optimization problem proposed by Felzenszwalb, to automatically identify the optimal number of parts, will be crucial.
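A toy version of the star-model matching step, reduced to a 1-D time axis, may clarify the idea: for each candidate root placement, each part contributes its best local score minus a quadratic penalty for deviating from its anchor offset. The brute-force inner maximization below is O(T^2); the generalized distance transforms used by Felzenszwalb and Huttenlocher make it linear. All scores and anchors here are synthetic.

```python
import numpy as np

def star_model_score(root_scores, part_scores, anchors, defcost=0.1):
    """Score a 1-D star model over a time axis of length T: for each
    root placement t, add the best placement of every part, trading
    the part's local score against a quadratic displacement penalty
    from its anchor offset relative to the root."""
    T = len(root_scores)
    total = root_scores.copy()
    placements = np.arange(T)
    for scores, anchor in zip(part_scores, anchors):
        for t in range(T):
            disp = placements - (t + anchor)
            total[t] += np.max(scores - defcost * disp ** 2)
    return total

rng = np.random.default_rng(4)
T = 30
root = rng.normal(size=T)                       # root filter responses
parts = [rng.normal(size=T), rng.normal(size=T)]  # two part responses
best_t = int(np.argmax(star_model_score(root, parts, anchors=[-3, 3])))
```

Because each part depends only on the root, the maximization factorizes over parts, which is exactly why the star topology admits efficient dynamic programming.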
(2) Specifics in system configuration (please indicate the required camera or system, if anything special is required as a basis for using the proposed algorithm):
The approach is designed to work with both conventional and range cameras, as in both cases a spatiotemporal volume can be constructed, from which the most discriminative parts can be learned and assembled into an overall model. A fusion of both would be pioneering work.
(3) Applicability of the algorithm to versatile human activities (what should be overcome in applying an algorithm developed for a specific human activity to any other human activities)
The algorithm is being developed as general purpose: as such, it is designed to discriminate between any activities introduced in a training stage. In particular, it is explicitly designed to represent complex activities formed by a sequence of elementary actions; to cope with the presence of multiple actors/people; to localize the action of interest within a larger video in both space and time; and to factor out the background (static or dynamic) in order to better discriminate different activities with a common background or elementary components (i.e., parts in common).
Current Performance (please answer the following questions by showing a specific recognition task you have experienced so far as an example):
(1) Recognition tasks/applications in brief (if proposers have experience in analyzing and recognizing one or some of the following, please indicate those. If not, please briefly describe what kind of human activities proposers have experienced):
The approach has so far been tested on most of the publicly available benchmarks for action recognition:
The KTH dataset contains 6 action classes, each performed by 25 actors in four scenarios. People perform repetitive actions at different speeds and orientations. Sequences are longer when compared to the YouTube or HMDB51 datasets, and contain clips in which the actors move in and out of the scene during the same sequence.
The YouTube dataset contains 11 action categories and presents several challenges due to camera motion, object appearance, scale, viewpoint and cluttered backgrounds. The 1600 video sequences are split into 25 groups, and we follow the authors' evaluation procedure of 25-fold, leave-one-out cross validation.
The Hollywood2 dataset contains 12 action classes collected from 69 different Hollywood movies. There are a total of 1707 action samples containing realistic, unconstrained human and camera motion. The dataset is divided into 823 training and 884 testing sequences, each from 5 to 25 seconds long.
The HMDB dataset contains 51 action classes, with a total of 6849 video clips collected from movies, the Prelinger archive, YouTube and Google videos. Each action category contains a minimum of 101 clips.
To quantify the respective algorithm performance, we report the accuracy (Acc), mean average precision (mAP), and mean F1 score (mF1).
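For completeness, two of the reported metrics can be computed as in the small self-contained sketch below (accuracy and macro-averaged F1 over predicted labels; mAP additionally requires per-class ranked confidence scores, which are omitted here). The label arrays are illustrative toy data.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples (Acc)."""
    return np.mean(y_true == y_pred)

def mean_f1(y_true, y_pred, classes):
    """Macro-averaged F1 (mF1): per-class F1 from precision and
    recall, then the unweighted mean over classes."""
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 1, 2, 2])   # toy ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0])   # toy predictions
acc = accuracy(y_true, y_pred)
mf1 = mean_f1(y_true, y_pred, [0, 1, 2])
```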
(2) Recognition performance achieved so far:
Figure 4. Left: preliminary localisation results on a Hollywood2 video [45]. The colour of each box (subvolume) indicates its positive rank score of belonging to the action class (red = high). In actioncliptest00058, a woman gets out of her car roughly around the middle of the video, as indicated by the detected subvolumes. Right: performance of MIL discriminative modelling (Step 3) with Dense Trajectory Features on the most common datasets, compared to the traditional BoF baseline. Even when using traditional features, learning the most discriminative action parts via MIL greatly improves performance on challenging testbeds.
Figure 5: performance of BoF global models with Fisher representation (Step 2) on the most common datasets, compared to the state of the art. Note how accuracy and average precision (recognition rate) dramatically improve w.r.t. previous approaches.
(3) Latency to recognize specific human activity (how many seconds after the occurrence of specific human activities can the activities be recognized by the algorithm):
( ~2 ) sec for recognition on the KTH dataset. As features are computed from volumes, frames per second do not make much sense in our approach; in any case, the frame rate in all sequences is around 30 fps.
Computing the classification scores for 60,000 testing video instances (each 1000-dimensional) on the KTH dataset takes 0.5 seconds on a standard laptop; this does not include feature computation and representation times, which can vary widely depending on the choice of features, representation, classification methods, and PC hardware.
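The quoted 0.5 s is consistent with test-time classification reducing to a single matrix multiplication between the instance vectors and the per-class linear models. A sketch of that step follows (sizes scaled down from the 60,000-instance setting, and the weights are random stand-ins for trained classifiers):

```python
import numpy as np

rng = np.random.default_rng(5)
N, D, C = 5_000, 1000, 6   # instances, vector dim, classes (KTH has 6)
X = rng.normal(size=(N, D)).astype(np.float32)  # test instance vectors
W = rng.normal(size=(D, C)).astype(np.float32)  # one linear model per class

scores = X @ W              # all class scores in one matrix multiply
pred = scores.argmax(axis=1)
```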
(4) Possibility to predict the occurrence of specific human activities (please select a relevant one)
Possible
Though no such tests have been conducted yet, an online version of the algorithm can be envisaged in which the recognition of elementary actions forming part of a more complex activity is used to predict the likelihood of the latter happening before it actually takes place.
(5) Necessary resolution of target objects in the image for proper analysis/recognition
( 360 ) x ( 240 ) pixels in most cases (see the dataset descriptions above); however, subsampling is normally used to reduce computational time, so that the videos actually processed have even lower resolution.
(6) Illumination on target objects for proper analysis/recognition (a qualitative description is all right if it is hard to quantify the illumination level, e.g., can be used not only in bright but also in somewhat dark rooms)
As we have so far used state-of-the-art benchmarks rather than in-house datasets, it is hard to answer this particular question quantitatively.
However, all the most recent tested datasets (Hollywood, Hollywood2, YouTube, HMDB51) contain videos characterized by widely varying degrees of illumination; indeed, a feature of our approach is to build models invariant to nuisance factors such as illumination.
Please have a look at the relevant web pages for a closer inspection:
http://www.di.ens.fr/~laptev/actions/hollywood2/
http://www.cs.ucf.edu/~liujg/YouTube_Action_dataset.html
In particular HMDB
http://serre-lab.clps.brown.edu/resources/HMDB/
contains very dark as well as very bright sequences.
Development plan for establishing recognition technology for any of the following or alike (Getting out of or falling off a bed / falling down / breathing / becoming feverish / having a fit or cough / having convulsions / experiencing pain / choking or difficulty swallowing)
Human activities already tackled (if recognition of any of the above human activities has already been realized, please indicate them, or briefly describe them):
The activities specified above have not been explicitly tackled to date, as we have so far focussed on the most common publicly available datasets. However, as our approach is inherently flexible (part-based) and learns to discriminate from a training set, we do not foresee difficulties in tackling a different set of action classes.
Development plan and challenges to be overcome if recognition of any of the abovementioned human activities is to be newly attempted
Datasets focussed on the activities of interest to you are being searched for by our PhD student Sapienza. In case the search's outcome is not positive, we propose to collect our own testbed via both traditional and range cameras (e.g. the Kinect device) in our possession. Range cameras are particularly attractive for indoor scenarios, and two separate efforts to perform gesture/exercise recognition via Kinect, for clinical purposes (which led to a pending NIHR i4i grant application) and for exercising (by an MSc student), are currently being pursued by us.
Pose estimation algorithms from range data already exist, though more customized solutions can be studied. After mapping activities to series of human poses, the above techniques can be applied to recognize them.
Understanding of privacy issues in installing camera-based surveillance systems for monitoring people, or a network with organizations having such capability, if any
Privacy and Intellectual Property are managed centrally by Oxford Brookes University via RBDO (the Research and Business Development Office).
As for organizations with expertise in surveillance systems, the group has links with HMGCC (http://www.hmgcc.gov.uk/), the UK government centre of excellence whose aim is to design and develop secure communication systems, hardware and software specifically for HM Government use, both at home and overseas.
Proposed Budget and Conditions
Preferred style of collaboration and proposed conditions
For Phase 1 (validation) we believe the best strategy is to hire a Research Assistant for 1-2 years, who will work full time on the project to complete all the steps (1-4) of the algorithm (in collaboration with our PhD student Michael Sapienza), and tune it towards the actions/activities specified by the client and the smart-home scenario. In this perspective it makes sense to explore range camera technologies, as we are doing in the medical monitoring application (see below).
The basic cost of a member of staff is £10,000-14,000 a year for an RA (postgraduate student) and £27,000-30,000 for a postdoctoral researcher. Indirect costs of about 150% of the basic cost need to be added.
The group already possesses the necessary equipment (computer clusters, range and traditional cameras), but possibly a few thousand pounds will be needed in this area.
For Phase 2 (commercialization) the most suitable scheme is probably that of a Knowledge Transfer Partnership (KTP), in which the groups and Oxford Brookes in general have a strong background. A KTP typically lasts for 2 years; each Associate has a total cost of some £75,000 a year (of which 75% is funded by the UK government, with only 25% charged to the industrial partner). One or two Associates can be requested depending on the scale and complexity of the project.
Status of Intellectual Property of proposed technology and the organizational policy regarding technology transfer, licensing, etc.
At the present time the technology is being developed as a research project, so as such it has not been patented yet, though we plan to proceed in that direction given the very encouraging performance.
Intellectual Property is managed centrally by Oxford Brookes University via RBDO (the Research and Business Development Office), which has access to financial and other resources to enable Intellectual Property and its commercial exploitation to be effectively managed, whilst maximising the widespread dissemination of the research results. This includes finance for patenting and proof-of-concept funding; Intellectual Property, technology and market assessment; and resources for defining and implementing a commercialisation strategy through licensing, start-up companies or other routes.
RBDO has a strong track record of commercialisation of its intellectual property. Income from licences was £1.5M in 2011, which ranks the University in the top 10 of all UK universities for royalty income. OBU through RBDO holds a total portfolio of 20 patents. The University also supports the creation of spin-out companies when appropriate and has some successful examples. In particular, Oxford Brookes is extremely active in the field of Knowledge Transfer Partnerships: the Artificial Intelligence group has two upcoming KTPs concerning machine learning techniques for trading and pricing, while the Computer Vision group won the 2009 National KTP Award, selected from hundreds of projects by the Technology Strategy Board.
Proposal Team Experience
Please include the following if applicable
Selected articles/journal publications, patents, etc. related to proposed technology
F. Cuzzolin, Using bilinear models for view-invariant action and identity recognition, Proc. of Computer Vision and Pattern Recognition (CVPR'06), pp. 1701-1708, 2006.
F. Cuzzolin, Multilinear modeling for robust identity recognition from gait, in "Behavioral Biometrics for Human Identification: Intelligent Applications", pp. 169-188, L. Wang and X. Geng Eds., IGI Publishing, 2010.
F. Cuzzolin, Learning pullback manifolds of generative dynamical models for action recognition, IEEE Transactions on PAMI (2012, under review).
F. Cuzzolin, D. Mateus and R. Horaud, Robust coherent Laplacian protrusion segmentation along 3D sequences, International Journal of Computer Vision (2012, under review).
F. Cuzzolin, D. Mateus, D. Knossow, E. Boyer and R. Horaud, Coherent Laplacian protrusion segmentation, Proc. of Computer Vision and Pattern Recognition (CVPR'08), pp. 1-8, June 2008.
M. Sapienza, F. Cuzzolin and Ph. Torr, Learning discriminative space-time actions from weakly labelled videos (best poster prize recipient), INRIA Machine Learning Summer School, Grenoble, July 2012.
M. Sapienza, F. Cuzzolin and Ph. Torr, Learning discriminative space-time actions from weakly labelled videos, Proc. of the British Machine Vision Conference (BMVC'12), September 2012.
M. Sapienza, F. Cuzzolin and Ph. Torr, Learning Fisher star models for action recognition in space-time videos, submitted to CVPR 2013.
Track record of research and development or product development by principal developers
The Artificial Intelligence and Computer Vision groups are very active in computer vision, and in activity recognition in particular.
Dr Cuzzolin has recently been awarded a £122K EPSRC First Grant for a project on "Tensorial modeling of dynamical systems for gait and activity recognition", which received 6/6 reviews and proposes a generative approach to action and identity recognition. He is in the process of submitting a £200K Leverhulme project on "Guessing plots for video googling" (strongly related to the current proposal).
Also most relevant to the current proposal, Dr Fabio Cuzzolin and Professor Phil Torr have a joint pending EPSRC (the UK's Engineering and Physical Sciences Research Council) £650,000 ($1.1M) grant application on "Making action recognition work in the real world". Dr Cuzzolin and Professor Helen Dawes have a joint pending £370,000 NIHR grant application on a project focussed on "Monitoring health conditions at home via Kinect", which proposes to use action classification to monitor brain conditions in patients remotely. Contacts have been made with Magna International, the car component company, on the application of gait recognition for biometric purposes and smart vehicles able to interact gesturally with drivers and pedestrians.
Dr Cuzzolin is preparing, as Coordinator, a European Union collaborative 3 million euro (4 million dollar) STREP project on "Action Recognition for Video Management", with Technicolor (France), INRIA TexMex (France) and ETH Zurich (Switzerland) as partners, to submit to Horizon 2020.
Professor Torr was involved in the startup company 2d3 (http://www.2d3.com/), part of the Oxford Metrics Group (OMG). Their first product, "boujou", is used by special-effects studios all over the world. Boujou is used to track the motion of the camera and allow for clean video insertion of objects, and has been used on the special effects of almost every major feature film in the last five years, including the "Harry Potter" and "Lord of the Rings" series. Prof. Torr has directly worked with the following companies based in the UK: 2d3, Vicon Life, Yotta (http://www.yotta.tv/company/), Microsoft Research Europe, Sharp Laboratories and Sony Entertainment Europe, with contributions to commercial products appearing (or about to appear) with four of them. His work is currently in use in the film and game industry. His work with the Oxford Metrics Group in a Knowledge Transfer Partnership 2005-9 won the 2009 National Best KTP of the Year award, selected out of several hundred projects (http://www.ktponline.org.uk/awards2009/BestKTP.aspx).
Professor Torr's segmentation work has just appeared in Sony's new flagship Christmas 2012 PS3 launch, "Wonderbook: Book of Spells": http://www.brookes.ac.uk/business_employers/ktp/wonderbook/index_html.
Prof. Torr has been the PI on several grants, several of which are related to the topic of this proposal. His EPSRC First Grant (cash limited to £120K) was Markerless Motion Capture for Humans in Video (GR/T21790/01(P)), Oct 2004-Oct 2007, which has led to a large output of research, including four papers accepted as orals at the top vision conferences. His second EPSRC grant, Automatic Generation of Content for 3D Displays (EP/C006631/1), Nov 2005-May 2009, has led to a SIGGRAPH paper (and patent) as well as paper prizes at IEEE CVPR 2008 and NIPS 2007, amongst others. The majority of these papers have been published in the main journals in the field: IJCV, JMLR, PAMI. The product arising from the SIGGRAPH paper (VideoTrace) has led to a spin-off company.
The groups have running KTPs (Knowledge Transfer Partnerships) with companies as diverse as HedgeVantage (a financial trading consultancy), Webmart (print brokerage), Sony Europe and VICON (the motion capture equipment company).