Outline Proposal - Oxford Brookes University

PROPOSAL FOR REQUEST #69009

Request Title: Video Analysis/Recognition of Human Activities

NineSigma Point of Contact: S. Akutagawa

Submission Date: 28 January, 2013

Contact Information

• Name of organization: Oxford Brookes University

• Name of proposer(s): Dr Fabio Cuzzolin

• Address: Department of Computing and Communication Technologies, Wheatley campus

• City, State, Zip: Oxford, OX33 1HX

• Country: United Kingdom

• Phone: +44 1865 484526

• Email: fabio.cuzzolin@brookes.ac.uk

• Direct Web Page Link: http://cms.brookes.ac.uk/staff/FabioCuzzolin/

• Additional Organization Information

• Size:

o University: Oxford Brookes University is a premier learning and teaching institution with an outstanding research record. We are widely acknowledged to be the UK's leading modern university, surpassing many older institutions in newspaper league tables. In 2009 the university had a total of some 18,000 students, of which 22% were at postgraduate level (http://www.brookes.ac.uk/about/facts/statistics), and some 200 PhD students.

o The Department of Computing and Communication Technologies has some 32 faculty members and around 30 PhD students.

o The Computer Vision (http://cms.brookes.ac.uk/research/visiongroup) and the Artificial Intelligence (http://cms.brookes.ac.uk/staff/FabioCuzzolin/index-ml.html) research groups are the strongest in the Department, with a combined total of 6 members of staff and almost 20 postdocs/PhD students.

• Years in operation: Oxford Brookes University began life as the Oxford School of Art in 1865, becoming a university in 1992.

• Annual sales: turnover of 170.2 million pounds and an operating surplus of 15.3 million in 2011 (http://www.brookes.ac.uk/about/structure/annual_accounts/accounts1011.pdf)

• Contract/joint development with large companies, if sharable (name of the companies, type of relationship, etc.): The AI and Vision groups have continuing links, including KTPs, with companies such as Sony Entertainment, VICON, Microsoft Research Europe and Yotta, as well as newly developed KTPs with HedgeVantage (a financial consulting company), Webmart, and Magna International (the multinational car component company). See our websites for more info.

• Other information (sponsors, awards, etc.): See our websites or call us for more info.

Submission Terms

By placing an "X" in the box below, I verify that I am submitting only non-confidential information. Further, I agree to notify NineSigma should this proposal result in a transaction with NineSigma's customer. (This effort is to ensure proper record keeping.)

I agree to NineSigma's submission terms: X

Please insert your text below each heading in the form below, expanding as needed. Additional guidelines for preparing your proposal are included on the last page of this document.


Title of Proposal: Locating and recognizing complex activities using part-based discriminative models

Proposed Technical Approach

Please include the following information by selecting options or giving a brief description, possibly using graphs, figures, or drawings. Please append a copy of your paper pertaining to the proposed technology.

Category of proposed technology (please select a relevant one)

Technology developed/optimized for monitoring people, in particular the localization and recognition of actions and complex activities by multiple actors

Development stage (please select a relevant one)

Under verification and improvement in both the lab and the field

Overview of proposed technology

(1) Analysis/recognition algorithm (what kind of human activities/behaviors can be analyzed in what processing steps):

Action recognition is a hard problem for a number of reasons: #1 human motions possess a high degree of inherent variability, as quite distinct motions/gestures can carry the same meaning. As action models are typically learned from necessarily limited datasets (in terms of training videos and action classes), they have limited generalization power; #2 actions are subject to various nuisance or "covariate" factors, such as illumination, moving background, viewpoint, and many others (Figure 1). Tests have often been run in small controlled environments, while few attempts have been made to progress towards recognition "in the wild"; #3 detecting when and where an action takes place within a video is the first step in any action recognition framework: so far, however, the focus has largely been on the recognition of pre-segmented videos; #4 the presence of multiple actors (e.g., different players sitting in front of a single console) greatly complicates both localization and recognition; #5 a serious challenge arises when we move from simple, "atomic" actions to more sophisticated "activities": sequences of elementary actions connected in a meaningful way, common for instance in the smart home scenario.

Figure 1: numerous nuisance factors (e.g. view, illumination, occlusions, multiple actors) make the activity recognition problem hard.

The most successful recent approaches, which mainly adopt kernel SVM classification of bags of local features (Figure 2), have reached their limits: only understanding the spatial and temporal structure of human activities can help us to successfully locate and recognize them in a robust and reliable way.


Figure 2: BoF methods build histograms of frequencies of local video features: as any spatiotemporal relationship is lost, meaningless videos with almost the same histograms can be incorrectly recognized.

Inspired by the successes of similar approaches in 2D object detection [20], we propose to represent human activities as spatio-temporal "objects" composed of distinct, coordinated "parts" (elementary actions).

More specifically, instead of computing action descriptors on whole video clips (Figure 3, left), we do so for collections of space-time action parts associated with video subvolumes (middle); multiple instance learning (MIL) is used to learn which subvolumes are particularly discriminative of the action (solid-line green cubes) and which are not (dotted-line cubes); finally (right), a human action is represented as a "star model" of elementary BoF action parts.

Figure 3: the proposed approach for learning and recognizing human activities as structured constellations of the most discriminative action parts.

Step 1: Prior to modeling actions, video streams have to be processed to extract salient "features", either frame by frame or from the entire spatio-temporal (S/T) volume which contains the action(s) of interest. A plethora of local video descriptors have been proposed for S/T volumes: Cuboid, 3D-SIFT, HoG-HoF, HOG3D, extended SURF. Dense Trajectory Features, a combination of HoG-HoF with optical flow vectors and motion boundary histograms, have been shown to outperform all the other approaches. An appealing alternative to traditional video is provided by "range" (Time-of-Flight) cameras: feature extraction from range images and fusion of range and video features will be integral parts of this project.
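By way of illustration, the sketch below computes a crude histogram-of-optical-flow descriptor over a coarse grid of subvolumes using OpenCV's Farneback flow. It is a heavily simplified stand-in for the Dense Trajectory Features named above, not the published pipeline, and all parameter values are illustrative.

```python
# Minimal sketch of Step 1 (a simplified HoF-style descriptor, not the
# actual Dense Trajectory Features implementation).
import cv2
import numpy as np

def hof_descriptor(video_path, n_bins=8, grid=(2, 2, 2)):
    """Histogram-of-flow descriptor over a coarse S/T grid of subvolumes."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # dense optical flow between consecutive frames
        flows.append(cv2.calcOpticalFlowFarneback(prev, gray, None,
                                                  0.5, 3, 15, 3, 5, 1.2, 0))
        prev = gray
    cap.release()
    flows = np.stack(flows)                          # (T, H, W, 2)
    mag = np.linalg.norm(flows, axis=-1)
    ang = np.arctan2(flows[..., 1], flows[..., 0]) % (2 * np.pi)
    # histogram flow orientation (magnitude-weighted) in each grid cell
    feats = []
    T, H, W = mag.shape
    for t in np.array_split(np.arange(T), grid[0]):
        for y in np.array_split(np.arange(H), grid[1]):
            for x in np.array_split(np.arange(W), grid[2]):
                idx = np.ix_(t, y, x)
                hist, _ = np.histogram(ang[idx], bins=n_bins,
                                       range=(0, 2 * np.pi),
                                       weights=mag[idx])
                feats.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feats)   # one fixed-length descriptor per volume
```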

Step 2: from the local features extracted from each video subvolume, a Fisher vector representation is calculated, so that each subvolume is encoded by a single Fisher vector.
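As an illustration of this encoding, the sketch below computes the gradient-with-respect-to-means component of the Fisher vector under a diagonal-covariance GMM fitted with scikit-learn. It is a sketch of the standard formulation only: the full encoder also includes variance gradients, and our actual parameter choices differ.

```python
# Minimal sketch of Step 2: Fisher vector encoding (means gradient only),
# assuming a GaussianMixture with covariance_type='diag'.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """descriptors: (N, D) local features from one subvolume."""
    gamma = gmm.predict_proba(descriptors)            # (N, K) posteriors
    N = descriptors.shape[0]
    fv = []
    for k in range(gmm.n_components):
        diff = (descriptors - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        g_mu = (gamma[:, k, None] * diff).sum(axis=0)
        fv.append(g_mu / (N * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))            # power normalization
    return fv / (np.linalg.norm(fv) + 1e-9)           # L2 normalization

# Fit the vocabulary GMM on descriptors pooled from training videos, then
# encode each subvolume; 'train_descriptors' is a placeholder name:
# gmm = GaussianMixture(n_components=64,
#                       covariance_type='diag').fit(train_descriptors)
```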

Step 3: Multiple Instance Learning (MIL) of the most discriminative subvolumes, i.e., those which best characterize an activity versus all the others. An initial "positive" model is learned by assuming that all examples in the positive bag (all the sub-volumes of the sequence) do contain the action at hand; a "negative" model is learned from the examples in the negative bags (videos labeled with a different action class). After an iterative process, only the most discriminative examples in each positive bag are retained. MIL reduces to a semi-convex optimisation problem, for which efficient heuristics exist [5]. The resulting model allows us to factor out the effect of common, shared context (similar background, common action elements).

Step 4: Once the most discriminative action parts are learnt via MIL, we can construct tree-like ensembles of action parts (Figure 3, right) to use for both localization and classification of actions. Felzenszwalb and Huttenlocher have shown (in the object detection problem) that if the pictorial structure forms a star model, where each part is only connected to the root node, it is possible to compute the best match very efficiently by dynamic programming. Other approaches to building a constellation of discriminative parts have been proposed by Hoiem and Ramanan. Crucial will be the introduction of sparsity constraints in the Latent SVM semi-convex optimization problem proposed by Felzenszwalb, to automatically identify the optimal number of parts.

(2) Specifics in system configuration (please indicate required camera or system, if anything special is required as a basis of using the proposed algorithm):

The approach is designed to work with both conventional cameras and range cameras, as in both cases a spatiotemporal volume can be constructed, from which the most discriminative parts can be learned and assembled into an overall model. A fusion of both would be pioneering work.

(3) Applicability of the algorithm to versatile human activities (what should be overcome in applying an algorithm developed for a specific human activity to any other human activities):

The algorithm is being developed as general purpose: as such, it is designed to discriminate between any activities introduced in a training stage. In particular, it is explicitly designed to represent complex activities formed by a sequence of elementary actions; to cope with the presence of multiple actors/people; to localize the action of interest within a larger video in both space and time; and to factor out the background (static or dynamic) in order to better discriminate different activities with common background or elementary components (i.e., parts in common).

Current Performance (please answer the following questions by showing a specific recognition task you have experienced so far as an example):

(1) Recognition tasks/applications in brief (if proposers have experience in analyzing and recognizing one or some of the following, please indicate those. If not, please briefly describe what kind of human activities proposers have experienced):

The approach has so far been tested on most of the publicly available benchmarks for action recognition:

The KTH dataset contains 6 action classes, each performed by 25 actors in four scenarios. People perform repetitive actions at different speeds and orientations. Sequences are longer when compared to the YouTube or the HMDB51 datasets, and contain clips in which the actors move in and out of the scene during the same sequence.

The YouTube dataset contains 11 action categories and presents several challenges due to camera motion, object appearance, scale, viewpoint and cluttered backgrounds. The 1600 video sequences are split into 25 groups, and we follow the authors' evaluation procedure of 25-fold, leave-one-out cross validation (a minimal sketch of this protocol is given after the dataset descriptions below).

The Hollywood2 dataset contains 12 action classes collected from 69 different Hollywood movies. There are a total of 1707 action samples containing realistic, unconstrained human and camera motion. The dataset is divided into 823 training and 884 testing sequences, each from 5 to 25 seconds long.

The HMDB dataset contains 51 action classes, with a total of 6849 video clips collected from movies, the Prelinger archive, YouTube and Google videos. Each action category contains a minimum of 101 clips.
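As a concrete illustration of the YouTube evaluation protocol mentioned above, the sketch below runs 25-fold leave-one-group-out cross validation with scikit-learn; the data arrays are random placeholders of the quoted dimensions, not our actual features or pipeline.

```python
# Minimal sketch of 25-fold leave-one-group-out cross validation.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
features = rng.normal(size=(1600, 128))   # placeholder video encodings
labels = rng.integers(0, 11, size=1600)   # 11 action categories
groups = rng.integers(0, 25, size=1600)   # the 25 author-defined groups

accs = []
for train_idx, test_idx in LeaveOneGroupOut().split(features, labels, groups):
    clf = LinearSVC().fit(features[train_idx], labels[train_idx])
    accs.append(clf.score(features[test_idx], labels[test_idx]))
print(f"mean accuracy over 25 folds: {np.mean(accs):.3f}")
```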


To quantify the respective algorithm performance, we report the accuracy (Acc), mean average precision (mAP), and mean F1 score (mF1).

(2) Recognition performance achieved so far:

Figure 4. Left: preliminary localisation results on a Hollywood2 video [45]. The colour of each box (subvolume) indicates its positive rank score for belonging to the action class (red = high). In actioncliptest00058, a woman gets out of her car roughly around the middle of the video, as indicated by the detected subvolumes. Right: performance of MIL discriminative modelling (Step 3) with Dense Trajectory Features as features on the most common datasets, compared to the traditional BoF baseline. Even when using traditional features, learning the most discriminative action parts via MIL much improves performance on challenging testbeds.

Figure 5: performance of BoF global models with Fisher representation (Step 2) on the most common datasets, compared to the state of the art. Note how accuracy and average precision (recognition rate) dramatically improve w.r.t. previous approaches.

(3) Latency to recognize a specific human activity (how many seconds after the occurrence of specific human activities can the activities be recognized by the algorithm):

( ~2 ) sec for recognition on the KTH dataset. As features are computed from whole volumes, a frames-per-second figure does not make much sense in our approach; in any case, the frame rate of all sequences is around 30fps.

Computing the classification scores for 60,000 testing video instances (each 1000-dimensional) on the KTH dataset takes 0.5 seconds on a standard laptop: this does not include feature computation and representation times, which can vary largely depending on the choice of features, representation, classification methods, and PC hardware.
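To put that timing in context: with a linear (e.g. Latent SVM) model, scoring pre-computed video representations reduces to a single matrix product, as in the sketch below. Shapes follow the figures quoted above; the weight matrix and data are random placeholders.

```python
# Minimal sketch of the scoring step timed above: one matrix product
# classifies all test instances against all classes at once.
import numpy as np

X = np.random.randn(60_000, 1000)     # encodings of the test instances
W = np.random.randn(6, 1000)          # 6 KTH classes x 1000-dim weights
b = np.zeros(6)

scores = X @ W.T + b                  # (60000, 6) class scores
predictions = scores.argmax(axis=1)   # sub-second on commodity hardware
```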

(4) Possibility to predict the occurrence of specific human activities (please select a relevant one)

Possible

Though no such tests have been conducted yet, an online version of the algorithm can be imagined in which the recognition of elementary actions which can be part of a more complex activity is used to predict the likelihood of the latter happening before it actually takes place.

(5) Necessary resolution of target objects in the image for proper analysis/recognition

( 360 ) x ( 240 ) pixels in most cases (see the dataset descriptions above); however, subsampling is normally used to reduce computation time, so that the actual videos have even lower dimensionality.

(6) Illumination on target objects for proper analysis/recognition (qualitative description is all right if it is hard to quantify illumination level, e.g., can be used not only in bright but also in somewhat dark rooms)

As we have so far used state-of-the-art benchmarks rather than in-house datasets, it is hard to answer this particular question quantitatively.

However, all the most recent tested datasets (Hollywood, Hollywood2, YouTube, HMDB51) contain videos characterized by widely varying degrees of illumination: indeed, a feature of our approach is to build models invariant to nuisance factors such as illumination.

Please have a look at the relevant web pages for a closer inspection:

http://www.di.ens.fr/~laptev/actions/hollywood2/

http://www.cs.ucf.edu/~liujg/YouTube_Action_dataset.html

In particular, HMDB (http://serre-lab.clps.brown.edu/resources/HMDB/) contains very dark as well as very bright sequences.

Development plan for establishing recognition technology for any of the following or alike (getting out of or falling off a bed / falling down / breathing / becoming feverish / having a fit or cough / having convulsions / experiencing pain / choking or difficulty swallowing)

Human activities already tackled (if recognition of any of the above human activities has already been realized, please indicate or briefly describe them):

The activities specified above have not been explicitly tackled to date, as we have focussed so far on the most common publicly available datasets. However, as our approach is inherently flexible (part-based) and learns to discriminate from a training set, we do not foresee difficulties in tackling a different set of action classes.

Development plan and challenges to be overcome if recognition of any of the abovementioned human activities is to be newly attempted:

Datasets focussed on the activities of interest to you are being searched for by our PhD student Sapienza. In case the search's outcome is not positive, we propose to collect our own testbed via both traditional and range cameras (e.g. the Kinect device) in our possession. Range cameras are particularly attractive for indoor scenarios, and two separate efforts to perform gesture/exercise recognition via Kinect, for clinical purposes (which led to a pending NIHR i4i grant application) and for exercising (by an MSc student), are being pursued by us at the current time.

Pose estimation algorithms from range data already exist, though more customized solutions can be studied. After mapping activities to series of human poses, the above techniques can be applied to recognize them.

Understanding of privacy issues in installing camera-based surveillance systems for monitoring people, or a network with organizations having such capability, if any

Privacy and Intellectual Property are managed centrally by Oxford Brookes University via RBDO (Research and Business Development Office).

As for organizations with expertise in surveillance systems, the group has links with HMGCC (http://www.hmgcc.gov.uk/), the UK government centre of excellence whose aim is to design and develop secure communication systems, hardware and software specifically for HM Government use, both at home and overseas.

Proposed Budget and Conditions

Preferred style of collaboration and proposed conditions

For Phase 1 (validation) we believe the best strategy is to hire a Research Assistant for 1-2 years, who will work full time on the project to complete all the steps (1-4) of the algorithm (in collaboration with our PhD student Michael Sapienza), and tune it towards the actions/activities specified by the client and the smart home scenario. In this perspective it makes sense to explore range camera technologies, as we are doing in the medical monitoring application (see below).

The naked cost of a member of staff is 10,000-14,000 pounds a year for an RA (postgraduate student), and 27,000-30,000 pounds for a postdoctoral researcher. Indirect costs of about 150% of the naked cost need to be added.

Equipment is already in the possession of the group in terms of computer clusters, range and traditional cameras, but possibly a few thousand pounds will be needed in this sense.

For Phase 2 (commercialization) the most suitable scheme is probably that of a Knowledge Transfer Partnership (KTP), in which the groups and Oxford Brookes in general have a strong background. A KTP typically lasts for 2 years: each Associate has a total cost of some 75,000 pounds a year (of which 75% is funded by the UK government, with only 25% charged to the industrial partner); one or two Associates can be requested depending on the scale and complexity of the project.

Status of Intellectual Property of proposed technology and the organizational policy regarding technology transfer, licensing, etc.

At the present time the technology is being developed as a research project, so as such it has not been patented yet, though we plan to proceed in that direction given the very encouraging performance.

Intellectual Property is managed centrally by Oxford Brookes University via RBDO (Research and Business Development Office), which has access to financial and other resources to enable Intellectual Property and its commercial exploitation to be effectively managed, whilst maximising the widespread dissemination of the research results. This includes finance for patenting and proof-of-concept funding; Intellectual Property, technology and market assessment; and resources for defining and implementing a commercialisation strategy through licensing, start-up companies or other routes.

RBDO has a strong track record of commercialisation of its intellectual property. Income from licences was £1.5M in 2011, which ranks the University in the top 10 of all UK universities for royalty income. OBU through RBDO holds a total portfolio of 20 patents. The University also supports the creation of spin-out companies when appropriate and has some successful examples. In particular, Oxford Brookes is extremely active in the field of Knowledge Transfer Partnerships: the Artificial Intelligence group has two upcoming KTPs concerning machine learning techniques for trading and pricing, while the Computer Vision group won the 2009 National KTP Award, selected from hundreds of projects by the Technology Strategy Board.

Proposal Team Experience

Please include the following if applicable

Selected articles/journal publications, patents, etc. related to proposed technology

F. Cuzzolin, Using bilinear models for view-invariant action and identity recognition, Proc. of Computer Vision and Pattern Recognition (CVPR'06), pp. 1701-1708, 2006.

F. Cuzzolin, Multilinear modeling for robust identity recognition from gait, in "Behavioral Biometrics for Human Identification: Intelligent Applications", pp. 169-188, L. Wang and X. Geng Eds., IGI Publishing, 2010.

F. Cuzzolin, Learning pullback manifolds of generative dynamical models for action recognition, IEEE Transactions on PAMI (2012, under review).

F. Cuzzolin, D. Mateus and R. Horaud, Robust coherent Laplacian protrusion segmentation along 3D sequences, International Journal of Computer Vision (2012, under review).

F. Cuzzolin, D. Mateus, D. Knossow, E. Boyer and R. Horaud, Coherent Laplacian protrusion segmentation, Proc. of Computer Vision and Pattern Recognition (CVPR'08), pp. 1-8, June 2008.

M. Sapienza, F. Cuzzolin and Ph. Torr, Learning discriminative space-time actions from weakly labelled videos (best poster prize recipient), INRIA Machine Learning Summer School, Grenoble, July 2012.

M. Sapienza, F. Cuzzolin and Ph. Torr, Learning discriminative space-time actions from weakly labelled videos, Proc. of the British Machine Vision Conference (BMVC'12), September 2012.

M. Sapienza, F. Cuzzolin and Ph. Torr, Learning Fisher star models for action recognition in space-time videos, submitted to CVPR 2013.

Track record of research and development or product development by principal developers

The Artificial Intelligence and the Computer Vision groups are very active in computer vision, and in activity recognition in particular.

Dr Cuzzolin has recently been awarded a 122K£ EPSRC First Grant for a project on "Tensorial modeling of dynamical systems for gait and activity recognition", which received 6/6 reviews and proposes a generative approach to action and identity recognition. He is in the process of submitting a 200K£ Leverhulme project on "Guessing plots for video googling" (strongly related to the current proposal).

Also most relevant to the current proposal, Dr Fabio Cuzzolin and Professor Phil Torr have a joint pending EPSRC (the UK's Engineering and Physical Sciences Research Council) £650,000 ($1.1M) grant application on "Making action recognition work in the real world". Dr Cuzzolin and Professor Helen Dawes have a joint pending £370,000 NIHR grant application on a project focussed on "Monitoring health conditions at home via Kinect", which proposes to use action classification to monitor brain conditions in patients remotely. Contacts have been made with Magna International, the car component company, on the application of gait recognition for biometric purposes and smart vehicles able to gesturally interact with drivers and pedestrians.

Dr Cuzzolin is preparing, as the Coordinator, a European Union collaborative 3 million euro (4 million dollar) STREP project on "Action Recognition for Video Management", with Technicolor (France), INRIA TexMex (France), and ETH Zurich (Switzerland) as partners, to be submitted to Horizon 2020.

Professor Torr was involved in the startup company 2d3 (http://www.2d3.com/), part of the Oxford Metrics Group (OMG). Their first product, "boujou", is used by special effects studios all over the world. Boujou is used to track the motion of the camera and allow for clean video insertion of objects, and has been used in the special effects of almost every major feature film in the last five years, including the "Harry Potter" and "Lord of the Rings" series. Prof. Torr has directly worked with the following companies based in the UK: 2d3, Vicon Life, Yotta (http://www.yotta.tv/company/), Microsoft Research Europe, Sharp Laboratories, and Sony Entertainments Europe, with contributions to commercial products appearing (or about to appear) with four of them. His work is currently in use in the film and game industry. His work with the Oxford Metrics Group in a Knowledge Transfer Partnership 2005-9 won the 2009 National Best KTP of the Year award, selected out of several hundred projects (http://www.ktponline.org.uk/awards2009/BestKTP.aspx).

Professor Torr's segmentation work recently appeared in Sony's new flagship Christmas 2012 PS3 launch, "Wonderbook: Book of Spells": http://www.brookes.ac.uk/business_employers/ktp/wonderbook/index_html.

Prof. Torr has been the PI on several grants, several of which are related to the topic of this proposal. His EPSRC First Grant (cash limited to 120K) was Markerless Motion Capture for Humans in Video (GR/T21790/01(P)), Oct 2004-Oct 2007, which led to a large output of research, including four papers accepted as orals at the top vision conferences. His second EPSRC grant, Automatic Generation of Content for 3D Displays (EP/C006631/1), Nov 2005-May 2009, led to a SIGGRAPH paper (and patent) as well as paper prizes at IEEE CVPR 2008 and at NIPS 2007, amongst others. The majority of these papers have been published in the main journals in the field: IJCV, JMLR, PAMI. The product arising from the SIGGRAPH paper (VideoTrace) has led to a spin-off company.

The groups have running KTPs (Knowledge Transfer Partnerships) with companies as diverse as HedgeVantage (a financial trading consultancy), Webmart (print brokerage), Sony Europe, and VICON (the motion capture equipment company).

Submitting Your Proposal (Please delete this section from your proposal document)

READY TO SUBMIT?

Overview

All proposals should be submitted online at NineSights, the collaborative innovation community from NineSigma.

• Already a member? Please login now.

• Need to Register? Start here. Registration is free. You will be asked to agree to the NineSights Terms of Use as part of registration.

Once you have logged in to NineSights:

1. SAVE your completed proposal document to your computer
2. CLICK HERE to open the RFP page
3. Click the red RESPOND NOW button next to the RFP
4. Enter a brief abstract on the submission form and attach your saved proposal and any supplemental files, then click SUBMIT
5. Submitted proposals will appear on your Dashboard under Content. Proposals are private – only you and NineSigma can view them online.

QUESTIONS?

View answers to Frequently Asked Questions

Contact the Solution Provider Help Desk

EMAIL: PhD@ninesigma.com

PHONE: +1 216-283-3901

Form Instructions (This page may be deleted from your proposal document)

Your response is essentially an introduction to NineSigma's client of who you are, your capabilities, and what type of possible solution you can offer. This is an initial opportunity to present your innovation for further discussion. Your response should be a non-enabling disclosure. Your response must not contain any confidential information or information that would enable someone else to replicate your invention without paying for it.

Target Audience

Your goal is to provide a compelling description of your proposed solution to trigger the interest of the Request sponsor's decision makers and the people with the technical and business knowledge to make the final decision. NineSigma does not evaluate the technology in proposals or screen responses for our client. We do provide organized summaries of your capabilities as they compare to the Request specifications and the client's evaluation criteria.

Proposal Content

Please insert your text below each heading in the form above. We recommend a 3-page limit, but you may use as many pages as necessary to present relevant and compelling information. In addition, you may delete this instruction page and any other italicized notes or make other customizations to the document to suit your needs.


Our Client wants to learn about…

• WHAT your technology does and a general description of how it works (You may include a more detailed discussion if your intellectual property (IP) has been secured appropriately)

• How your solution addresses the specifications in the Request

• What differentiates your solution from others in the field
  − Unique aspects of your technology
  − How your solution overcomes drawbacks of other existing technologies

• Performance or technical data (current or anticipated)

• The readiness of your technology (e.g. at proof-of-concept phase, already in use, etc.)

• IP you may have around the proposed technology

• Who you are and the expertise of you and your team or organization with respect to the needs of the Request

• What you need in order to continue the discussion or reveal the details of your solution (e.g. confidentiality agreement)

• Budget and timeline estimate for the initial phase or for other arrangements as appropriate
  − Consider the client's funding amount (if listed) and your budget as starting points in the negotiation

Other Suggestions

• Use the professional language of science/engineering/technology

• Avoid jargon

• Consider including photographs or a video clip if appropriate

• Attach supplemental information (such as a resume, brochure, or publication) to the end of this document

• OR you may upload up to 10 supplemental files when submitting this proposal through our website

Our Clients evaluate…

• Partial solutions

• Proposals from collaborative teams

• Statements of interest from government laboratories

• Proposals from outside the U.S.

For additional guidance and suggestions, view our Guide to Writing a Compelling Non-Confidential Proposal.

How Proposals are Evaluated

♦ Our client will use the information you provide to judge whether they should pursue more in-depth discussions, negotiations, or other arrangements directly with you. This initial evaluation requires about two months.

♦ NineSigma will notify respondents whether or not our client has selected them for progression. Respondents who were not selected may be able to receive feedback directly from the organization as to why.

♦ If selected for progression, the next step would be a conversation either with NineSigma or directly with the requesting organization to answer any outstanding questions.

♦ If both parties wish to proceed, the requesting organization may initiate a contract such as a confidentiality agreement for further detailed discussion, a face-to-face meeting, or a submission of samples for evaluation.

♦ The final step would be a contract establishing an official business relationship. This could include a supply agreement, licensing, a research contract, or a joint development agreement.
