Outline Proposal - Oxford Brookes University

PROPOSAL FOR REQUEST #69009

Request Title: Video Analysis/Recognition of Human Activities

NineSigma Point of Contact: S. Akutagawa

Submission Date: 28 January, 2013

Contact Information

• Name of organization: Oxford Brookes University

• Name of proposer(s): Dr Fabio Cuzzolin

• Address: Department of Computing and Communication Technologies, Wheatley campus

• City, State, Zip: Oxford, OX33 1HX

• Country: United Kingdom

• Phone: +44 1865 484526

• Email: fabio.cuzzolin@brookes.ac.uk

• Direct Web Page Link: http://cms.brookes.ac.uk/staff/FabioCuzzolin/

• Additional Organization Information

• Size:

o University: Oxford Brookes University is a premier learning and teaching institution with an outstanding research record. We are widely acknowledged to be the UK's leading modern university, surpassing many older institutions in newspaper league tables. In 2009 the university had a total of some 18,000 students, of which 22% were at postgraduate level (http://www.brookes.ac.uk/about/facts/statistics), and some 200 PhD students.

o The Department of Computing and Communication Technologies has some 32 faculty members and around 30 PhD students.

o The Computer Vision (http://cms.brookes.ac.uk/research/visiongroup) and the Artificial Intelligence (http://cms.brookes.ac.uk/staff/FabioCuzzolin/index-ml.html) research groups are the strongest in the Department, with a combined total of 6 members of staff and almost 20 postdocs/PhD students.

• Years in operation: Oxford Brookes University began life as the Oxford School of Art in 1865, becoming a university in 1992.

• Annual sales: turnover of 170.2 million pounds and an operating surplus of 15.3 million in 2011 (http://www.brookes.ac.uk/about/structure/annual_accounts/accounts1011.pdf)

• Contract/joint development with large companies, if sharable (name of the companies, type of relationship, etc.): The AI and Vision groups have continuing links, including KTPs, with companies such as Sony Entertainment, VICON, Microsoft Research Europe and Yotta, as well as newly developed KTPs with HedgeVantage (a financial consulting company), Webmart, and Magna International (the multinational car component company). See our websites for more info.

• Other information (sponsors, awards, etc.): See our websites or call us for more info.

Submission Terms

By placing an "X" in the box below, I verify that I am submitting only non-confidential information. Further, I agree to notify NineSigma should this proposal result in a transaction with NineSigma's customer. (This effort is to ensure proper record keeping.)

I agree to NineSigma's submission terms: X

Please insert your text below each heading in the form below, expanding as needed. Additional guidelines for preparing your proposal are included on the last page of this document.


Title of Proposal: Locating and recognizing complex activities using part-based discriminative models

Proposed Technical Approach

Please include the following information by selecting options or giving a brief description, possibly using graphs, figures, or drawings. Please append a copy of your paper pertaining to the proposed technology.

Category of proposed technology (please select a relevant one)

Technology developed/optimized for monitoring people, in particular the localization and recognition of actions and complex activities by multiple actors

Development stage (please select a relevant one)

Under verification and improvement in both the lab and the field

Overview of proposed technology

(1) Analysis/recognition algorithm (what kind of human activities/behaviors can be analyzed in what processing steps):

Action recognition is a hard problem for a number of reasons: #1 human motions possess a high degree of inherent variability, as quite distinct motions/gestures can carry the same meaning. As action models are typically learned from necessarily limited datasets (in terms of training videos and action classes), they have limited generalization power; #2 actions are subject to various nuisance or "covariate" factors, such as illumination, moving background, viewpoint, and many others (Figure 1). Tests have often been run in small controlled environments, while few attempts have been made to progress towards recognition "in the wild"; #3 detecting when and where an action takes place within a video is the first step in any action recognition framework: so far, however, the focus has largely been on the recognition of pre-segmented videos; #4 the presence of multiple actors (e.g., different players sitting in front of a single console) greatly complicates both localization and recognition; #5 a serious challenge arises when we move from simple, "atomic" actions to more sophisticated "activities": sequences of elementary actions connected in a meaningful way, common for instance in the smart home scenario.

Figure 1: numerous nuisance factors (e.g. view, illumination, occlusions, multiple actors) make the activity recognition problem hard.

The most successful recent approaches, which mainly adopt kernel SVM classification of bags of local features (Figure 2), have reached their limits: only understanding the spatial and temporal structure of human activities can help us to successfully locate and recognize them in a robust and reliable way.


Figure 2: BoF methods build histograms of frequencies of local video features: as any spatiotemporal relationship is lost, meaningless videos with almost the same histograms can be incorrectly recognized.

Inspired by the successes of similar approaches in 2D object detection [20], we propose to represent human activities as spatio-temporal "objects" composed of distinct, coordinated "parts" (elementary actions).

More specifically, instead of computing action descriptors on whole video clips (Figure 3, left), we do so for collections of space-time action parts associated with video subvolumes (middle); multiple instance learning (MIL) is used to learn which subvolumes are particularly discriminative of the action (solid-line green cubes) and which are not (dotted-line cubes); finally (right), a human action is represented as a "star model" of elementary BoF action parts.

Figure 3: the proposed approach for learning and recognizing human activities as structured constellations of the most discriminative action parts.

Step 1: Prior to modeling actions, video streams have to be processed to extract salient "features", either frame by frame or from the entire spatio-temporal (S/T) volume which contains the action(s) of interest. A plethora of local video descriptors have been proposed for S/T volumes: Cuboid, 3D-SIFT, HoG-HoF, HOG3D, extended SURF. Dense Trajectory Features, a combination of HoG-HoF with optical flow vectors and motion boundary histograms, have been shown to outperform all the other approaches. An appealing alternative to traditional video is provided by "range" (Time-of-Flight) cameras: feature extraction from range images and fusion of range and video features will be integral parts of this project.
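By way of illustration, the sketch below computes a crude histogram-of-optical-flow descriptor over a coarse grid of subvolumes using OpenCV's Farneback flow. It is a heavily simplified stand-in for the Dense Trajectory Features named above, not the published pipeline, and all parameter values are illustrative.

```python
# Minimal sketch of Step 1 (a simplified HoF-style descriptor, not the
# actual Dense Trajectory Features implementation).
import cv2
import numpy as np

def hof_descriptor(video_path, n_bins=8, grid=(2, 2, 2)):
    """Histogram-of-flow descriptor over a coarse S/T grid of subvolumes."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # dense optical flow between consecutive frames
        flows.append(cv2.calcOpticalFlowFarneback(prev, gray, None,
                                                  0.5, 3, 15, 3, 5, 1.2, 0))
        prev = gray
    cap.release()
    flows = np.stack(flows)                          # (T, H, W, 2)
    mag = np.linalg.norm(flows, axis=-1)
    ang = np.arctan2(flows[..., 1], flows[..., 0]) % (2 * np.pi)
    # histogram flow orientation (magnitude-weighted) in each grid cell
    feats = []
    T, H, W = mag.shape
    for t in np.array_split(np.arange(T), grid[0]):
        for y in np.array_split(np.arange(H), grid[1]):
            for x in np.array_split(np.arange(W), grid[2]):
                idx = np.ix_(t, y, x)
                hist, _ = np.histogram(ang[idx], bins=n_bins,
                                       range=(0, 2 * np.pi),
                                       weights=mag[idx])
                feats.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feats)   # one fixed-length descriptor per volume
```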

Step 2: from the local features extracted from each video subvolume, a Fisher vector representation is calculated, so that each subvolume is encoded by a single Fisher vector.
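As an illustration of this encoding, the sketch below computes the gradient-with-respect-to-means component of the Fisher vector under a diagonal-covariance GMM fitted with scikit-learn. It is a sketch of the standard formulation only: the full encoder also includes variance gradients, and our actual parameter choices differ.

```python
# Minimal sketch of Step 2: Fisher vector encoding (means gradient only),
# assuming a GaussianMixture with covariance_type='diag'.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """descriptors: (N, D) local features from one subvolume."""
    gamma = gmm.predict_proba(descriptors)            # (N, K) posteriors
    N = descriptors.shape[0]
    fv = []
    for k in range(gmm.n_components):
        diff = (descriptors - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        g_mu = (gamma[:, k, None] * diff).sum(axis=0)
        fv.append(g_mu / (N * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))            # power normalization
    return fv / (np.linalg.norm(fv) + 1e-9)           # L2 normalization

# Fit the vocabulary GMM on descriptors pooled from training videos, then
# encode each subvolume; 'train_descriptors' is a placeholder name:
# gmm = GaussianMixture(n_components=64,
#                       covariance_type='diag').fit(train_descriptors)
```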

Step 3: Multiple Instance Learning (MIL) of the most discriminative subvolumes, i.e., those which best characterize an activity versus all the others. An initial "positive" model is learned by assuming that all examples in the positive bag (all the sub-volumes of the sequence) do contain the action at hand; a "negative" model is learned from the examples in the negative bags (videos labeled with a different action class). After an iterative process, only the most discriminative examples in each positive bag are retained. MIL reduces to a semi-convex optimisation problem, for which efficient heuristics exist [5]. The resulting model allows us to factor out the effect of common, shared context (similar background, common action elements).

Step 4: Once the most discriminative action parts are learnt via MIL, we can construct tree-like ensembles of action parts (Figure 3, right) to use for both localization and classification of actions. Felzenszwalb and Huttenlocher have shown (in the object detection problem) that if the pictorial structure forms a star model, where each part is only connected to the root node, it is possible to compute the best match very efficiently by dynamic programming. Other approaches to building a constellation of discriminative parts have been proposed by Hoiem and Ramanan. Crucial will be the introduction of sparsity constraints in the Latent SVM semi-convex optimization problem proposed by Felzenszwalb, to automatically identify the optimal number of parts.

(2) Specifics in system configuration (please indicate required camera or system, if anything special is required as a basis of using the proposed algorithm):

The approach is designed to work with both conventional cameras and range cameras, as in both cases a spatiotemporal volume can be constructed, from which the most discriminative parts can be learned and assembled into an overall model. A fusion of both would be pioneering work.

(3) Applicability of the algorithm to versatile human activities (what should be overcome in applying an algorithm developed for a specific human activity to any other human activities):

The algorithm is being developed as general purpose: as such, it is designed to discriminate between any activities introduced in a training stage. In particular, it is explicitly designed to represent complex activities formed by a sequence of elementary actions; to cope with the presence of multiple actors/people; to localize the action of interest within a larger video in both space and time; and to factor out the background (static or dynamic) in order to better discriminate different activities with common background or elementary components (i.e., parts in common).

Current Performance (please answer the following questions by showing a specific recognition task you have experienced so far as an example):

(1) Recognition tasks/applications in brief (if proposers have experience in analyzing and recognizing one or some of the following, please indicate those. If not, please briefly describe what kind of human activities proposers have experienced):

The approach has so far been tested on most of the publicly available benchmarks for action recognition:

The KTH dataset contains 6 action classes, each performed by 25 actors in four scenarios. People perform repetitive actions at different speeds and orientations. Sequences are longer when compared to the YouTube or the HMDB51 datasets, and contain clips in which the actors move in and out of the scene during the same sequence.

The YouTube dataset contains 11 action categories and presents several challenges due to camera motion, object appearance, scale, viewpoint and cluttered backgrounds. The 1600 video sequences are split into 25 groups, and we follow the authors' evaluation procedure of 25-fold, leave-one-out cross validation (a minimal sketch of this protocol is given after the dataset descriptions below).

The Hollywood2 dataset contains 12 action classes collected from 69 different Hollywood movies. There are a total of 1707 action samples containing realistic, unconstrained human and camera motion. The dataset is divided into 823 training and 884 testing sequences, each from 5 to 25 seconds long.

The HMDB dataset contains 51 action classes, with a total of 6849 video clips collected from movies, the Prelinger archive, YouTube and Google videos. Each action category contains a minimum of 101 clips.
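As a concrete illustration of the YouTube evaluation protocol mentioned above, the sketch below runs 25-fold leave-one-group-out cross validation with scikit-learn; the data arrays are random placeholders of the quoted dimensions, not our actual features or pipeline.

```python
# Minimal sketch of 25-fold leave-one-group-out cross validation.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
features = rng.normal(size=(1600, 128))   # placeholder video encodings
labels = rng.integers(0, 11, size=1600)   # 11 action categories
groups = rng.integers(0, 25, size=1600)   # the 25 author-defined groups

accs = []
for train_idx, test_idx in LeaveOneGroupOut().split(features, labels, groups):
    clf = LinearSVC().fit(features[train_idx], labels[train_idx])
    accs.append(clf.score(features[test_idx], labels[test_idx]))
print(f"mean accuracy over 25 folds: {np.mean(accs):.3f}")
```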


To quantify the respective algorithm performance, we report the accuracy (Acc), mean average precision (mAP), and mean F1 score (mF1).

(2) Recognition performance achieved so far:

Figure 4. Left: preliminary localisation results on a Hollywood2 video [45]. The colour of each box (subvolume) indicates its positive rank score for belonging to the action class (red = high). In actioncliptest00058, a woman gets out of her car roughly around the middle of the video, as indicated by the detected subvolumes. Right: performance of MIL discriminative modelling (Step 3) with Dense Trajectory Features as features on the most common datasets, compared to the traditional BoF baseline. Even when using traditional features, learning the most discriminative action parts via MIL much improves performance on challenging testbeds.

Figure 5: performance of BoF global models with Fisher representation (Step 2) on the most common datasets, compared to the state of the art. Note how accuracy and average precision (recognition rate) dramatically improve w.r.t. previous approaches.

(3) Latency to recognize a specific human activity (how many seconds after the occurrence of specific human activities can the activities be recognized by the algorithm):

( ~2 ) sec for recognition on the KTH dataset. As features are computed from whole volumes, a frames-per-second figure does not make much sense in our approach; in any case, the frame rate of all sequences is around 30fps.

Computing the classification scores for 60,000 testing video instances (each 1000-dimensional) on the KTH dataset takes 0.5 seconds on a standard laptop: this does not include feature computation and representation times, which can vary largely depending on the choice of features, representation, classification methods, and PC hardware.
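To put that timing in context: with a linear (e.g. Latent SVM) model, scoring pre-computed video representations reduces to a single matrix product, as in the sketch below. Shapes follow the figures quoted above; the weight matrix and data are random placeholders.

```python
# Minimal sketch of the scoring step timed above: one matrix product
# classifies all test instances against all classes at once.
import numpy as np

X = np.random.randn(60_000, 1000)     # encodings of the test instances
W = np.random.randn(6, 1000)          # 6 KTH classes x 1000-dim weights
b = np.zeros(6)

scores = X @ W.T + b                  # (60000, 6) class scores
predictions = scores.argmax(axis=1)   # sub-second on commodity hardware
```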

(4) Possibility to predict the occurrence of specific human activities (please select a relevant one)

Possible

Though no such tests have been conducted yet, an online version of the algorithm can be imagined in which the recognition of elementary actions which can be part of a more complex activity is used to predict the likelihood of the latter happening before it actually takes place.

(5) Necessary resolution of target objects in the image for proper analysis/recognition

( 360 ) x ( 240 ) pixels in most cases (see the dataset descriptions above); however, subsampling is normally used to reduce computation time, so that the actual videos have even lower dimensionality.

(6) Illumination on target objects for proper analysis/recognition (qualitative description is all right if it is hard to quantify illumination level, e.g., can be used not only in bright but also in somewhat dark rooms)

As we have so far used state-of-the-art benchmarks rather than in-house datasets, it is hard to answer this particular question quantitatively.

However, all the most recent tested datasets (Hollywood, Hollywood2, YouTube, HMDB51) contain videos characterized by widely varying degrees of illumination: indeed, a feature of our approach is to build models invariant to nuisance factors such as illumination.

Please have a look at the relevant web pages for a closer inspection:

http://www.di.ens.fr/~laptev/actions/hollywood2/

http://www.cs.ucf.edu/~liujg/YouTube_Action_dataset.html

In particular, HMDB (http://serre-lab.clps.brown.edu/resources/HMDB/) contains very dark as well as very bright sequences.

Development plan for establishing recognition technology for any of the following or alike (getting out of or falling off a bed / falling down / breathing / becoming feverish / having a fit or cough / having convulsions / experiencing pain / choking or difficulty swallowing)

Human activities already tackled (if recognition of any of the above human activities has already been realized, please indicate or briefly describe them):

The activities specified above have not been explicitly tackled to date, as we have focussed so far on the most common publicly available datasets. However, as our approach is inherently flexible (part-based) and learns to discriminate from a training set, we do not foresee difficulties in tackling a different set of action classes.

Development plan and challenges to be overcome if recognition of any of the abovementioned human activities is to be newly attempted:

Datasets focussed on the activities of interest to you are being searched for by our PhD student Sapienza. In case the search's outcome is not positive, we propose to collect our own testbed via both traditional and range cameras (e.g. the Kinect device) in our possession. Range cameras are particularly attractive for indoor scenarios, and two separate efforts to perform gesture/exercise recognition via Kinect, for clinical purposes (which led to a pending NIHR i4i grant application) and for exercising (by an MSc student), are being pursued by us at the current time.

Pose estimation algorithms from range data already exist, though more customized solutions can be studied. After mapping activities to series of human poses, the above techniques can be applied to recognize them.

Understanding of privacy issues in installing camera-based surveillance systems for monitoring people, or a network with organizations having such capability, if any

Privacy and Intellectual Property are managed centrally by Oxford Brookes University via RBDO (Research and Business Development Office).

As for organizations with expertise in surveillance systems, the group has links with HMGCC (http://www.hmgcc.gov.uk/), the UK government centre of excellence whose aim is to design and develop secure communication systems, hardware and software specifically for HM Government use, both at home and overseas.

Proposed Budget and Conditions

Preferred style of collaboration and proposed conditions

For Phase 1 (validation) we believe the best strategy is to hire a Research Assistant for 1-2 years, who will work full time on the project to complete all the steps (1-4) of the algorithm (in collaboration with our PhD student Michael Sapienza), and tune it towards the actions/activities specified by the client and the smart home scenario. In this perspective it makes sense to explore range camera technologies, as we are doing in the medical monitoring application (see below).

The naked cost of a member of staff is 10,000-14,000 pounds a year for an RA (postgraduate student), and 27,000-30,000 pounds for a postdoctoral researcher. Indirect costs of about 150% of the naked cost need to be added.

Equipment is already in the possession of the group in terms of computer clusters, range and traditional cameras, but possibly a few thousand pounds will be needed in this sense.

For Phase 2 (commercialization) the most suitable scheme is probably that of a Knowledge Transfer Partnership (KTP), in which the groups and Oxford Brookes in general have a strong background. A KTP typically lasts for 2 years: each Associate has a total cost of some 75,000 pounds a year (of which 75% is funded by the UK government, with only 25% charged to the industrial partner); one or two Associates can be requested depending on the scale and complexity of the project.

Status of Intellectual Property of proposed technology and the organizational policy regarding technology transfer, licensing, etc.

At the present time the technology is being developed as a research project, so as such it has not been patented yet, though we plan to proceed in that direction given the very encouraging performance.

Intellectual Property is managed centrally by Oxford Brookes University via RBDO (Research and Business Development Office), which has access to financial and other resources to enable Intellectual Property and its commercial exploitation to be effectively managed, whilst maximising the widespread dissemination of the research results. This includes finance for patenting and proof-of-concept funding; Intellectual Property, technology and market assessment; and resources for defining and implementing a commercialisation strategy through licensing, start-up companies or other routes.

RBDO has a strong track record of commercialisation of its intellectual property. Income from licences was £1.5M in 2011, which ranks the University in the top 10 of all UK universities for royalty income. OBU through RBDO holds a total portfolio of 20 patents. The University also supports the creation of spin-out companies when appropriate and has some successful examples. In particular, Oxford Brookes is extremely active in the field of Knowledge Transfer Partnerships: the Artificial Intelligence group has two upcoming KTPs concerning machine learning techniques for trading and pricing, while the Computer Vision group won the 2009 National KTP Award, selected from hundreds of projects by the Technology Strategy Board.

Proposal Team Experience

Please include the following if applicable

Selected articles/journal publications, patents, etc. related to proposed technology

F. Cuzzolin, Using bilinear models for view-invariant action and identity recognition, Proc. of Computer Vision and Pattern Recognition (CVPR'06), pp. 1701-1708, 2006.

F. Cuzzolin, Multilinear modeling for robust identity recognition from gait, in "Behavioral Biometrics for Human Identification: Intelligent Applications", pp. 169-188, L. Wang and X. Geng Eds., IGI Publishing, 2010.

F. Cuzzolin, Learning pullback manifolds of generative dynamical models for action recognition, IEEE Transactions on PAMI (2012, under review).

F. Cuzzolin, D. Mateus and R. Horaud, Robust coherent Laplacian protrusion segmentation along 3D sequences, International Journal of Computer Vision (2012, under review).

F. Cuzzolin, D. Mateus, D. Knossow, E. Boyer and R. Horaud, Coherent Laplacian protrusion segmentation, Proc. of Computer Vision and Pattern Recognition (CVPR'08), pp. 1-8, June 2008.

M. Sapienza, F. Cuzzolin and Ph. Torr, Learning discriminative space-time actions from weakly labelled videos (best poster prize recipient), INRIA Machine Learning Summer School, Grenoble, July 2012.

M. Sapienza, F. Cuzzolin and Ph. Torr, Learning discriminative space-time actions from weakly labelled videos, Proc. of the British Machine Vision Conference (BMVC'12), September 2012.

M. Sapienza, F. Cuzzolin and Ph. Torr, Learning Fisher star models for action recognition in space-time videos, submitted to CVPR 2013.

Track record of research and development or product development by principal developers

The Artificial Intelligence and the Computer Vision groups are very active in computer vision, and in activity recognition in particular.

Dr Cuzzolin has recently been awarded a 122K£ EPSRC First Grant for a project on "Tensorial modeling of dynamical systems for gait and activity recognition", which received 6/6 reviews and proposes a generative approach to action and identity recognition. He is in the process of submitting a 200K£ Leverhulme project on "Guessing plots for video googling" (strongly related to the current proposal).

Also most relevant to the current proposal, Dr Fabio Cuzzolin and Professor Phil Torr have a joint pending EPSRC (the UK's Engineering and Physical Sciences Research Council) £650,000 ($1.1M) grant application on "Making action recognition work in the real world". Dr Cuzzolin and Professor Helen Dawes have a joint pending £370,000 NIHR grant application on a project focussed on "Monitoring health conditions at home via Kinect", which proposes to use action classification to monitor brain conditions in patients remotely. Contacts have been made with Magna International, the car component company, on the application of gait recognition for biometric purposes and smart vehicles able to gesturally interact with drivers and pedestrians.

Dr Cuzzolin is preparing, as the Coordinator, a European Union collaborative 3 million euro (4 million dollar) STREP project on "Action Recognition for Video Management", with Technicolor (France), INRIA TexMex (France), and ETH Zurich (Switzerland) as partners, to be submitted to Horizon 2020.

Professor Torr was involved in the startup company 2d3 (http://www.2d3.com/), part of the Oxford Metrics Group (OMG). Their first product, "boujou", is used by special effects studios all over the world. Boujou is used to track the motion of the camera and allow for clean video insertion of objects, and has been used in the special effects of almost every major feature film in the last five years, including the "Harry Potter" and "Lord of the Rings" series. Prof. Torr has directly worked with the following companies based in the UK: 2d3, Vicon Life, Yotta (http://www.yotta.tv/company/), Microsoft Research Europe, Sharp Laboratories, and Sony Entertainments Europe, with contributions to commercial products appearing (or about to appear) with four of them. His work is currently in use in the film and game industry. His work with the Oxford Metrics Group in a Knowledge Transfer Partnership 2005-9 won the 2009 National Best KTP of the Year award, selected out of several hundred projects (http://www.ktponline.org.uk/awards2009/BestKTP.aspx).

Professor Torr's segmentation work recently appeared in Sony's new flagship Christmas 2012 PS3 launch, "Wonderbook: Book of Spells": http://www.brookes.ac.uk/business_employers/ktp/wonderbook/index_html.

Prof. Torr has been the PI on several grants, several of which are related to the topic of this proposal. His EPSRC First Grant (cash limited to 120K) was Markerless Motion Capture for Humans in Video (GR/T21790/01(P)), Oct 2004-Oct 2007, which led to a large output of research, including four papers accepted as orals at the top vision conferences. His second EPSRC grant, Automatic Generation of Content for 3D Displays (EP/C006631/1), Nov 2005-May 2009, led to a SIGGRAPH paper (and patent) as well as paper prizes at IEEE CVPR 2008 and at NIPS 2007, amongst others. The majority of these papers have been published in the main journals in the field: IJCV, JMLR, PAMI. The product arising from the SIGGRAPH paper (VideoTrace) has led to a spin-off company.

The groups have running KTPs (Knowledge Transfer Partnerships) with companies as diverse as HedgeVantage (a financial trading consultancy), Webmart (print brokerage), Sony Europe, and VICON (the motion capture equipment company).

Submitting Your Proposal (Please delete this section from your proposal document)

READY TO SUBMIT?

Overview

All proposals should be submitted online at NineSights, the collaborative innovation community from NineSigma.

• Already a member? Please login now.

• Need to Register? Start here. Registration is free. You will be asked to agree to the NineSights Terms of Use as part of registration.

Once you have logged in to NineSights:

1. SAVE your completed proposal document to your computer
2. CLICK HERE to open the RFP page
3. Click the red RESPOND NOW button next to the RFP
4. Enter a brief abstract on the submission form and attach your saved proposal and any supplemental files, then click SUBMIT
5. Submitted proposals will appear on your Dashboard under Content. Proposals are private – only you and NineSigma can view them online.

QUESTIONS?

View answers to Frequently Asked Questions

Contact the Solution Provider Help Desk

EMAIL: PhD@ninesigma.com

PHONE: +1 216-283-3901

Form Instructions (This page may be deleted from your proposal document)

Your response is essentially an introduction to NineSigma's client of who you are, your capabilities, and what type of possible solution you can offer. This is an initial opportunity to present your innovation for further discussion. Your response should be a non-enabling disclosure. Your response must not contain any confidential information or information that would enable someone else to replicate your invention without paying for it.

Target Audience

Your goal is to provide a compelling description of your proposed solution to trigger the interest of the Request sponsor's decision makers and the people with the technical and business knowledge to make the final decision. NineSigma does not evaluate the technology in proposals or screen responses for our client. We do provide organized summaries of your capabilities as they compare to the Request specifications and the client's evaluation criteria.

Proposal Content

Please insert your text below each heading in the form above. We recommend a 3-page limit, but you may use as many pages as necessary to present relevant and compelling information. In addition, you may delete this instruction page and any other italicized notes or make other customizations to the document to suit your needs.


Our Client wants to learn about…

• WHAT your technology does and a general description of how it works (You may include a more detailed discussion if your intellectual property (IP) has been secured appropriately)

• How your solution addresses the specifications in the Request

• What differentiates your solution from others in the field
  − Unique aspects of your technology
  − How your solution overcomes drawbacks of other existing technologies

• Performance or technical data (current or anticipated)

• The readiness of your technology (e.g. at proof-of-concept phase, already in use, etc.)

• IP you may have around the proposed technology

• Who you are and the expertise of you and your team or organization with respect to the needs of the Request

• What you need in order to continue the discussion or reveal the details of your solution (e.g. confidentiality agreement)

• Budget and timeline estimate for the initial phase or for other arrangements as appropriate
  − Consider the client's funding amount (if listed) and your budget as starting points in the negotiation

Other Suggestions

• Use the professional language of science/engineering/technology

• Avoid jargon

• Consider including photographs or a video clip if appropriate

• Attach supplemental information (such as a resume, brochure, or publication) to the end of this document

• OR you may upload up to 10 supplemental files when submitting this proposal through our website

Our Clients evaluate…

• Partial solutions

• Proposals from collaborative teams

• Statements of interest from government laboratories

• Proposals from outside the U.S.

For additional guidance and suggestions, view our Guide to Writing a Compelling Non-Confidential Proposal.

How Proposals are Evaluated

♦ Our client will use the information you provide to judge whether they should pursue more in-depth discussions, negotiations, or other arrangements directly with you. This initial evaluation requires about two months.

♦ NineSigma will notify respondents whether or not our client has selected them for progression. Respondents who were not selected may be able to receive feedback directly from the organization as to why.

♦ If selected for progression, the next step would be a conversation either with NineSigma or directly with the requesting organization to answer any outstanding questions.

♦ If both parties wish to proceed, the requesting organization may initiate a contract such as a confidentiality agreement for further detailed discussion, a face-to-face meeting, or a submission of samples for evaluation.

♦ The final step would be a contract establishing an official business relationship. This could include a supply agreement, licensing, a research contract, or a joint development agreement.
