
impact assessment or threat refinement (level 3), and lastly process refinement (level 4). This model, however, does not incorporate the human in the loop, and an additional level called user refinement (level 5) has been proposed to “delineate the human from the machine in the process refinement”, allowing the human to play an important role in the fusion process (Blasch and Plano, 2002).

• Level 0: Sub-object assessment
• Level 1: Object assessment
• Level 2: Situation assessment
• Level 3: Impact assessment
• Level 4: Process refinement
• Level 5: User refinement

Levels 0 and 1 deal with sub-object and object assessment respectively, making use of information from multiple sources to arrive at a representation of the objects of interest in the environment. In level 2, relationships between the identified objects are established. At the end of this level, once the situation assessment process is complete, the system has achieved situation awareness, as both the objects detected in the environment and the various ways in which they are related or connected to each other are known to the system. Level 3 allows the system to predict the effects of actions or situations on the environment. Level 4 attempts to refine the outcomes of levels 1, 2 and 3.
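
A minimal sketch of how these six levels could be arranged as a processing pipeline is given below, assuming placeholder stage implementations; none of the class, function or variable names come from the CORBYS specification or the cited literature, they are invented purely for illustration.

```python
# Illustrative sketch only: a skeletal pipeline mirroring the six JDL levels
# described above, with trivial placeholder stages.
from enum import IntEnum
from typing import Any, Callable, Dict, List


class FusionLevel(IntEnum):
    SUB_OBJECT_ASSESSMENT = 0   # signal/feature-level processing
    OBJECT_ASSESSMENT = 1       # object representations from multiple sources
    SITUATION_ASSESSMENT = 2    # relationships between the identified objects
    IMPACT_ASSESSMENT = 3       # predicted effects of actions or situations
    PROCESS_REFINEMENT = 4      # refine the outcomes of levels 1-3
    USER_REFINEMENT = 5         # human-in-the-loop adjustment (Blasch and Plano, 2002)


def _sub_object(data: List[Any]) -> Dict[str, Any]:       # level 0
    return {"features": data}

def _object(state: Dict[str, Any]) -> Dict[str, Any]:      # level 1
    state["objects"] = [{"id": i, "feature": f} for i, f in enumerate(state["features"])]
    return state

def _situation(state: Dict[str, Any]) -> Dict[str, Any]:   # level 2
    objs = state["objects"]
    state["relations"] = [(a["id"], b["id"]) for a in objs for b in objs if a["id"] < b["id"]]
    return state

def _impact(state: Dict[str, Any]) -> Dict[str, Any]:      # level 3
    state["impacts"] = {rel: "unknown" for rel in state["relations"]}
    return state

def _process_refinement(state: Dict[str, Any]) -> Dict[str, Any]:  # level 4
    return state

def _user_refinement(state: Dict[str, Any]) -> Dict[str, Any]:     # level 5
    return state


PIPELINE: Dict[FusionLevel, Callable] = {
    FusionLevel.SUB_OBJECT_ASSESSMENT: _sub_object,
    FusionLevel.OBJECT_ASSESSMENT: _object,
    FusionLevel.SITUATION_ASSESSMENT: _situation,
    FusionLevel.IMPACT_ASSESSMENT: _impact,
    FusionLevel.PROCESS_REFINEMENT: _process_refinement,
    FusionLevel.USER_REFINEMENT: _user_refinement,
}


def run_fusion_cycle(sensor_data: List[Any]) -> Any:
    """Run one pass through levels 0-5 in order."""
    state: Any = sensor_data
    for level in sorted(PIPELINE):
        state = PIPELINE[level](state)
    return state


result = run_fusion_cycle([1.0, 2.0])  # toy input from two "sensors"
```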

Corradini et al. (2005) list a number of approaches and architectures for multimodal fusion in multimodal systems, such as carrying out multimodal fusion in a maximum likelihood estimation framework; using distributed agent architectures (e.g. the Open Agent Architecture, OAA (Cheyer and Martin, 2001)) with intra-agent communication taking place through a blackboard; and identifying individuals via “physiological and/or behavioural characteristics”, e.g. biometric security systems using fingerprints, iris, face, voice, hand shape, etc. (Corradini et al. 2005). Corradini et al. (2005) state that modality fusion in such systems involves less complicated processing, as it falls largely under a “pattern recognition framework”, and that this process may use techniques for integrating “biometric traits” (Corradini et al. 2005) such as the weighted sum rule (Wang et al. 2003), Fisher discriminant analysis (Wang et al. 2003), decision trees (Ross and Jain, 2003) and decision fusion schemes (Jain et al. 1999).
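
As an illustration of the weighted sum rule mentioned above, the sketch below fuses normalised match scores from two hypothetical biometric matchers. The min-max normalisation, the weights and the acceptance threshold are assumptions made for the example, not values taken from Wang et al. (2003).

```python
# Score-level fusion with the weighted sum rule (illustrative values only).
from typing import Dict
import numpy as np


def min_max_normalise(scores: np.ndarray) -> np.ndarray:
    """Map raw matcher scores to [0, 1] so different modalities can be combined."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)


def weighted_sum_fusion(modality_scores: Dict[str, np.ndarray],
                        weights: Dict[str, float]) -> np.ndarray:
    """Fuse per-modality scores: s = sum_m w_m * s_m, with the weights normalised to 1."""
    total_w = sum(weights.values())
    return sum((weights[m] / total_w) * min_max_normalise(s)
               for m, s in modality_scores.items())


# Example: three candidate identities scored by a face matcher and an iris matcher.
scores = {
    "face": np.array([0.62, 0.91, 0.40]),
    "iris": np.array([0.55, 0.88, 0.30]),
}
fused = weighted_sum_fusion(scores, weights={"face": 0.4, "iris": 0.6})
accepted = fused > 0.5   # illustrative decision threshold
```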

Corradini et al. (2005) also list a number of systems fusing speech and lip movements, such as those using histograms and multivariate Gaussians (Nock et al. 2002), artificial neural networks (Wolff et al. 1994; Meier et al. 2000) and hidden Markov models (Nock et al. 2002).
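
To make the Gaussian-based stream fusion idea concrete, the sketch below scores toy audio and visual feature vectors with per-class multivariate Gaussians and sums their weighted log-likelihoods, which amounts to treating the two streams as independent. The classes, feature dimensions and stream weight are invented for the example and are not taken from Nock et al. (2002).

```python
# Late fusion of audio and visual streams with per-class multivariate Gaussians.
import numpy as np
from scipy.stats import multivariate_normal

# Toy per-class models: (mean, covariance) for audio and visual feature vectors.
classes = {
    "yes": {"audio": (np.zeros(3), np.eye(3)), "visual": (np.zeros(2), np.eye(2))},
    "no":  {"audio": (np.ones(3),  np.eye(3)), "visual": (np.ones(2),  np.eye(2))},
}


def classify(audio_feat: np.ndarray, visual_feat: np.ndarray,
             audio_weight: float = 0.7) -> str:
    """Pick the class with the highest weighted sum of per-stream log-likelihoods."""
    best_label, best_score = "", -np.inf
    for label, streams in classes.items():
        a_mean, a_cov = streams["audio"]
        v_mean, v_cov = streams["visual"]
        score = (audio_weight * multivariate_normal(a_mean, a_cov).logpdf(audio_feat)
                 + (1 - audio_weight) * multivariate_normal(v_mean, v_cov).logpdf(visual_feat))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(classify(np.array([0.1, -0.2, 0.0]), np.array([0.9, 1.1])))
```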

Some systems use independent, individual modality processing modules, such as a speech recognition module, a gesture recognition module, gaze localisation, etc. Each module carries out mono-modal processing and presents its output to the multimodal processing module, which handles the semantic fusion. Such systems are well suited to a framework in which various showcases may be developed for different application domains by applying re-usable off-the-shelf components, each handling a single modality in full.
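
A minimal sketch of this modular arrangement is given below, assuming a simple slot-filling merge: each mono-modal recogniser returns a partial semantic frame, and a central fusion module combines them, keeping the higher-confidence filler when two modules compete for the same slot. The interface and the merge rule are illustrative assumptions, not components of any of the cited systems.

```python
# Mono-modal recognisers feeding a central semantic fusion module (sketch).
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ModalityModule(ABC):
    """A component handling a single modality in full."""

    @abstractmethod
    def interpret(self, raw_input: Any) -> Dict[str, Any]:
        """Return a partial semantic frame, e.g. {'action': 'move', 'confidence': 0.8}."""


class SpeechRecogniser(ModalityModule):
    def interpret(self, raw_input: str) -> Dict[str, Any]:
        # Placeholder: a real module would run speech recognition and parsing here.
        return {"action": raw_input.strip().lower(), "confidence": 0.8}


class GestureRecogniser(ModalityModule):
    def interpret(self, raw_input: Any) -> Dict[str, Any]:
        # Placeholder: a real module would classify the pointing gesture here.
        return {"target": raw_input, "confidence": 0.6}


class SemanticFusion:
    """Merges the partial frames produced by the mono-modal modules."""

    def __init__(self, modules: List[ModalityModule]):
        self.modules = modules

    def fuse(self, inputs: List[Any]) -> Dict[str, Any]:
        frame: Dict[str, Any] = {}
        for module, raw in zip(self.modules, inputs):
            partial = module.interpret(raw)
            conf = partial.pop("confidence", 1.0)
            for slot, value in partial.items():
                # Keep the higher-confidence filler when two modules fill the same slot.
                if slot not in frame or conf > frame[slot][1]:
                    frame[slot] = (value, conf)
        return {slot: value for slot, (value, _) in frame.items()}


fusion = SemanticFusion([SpeechRecogniser(), GestureRecogniser()])
command = fusion.fuse(["Move there", "pointed_location"])  # fused semantic frame
```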

Other systems include QuickSet (Landragin, 2007), which offers the user the freedom to interact with a map-based application using a pen-and-speech cross-modal input capability. The system presented in Elting (2002) enables the user to specify a command by way of speech, a pointing gesture and input from a graphical user interface, combined in a “pipelined architecture”. The system put forward by Wahlster et al. (2001) is a multimodal

