
Abstract book (pdf) - ICPR 2010



done on the MIT-BIH Arrhythmia database. The results have been examined and approved by medical doctors.<br />

13:30-16:30, Paper ThBCT9.28<br />

Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-Overlapping Audio and Video<br />

Streams<br />

Roy, Anindya, Ec. Pol. Federale de Lausanne<br />

Marcel, Sebastien, Ec. Pol. Federale de Lausanne<br />

Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently<br />

or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task: person identification<br />

in a cross-modal scenario, i.e., matching the speaker in an audio recording to the same speaker in a video recording,<br />

where the two recordings have been made during different sessions, using speaker-specific information which is<br />

common to both the audio and video modalities. Several recent psychological studies have shown how humans can indeed<br />

perform this task with an accuracy significantly higher than chance. Here we propose two systems which can solve this<br />

task comparably well, using purely pattern recognition techniques. We hypothesize that such systems could be put to practical<br />

use in multimodal biometric and surveillance systems.<br />

13:30-16:30, Paper ThBCT9.29<br />

Image Parsing with a Three-State Series Neural Network Classifier<br />

Seyedhosseini Tarzjani, Seyed Mojtaba, Univ. of Utah<br />

Paiva, Antonio, Univ. of Utah<br />

Tasdizen, Tolga, Univ. of Utah<br />

We propose a three-state series neural network for effective propagation of context and uncertainty information for image<br />

parsing. The activation functions used in the proposed model have three states instead of the normal two states. This makes<br />

the neural network more flexible than the two-state neural network, and allows for uncertainty to be propagated through<br />

the stages. In other words, decisions about difficult pixels can be left for later stages which have access to more contextual<br />

information than earlier stages. We applied the proposed method to three different datasets, and the experimental results<br />

demonstrate the higher performance of the three-state series neural network.<br />
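The abstract does not reproduce the paper's exact activation function; as a rough illustrative sketch only (the threshold values `lo` and `hi` are assumptions, not taken from the paper), a three-state output for a binary parsing task might look like:

```python
import numpy as np

def three_state_activation(x, lo=-0.5, hi=0.5):
    """Quantize pre-activations into three states:
    -1 = confident negative, +1 = confident positive,
     0 = uncertain, deferred to a later stage with more context."""
    out = np.zeros_like(x, dtype=float)
    out[x <= lo] = -1.0
    out[x >= hi] = 1.0
    return out

# Pixels with pre-activations near zero remain in the uncertain state
# and are passed on undecided, so later stages can resolve them using
# wider contextual information.
states = three_state_activation(np.array([-2.0, 0.1, 3.0]))
```

The key idea the sketch captures is that "uncertain" is an explicit value propagated between stages, rather than a forced binary decision.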

13:30-16:30, Paper ThBCT9.30<br />

Pan-Sharpening using an Adaptive Linear Model<br />

Liu, Lining, Beihang Univ.<br />

Wang, Yiding, North China Univ. of Tech.<br />

Wang, Yunhong, Beihang Univ.<br />

Yu, Haiyan, Beihang Univ.<br />

In this paper, we propose an algorithm to synthesize high-resolution multispectral images by fusing panchromatic (Pan)<br />

images and multispectral (MS) images. The algorithm is based on an adaptive linear model, which is automatically estimated<br />

by least square fitting. In this model, a virtual difference band is appended to the MS to guarantee the correlation<br />

between the Pan and MS. Then, an iterative procedure is carried out to generate the fused images using the steepest<br />

descent method. The efficiency of the presented technique is tested by performing pan-sharpening of IKONOS, QuickBird, and<br />

Landsat-7 ETM+ datasets. Experimental results show that our method provides better fusion results than other methods.<br />
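The core of the adaptive linear model above, a per-scene least-squares fit of the Pan image as a weighted combination of MS bands, can be sketched as follows. This omits the virtual difference band and the steepest-descent fusion loop, and all function names are illustrative, not the authors' code:

```python
import numpy as np

def fit_linear_model(pan_low, ms):
    """Least-squares fit of Pan ~ sum_i w_i * MS_i + b, with Pan
    downsampled to the MS resolution. Returns band weights then bias."""
    n_bands, h, w = ms.shape
    # Design matrix: one column per MS band plus a constant column.
    A = np.concatenate([ms.reshape(n_bands, -1),
                        np.ones((1, h * w))], axis=0).T
    coeffs, *_ = np.linalg.lstsq(A, pan_low.ravel(), rcond=None)
    return coeffs

def synthesize_pan(ms, coeffs):
    """Apply the fitted linear model to MS bands to predict a Pan-like
    image; the fusion step then corrects MS toward the true Pan."""
    n_bands = ms.shape[0]
    return np.tensordot(coeffs[:n_bands], ms, axes=1) + coeffs[n_bands]
```

Fitting the weights per scene (rather than fixing them) is what makes the model "adaptive": sensor-specific spectral responses are absorbed into the estimated coefficients.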

13:30-16:30, Paper ThBCT9.31<br />

A Study of Voice Source and Vocal Tract Filter based Features in Cognitive Load Classification<br />

Le, Phu, The Univ. of New South Wales<br />

Epps, Julien, The Univ. of New South Wales<br />

Choi, Eric, National ICT Australia<br />

Ambikairajah, Eliathamby, The Univ. of New South Wales<br />

Speech has been recognized as an attractive method for the measurement of cognitive load. Previous approaches have<br />

used mel frequency cepstral coefficients (MFCCs) as discriminative features to classify cognitive load. The MFCCs contain<br />

information from both the voice source and the vocal tract, so that the individual contributions of each to cognitive load<br />

variation are unclear. This paper aims to extract speech features related to either the voice source or the vocal tract and use<br />

