Abstract book (pdf) - ICPR 2010
done on the MIT-BIH Arrhythmia database. The results have been examined and approved by medical doctors.
13:30-16:30, Paper ThBCT9.28
Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-Overlapping Audio and Video Streams
Roy, Anindya, Ec. Pol. Federale de Lausanne
Marcel, Sebastien, Ec. Pol. Federale de Lausanne
Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task: person identification in a cross-modal scenario, i.e., matching the speaker in an audio recording to the same speaker in a video recording, where the two recordings were made during different sessions, using speaker-specific information common to both the audio and video modalities. Several recent psychological studies have shown that humans can indeed perform this task with accuracy significantly higher than chance. Here we propose two systems which solve this task comparably well, using purely pattern recognition techniques. We hypothesize that such systems could be put to practical use in multimodal biometric and surveillance systems.
13:30-16:30, Paper ThBCT9.29
Image Parsing with a Three-State Series Neural Network Classifier
Seyedhosseini Tarzjani, Seyed Mojtaba, Univ. of Utah
Paiva, Antonio, Univ. of Utah
Tasdizen, Tolga, Univ. of Utah
We propose a three-state series neural network for effective propagation of context and uncertainty information in image parsing. The activation functions used in the proposed model have three states instead of the usual two. This makes the network more flexible than a two-state neural network and allows uncertainty to be propagated through the stages. In other words, decisions about difficult pixels can be deferred to later stages, which have access to more contextual information than earlier stages. We applied the proposed method to three different datasets, and experimental results demonstrate the higher performance of the three-state series neural network.
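A three-state activation of the kind this abstract describes can be illustrated, for example, as a sum of two shifted sigmoids that saturates near -1 (confident negative), 0 (uncertain), and +1 (confident positive). The `margin` parameter and function names below are assumptions for the sketch, not the authors' formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def three_state(x, margin=4.0):
    # Sum of two shifted sigmoids: plateaus near -1, 0, and +1.
    # Inputs well inside (-margin, +margin) map close to 0, signalling
    # "uncertain" so the decision can be deferred to a later stage.
    return sigmoid(x - margin) + sigmoid(x + margin) - 1.0

y = three_state(np.array([-10.0, 0.0, 10.0]))
```

The middle plateau is what lets a stage abstain: a near-zero output carries "no decision yet" forward instead of forcing a binary label on an ambiguous pixel.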
13:30-16:30, Paper ThBCT9.30
Pan-Sharpening using an Adaptive Linear Model
Liu, Lining, Beihang Univ.
Wang, Yiding, North China Univ. of Tech.
Wang, Yunhong, Beihang Univ.
Yu, Haiyan, Beihang Univ.
In this paper, we propose an algorithm that synthesizes high-resolution multispectral images by fusing panchromatic (Pan) and multispectral (MS) images. The algorithm is based on an adaptive linear model, which is estimated automatically by least-squares fitting. In this model, a virtual difference band is appended to the MS image to guarantee the correlation between the Pan and MS data. An iterative procedure then generates the fused images using the steepest-descent method. The efficiency of the presented technique is tested by pan-sharpening IKONOS, QuickBird, and Landsat-7 ETM+ datasets. Experimental results show that our method provides better fusion results than other methods.
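The least-squares estimation step of such a linear Pan/MS model can be sketched as below. The array shapes, synthetic data, and omission of the virtual difference band and the steepest-descent refinement are simplifications for illustration, not the paper's implementation.

```python
import numpy as np

def fit_linear_model(pan, ms):
    """Estimate weights w so that pan ≈ sum_k w[k] * ms[k] + w[-1].

    pan: (H, W) panchromatic image; ms: (K, H, W) multispectral bands,
    assumed already resampled to the Pan grid for the fit.
    """
    K, H, W = ms.shape
    # Design matrix: one flattened column per MS band, plus a constant term.
    A = np.column_stack([band.ravel() for band in ms] + [np.ones(H * W)])
    w, *_ = np.linalg.lstsq(A, pan.ravel(), rcond=None)
    return w

# Synthetic check: build Pan as a known mixture of two bands and recover it.
rng = np.random.default_rng(0)
ms = rng.random((2, 8, 8))
pan = 0.6 * ms[0] + 0.4 * ms[1] + 0.05
w = fit_linear_model(pan, ms)
```

Fitting the weights per image (rather than fixing sensor-specific coefficients) is what makes the linear model "adaptive" in the sense the abstract describes.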
13:30-16:30, Paper ThBCT9.31
A Study of Voice Source and Vocal Tract Filter based Features in Cognitive Load Classification
Le, Phu, The Univ. of New South Wales
Epps, Julien, The Univ. of New South Wales
Choi, Eric, National ICT Australia
Ambikairajah, Eliathamby, The Univ. of New South Wales
Speech has been recognized as an attractive method for the measurement of cognitive load. Previous approaches have used mel-frequency cepstral coefficients (MFCCs) as discriminative features to classify cognitive load. The MFCCs contain information from both the voice source and the vocal tract, so the individual contributions of each to cognitive load variation are unclear. This paper aims to extract speech features related to either the voice source or the vocal tract and use