Abstract book (pdf) - ICPR 2010


Vector Machines. Results show that Likelihood Space Classification improves the performance (91.76%) of Maximum Likelihood Classification (79.1%). Thereafter, we introduce the concept of fusion in the likelihood space, which is shown to outperform the typically used model-level fusion, attaining a classification accuracy of 94.01% and further improving all previous results.

09:00-11:10, Paper ThAT9.12

Improved Mandarin Keyword Spotting using Confusion Garbage Model<br />

Zhang, Shilei, IBM Res., China<br />

Shuang, Zhiwei, IBM Res., China<br />

Shi, Qin, IBM Res., China<br />

Qin, Yong, IBM Res., China<br />

This paper presents an improved acoustic keyword spotting (KWS) algorithm using a novel confusion garbage model in Mandarin conversational speech. Observing the KWS corpus, we found many words that are pronounced similarly to predefined keywords, although they have different Chinese characters and different meanings, which easily results in a high false alarm rate. In this paper, an improved acoustic KWS method with confusion garbage models was developed that absorbs similarly pronounced words confused with specific keywords for a given task. One obvious advantage of such a method is that it provides a flexible framework to implement the selection procedure and reduce the false alarm rate effectively for a specific task. The efficiency of the proposed architecture was evaluated under HMM-based confidence measure (CM) methods and demonstrated on a conversational telephone dataset.

09:00-11:10, Paper ThAT9.13

Human Activity Recognition using Local Shape Descriptors<br />

Venkatesha, Sharath, Univ. of California, Santa Barbara<br />

Turk, Matthew, Univ. of California, Santa Barbara<br />

We propose a method for human activity recognition in videos, based on shape analysis. We define local shape descriptors for interest points on the detected contour of the human action and build an action descriptor using a Bag of Features method. We also use the temporal relation among matching interest points across successive video frames. Further, an SVM is trained on these action descriptors to classify the activity in the scene. The method is invariant to the length of the video sequence, and hence it is suitable for online activity recognition. We demonstrate results on an action database consisting of nine actions such as walk, jump, and bend, performed by twenty people in indoor and outdoor scenarios. The proposed method achieves an accuracy of 87% and is comparable to other state-of-the-art methods.
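As an illustration only, the Bag of Features step described in this abstract can be sketched as follows. The abstract does not give the descriptor, the codebook construction, or the normalization, so the 2-D descriptors, the three-entry codebook, the Euclidean nearest-codeword assignment, and the L1 normalization below are all assumptions, and the names `nearest_codeword` and `bag_of_features` are hypothetical. The sketch shows one way a normalized histogram can make a descriptor independent of the number of frames.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = math.dist(descriptor, codeword)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_features(descriptors, codebook):
    """Build an L1-normalized histogram of codeword assignments."""
    histogram = [0.0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(histogram)
    if total == 0:
        return histogram  # no descriptors: return the all-zero histogram
    return [count / total for count in histogram]


# Hypothetical toy data (assumption): 2-D descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
short_clip = [(0.1, 0.0), (0.9, 1.1)]
long_clip = short_clip * 5  # same content, five times as many descriptors

short_hist = bag_of_features(short_clip, codebook)
long_hist = bag_of_features(long_clip, codebook)
```

Because the histogram is normalized, `short_hist` and `long_hist` are identical here even though the clips differ in length; a classifier such as an SVM would then be trained on vectors of this kind.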

09:00-11:10, Paper ThAT9.14

Use of Line Spectral Frequencies for Emotion Recognition from Speech<br />

Bozkurt, Elif, Koc Univ.<br />

Erzin, Engin, Koc Univ.<br />

Eroglu Erdem, Cigdem, Bahcesehir Univ.<br />

Erdem, Arif Tanju, Ozyegin Univ.<br />

We propose the use of line spectral frequency (LSF) features for emotion recognition from speech, which, to the best of our knowledge, have not previously been employed for this task. Spectral features such as mel-scaled cepstral coefficients have already been successfully used for the parameterization of speech signals for emotion recognition. The LSF features also offer a spectral representation of speech; moreover, they carry intrinsic information on the formant structure, which is related to the emotional state of the speaker [4]. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed over the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over the MFCC-based emotion classification rates.
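The abstract does not state which decision fusion rule is used. As a minimal sketch under that caveat, one common choice is a weighted sum of per-class log-likelihoods from two classifiers, followed by picking the highest fused score. The function name `fuse_decisions`, the weight `weight_a`, and all scores below are hypothetical and chosen only for illustration.

```python
def fuse_decisions(scores_a, scores_b, weight_a=0.5):
    """Fuse per-class log-likelihoods from two classifiers by a weighted sum.

    Returns the winning class label and the dictionary of fused scores.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both classifiers must score the same classes")
    fused = {
        label: weight_a * scores_a[label] + (1.0 - weight_a) * scores_b[label]
        for label in scores_a
    }
    return max(fused, key=fused.get), fused


# Hypothetical log-likelihoods (assumption), e.g. from an MFCC-based
# and an LSF-based GMM classifier scoring the same utterance.
mfcc_scores = {"anger": -12.0, "neutral": -11.0}
lsf_scores = {"anger": -9.0, "neutral": -13.0}

label, fused = fuse_decisions(mfcc_scores, lsf_scores, weight_a=0.5)
```

With these toy numbers the MFCC scores alone would favor "neutral", while the fused scores favor "anger"; this shows how a second feature stream can change the decision, not that it does so on real data.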

09:00-11:10, Paper ThAT9.15

Spatially Regularized Common Spatial Patterns for EEG Classification<br />

Lotte, Fabien, Inst. for Infocomm Res.<br />

Guan, Cuntai, Inst. for Infocomm Res.<br />
