Abstract book (pdf) - ICPR 2010

16:00-16:20, Paper TuCT6.2

Modeling Syllable-Based Pronunciation Variation for Accented Mandarin Speech Recognition

Zhang, Shilei, IBM Res.

Shi, Qin, IBM Res. – China

Qin, Yong, IBM Res. – China

Pronunciation variation is a natural and inevitable phenomenon in accented Mandarin speech recognition. In this paper, we integrate knowledge-based and data-driven approaches for syllable-based pronunciation variation modeling to improve the performance of a Mandarin speech recognition system for speakers with a Southern accent. First, syllable-based pronunciation variation rules for the Southern accent, as observed in the training corpus, are generated by a Chinese linguistic expert. Second, the dictionary is augmented with multiple pronunciation variants and with pronunciation probabilities derived from forced-alignment statistics of the training data, and the acoustic models are retrained on the expanded dictionary. Finally, pronunciation variation adaptation is performed at the decoding stage to further fit the data, taking into account the distribution of variation-rule clusters in the test set. The experimental results show that the proposed method provides a flexible framework that effectively improves recognition performance for accented speech.
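
The dictionary augmentation step can be illustrated with a minimal sketch. The abstract does not state how the pronunciation probabilities are estimated, so the relative-frequency estimate, the pruning threshold `min_prob`, the function name, and the toy alignment pairs below are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter, defaultdict


def pronunciation_probabilities(alignments, min_prob=0.1):
    """Estimate variant probabilities from forced-alignment counts.

    alignments: iterable of (canonical_syllable, realized_variant) pairs.
    min_prob:   hypothetical pruning threshold for rare variants.
    Returns {canonical: {variant: probability}} with probabilities
    renormalized over the variants that survive pruning.
    """
    counts = defaultdict(Counter)
    for canonical, variant in alignments:
        counts[canonical][variant] += 1

    lexicon = {}
    for canonical, variant_counts in counts.items():
        total = sum(variant_counts.values())
        kept = {v: c / total for v, c in variant_counts.items()
                if c / total >= min_prob}
        if not kept:
            # Always keep at least the most frequent variant.
            best, best_count = variant_counts.most_common(1)[0]
            kept = {best: best_count / total}
        norm = sum(kept.values())
        lexicon[canonical] = {v: p / norm for v, p in kept.items()}
    return lexicon


# Made-up alignment output: each pair is (dictionary syllable, aligned variant).
toy_alignments = (
    [("zhi", "zhi")] * 6 + [("zhi", "zi")] * 3 + [("zhi", "ji")] * 1
    + [("shi", "shi")] * 5 + [("shi", "si")] * 5
)
lexicon = pronunciation_probabilities(toy_alignments, min_prob=0.2)
for syllable, variants in lexicon.items():
    print(syllable, variants)
```

In this sketch the rare variant falls below the threshold and is dropped, and the remaining probabilities are renormalized so that each dictionary entry still sums to one.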

16:20-16:40, Paper TuCT6.3

Automatic Pronunciation Transliteration for Chinese-English Mixed Language Keyword Spotting

Zhang, Shilei, IBM Res.

Shuang, Zhiwei, IBM Res. – China

Qin, Yong, IBM Res. – China

This paper presents an automatic pronunciation transliteration method with acoustic and contextual analysis for a Chinese-English mixed-language keyword spotting (KWS) system. In practice, robust Chinese-English mixed-language spoken language technology often has to be developed without Chinese-accented English acoustic data. We exploit a pronunciation conversion method based on syllable-based characteristic analysis of pronunciation and data-driven phoneme-pair mappings to solve the mixed-language problem using only well-trained Chinese models. One obvious advantage of this method is that it provides a flexible framework for automatically converting the pronunciation of English keywords to Chinese. The efficiency of the proposed method is demonstrated on a KWS task using a mixed-language database.

16:40-17:00, Paper TuCT6.4

Learning Virtual HD Model for Bi-Model Emotional Speaker Recognition

Huang, Ting, Zhejiang Univ.

Yang, Yingchun, Zhejiang Univ.

Pitch mismatch between training and testing is one of the important factors causing performance degradation in speaker recognition systems. In this paper, we adopted the missing feature theory and specified the Unreliable Region (UR) as the parts of an utterance with high emotion-induced pitch variation. To model these regions, a virtual HD (High Different from neutral, with large pitch offset) model was built for each target speaker from virtual speech, which was converted from neutral speech by the Pitch Transformation Algorithm (PTA). In the PTA, a polynomial transformation function was learned to model the relationship between the average pitch of the neutral and the high-pitched utterances. Compared with the traditional GMM-UBM and our previous method, the new method increased the identification rate (IR) on the MASC by 1.88% and 0.84%, respectively, which are promising results.
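
The idea of learning a polynomial that maps the average pitch of neutral utterances to that of high-pitched utterances can be sketched as a small least-squares fit. The abstract gives neither the polynomial degree nor the fitting procedure, so the degree of 2, the normal-equation solver, the function names, and the pitch values below are assumptions for illustration only, not the PTA as published.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients ordered from the constant term upward.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Made-up per-utterance mean pitch values in Hz (neutral vs. high-pitched).
neutral_hz = [110.0, 130.0, 160.0, 190.0, 210.0, 230.0]
high_hz = [150.0, 178.0, 215.0, 250.0, 272.0, 291.0]

# Work in units of 100 Hz to keep the normal equations well conditioned.
coeffs = fit_polynomial([f / 100 for f in neutral_hz],
                        [f / 100 for f in high_hz])
predicted_hz = 100 * predict(coeffs, 170.0 / 100)
print(f"predicted high-pitch mean for a 170 Hz neutral mean: {predicted_hz:.1f} Hz")
```

How the predicted average pitch is then used to convert neutral speech into virtual speech is not described in the abstract, so the sketch stops at the prediction.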

17:00-17:20, Paper TuCT6.5

Role of Synthetically Generated Samples on Speech Recognition in a Resource-Scarce Language

Chakraborty, Rupayan, St. Thomas’ Coll. of Eng. & Tech.

Garain, Utpal, Indian Statistical Inst.

Speech recognition systems that make use of statistical classifiers require a large number of training samples. However, collecting real samples has always been difficult because it involves a substantial amount of human intervention and cost. Considering this problem, this paper presents a novel method for generating synthetic samples from a handful of real samples and investigates the role of these samples in designing a speech recognition system. Speaker-dependent, limited-vocabulary isolated word recognition in an Indian language (Bengali) is taken as a reference to demonstrate the potential of the proposed framework. The role of synthetic samples is demonstrated by a significant improvement in recognition accuracy. A maximum improvement of 10% is achieved using the proposed approach.
