Abstract book (pdf) - ICPR 2010
16:00-16:20, Paper TuCT6.2
Modeling Syllable-Based Pronunciation Variation for Accented Mandarin Speech Recognition
Zhang, Shilei, IBM Res.
Shi, Qin, IBM Res. – China
Qin, Yong, IBM Res. – China
Pronunciation variation is a natural and inevitable phenomenon in accented Mandarin speech recognition applications.
In this paper, we integrate knowledge-based and data-driven approaches for syllable-based pronunciation variation
modeling to improve the performance of a Mandarin speech recognition system for speakers with a Southern accent. First,
a Chinese linguistic expert derives syllable-based pronunciation variation rules for the Southern accent from the training
corpus. Second, the dictionary is augmented with multiple pronunciation variants and pronunciation probabilities
derived from forced-alignment statistics of the training data, and the acoustic models are retrained on the expanded
dictionary. Finally, pronunciation variation adaptation is performed at the decoding stage to further fit the data by
taking the distribution of variation-rule clusters in the test set into account. The experimental results show that the proposed
method provides a flexible framework that effectively improves recognition performance for accented speech.
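The dictionary augmentation step can be illustrated with a minimal sketch, assuming the forced-alignment output has already been flattened into (canonical syllable, realized syllable) pairs. The function name `build_pronunciation_probs`, the input format, and the toy data are hypothetical and are not taken from the paper; the sketch only shows how relative frequencies of aligned variants could serve as pronunciation probabilities.

```python
from collections import Counter, defaultdict


def build_pronunciation_probs(aligned_pairs):
    """Estimate P(variant | canonical syllable) as a relative frequency.

    aligned_pairs: iterable of (canonical_syllable, realized_syllable)
    tuples, a hypothetical flattened form of forced-alignment results.
    """
    counts = defaultdict(Counter)
    for canonical, realized in aligned_pairs:
        counts[canonical][realized] += 1

    lexicon = {}
    for canonical, variants in counts.items():
        total = sum(variants.values())
        # Each variant's probability is its share of the aligned tokens.
        lexicon[canonical] = {v: n / total for v, n in variants.items()}
    return lexicon


# Toy alignment data, invented for illustration only.
toy_alignment = [
    ("zhi", "zhi"), ("zhi", "zi"), ("zhi", "zi"), ("zhi", "zhi"),
    ("shi", "shi"), ("shi", "si"), ("shi", "shi"), ("shi", "shi"),
]
lexicon = build_pronunciation_probs(toy_alignment)
```

In this toy data, each canonical syllable ends up with a small distribution over its observed variants, which is the kind of entry an augmented dictionary with pronunciation probabilities would hold.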
16:20-16:40, Paper TuCT6.3<br />
Automatic Pronunciation Transliteration for Chinese-English Mixed Language Keyword Spotting<br />
Zhang, Shilei, IBM Res.<br />
Shuang, Zhiwei, IBM Res. – China<br />
Qin, Yong, IBM Res. – China<br />
This paper presents an automatic pronunciation transliteration method with acoustic and contextual analysis for a
Chinese-English mixed-language keyword spotting (KWS) system. Often, robust Chinese-English mixed-language
spoken language technology must be developed without Chinese-accented English acoustic data. In this paper, we exploit
a pronunciation conversion method based on syllable-based characteristic analysis of pronunciation and data-driven
phoneme pair mappings to solve the mixed-language problem using only well-trained Chinese models. One obvious
advantage of this method is that it provides a flexible framework for automatically converting the pronunciation of
English keywords to Chinese. The efficiency of the proposed method was demonstrated on a KWS task using a mixed-language database.
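A data-driven phoneme pair mapping of the kind mentioned above might look like the following minimal sketch. It assumes that aligned (English phone, Chinese phone) pairs are already available and simply keeps the most frequent Chinese phone for each English phone. The function names, phone labels, and toy counts are hypothetical, and the syllable-based analysis described in the abstract is not modeled here.

```python
from collections import Counter, defaultdict


def learn_phone_mapping(aligned_phone_pairs):
    """Map each English phone to its most frequently aligned Chinese phone."""
    counts = defaultdict(Counter)
    for english_phone, chinese_phone in aligned_phone_pairs:
        counts[english_phone][chinese_phone] += 1
    return {en: c.most_common(1)[0][0] for en, c in counts.items()}


def transliterate(english_phones, mapping):
    """Convert an English phone sequence to Chinese phones.

    Raises KeyError if a phone was never seen in the aligned data.
    """
    return [mapping[p] for p in english_phones]


# Toy aligned pairs, invented for illustration only.
toy_pairs = [
    ("TH", "s"), ("TH", "s"), ("TH", "z"),
    ("V", "w"), ("V", "w"), ("V", "f"),
]
mapping = learn_phone_mapping(toy_pairs)
converted = transliterate(["V", "TH"], mapping)
```

The converted phone sequence could then be looked up against models trained only on Chinese data, which is the motivation the abstract gives for the conversion.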
16:40-17:00, Paper TuCT6.4<br />
Learning Virtual HD Model for Bi-Model Emotional Speaker Recognition<br />
Huang, Ting, Zhejiang Univ.<br />
Yang, Yingchun, Zhejiang Univ.<br />
Pitch mismatch between training and testing is one of the important factors degrading the performance of
speaker recognition systems. In this paper, we adopt missing feature theory and specify the Unreliable Region (UR)
as the parts of the utterance with high emotion-induced pitch variation. To model these regions, a virtual HD (High Different
from neutral, with large pitch offset) model for each target speaker is built from virtual speech, which is converted
from neutral speech by the Pitch Transformation Algorithm (PTA). In the PTA, a polynomial transformation function
is learned to model the relationship between the average pitch of the neutral and the high-pitched utterances. Compared
with the traditional GMM-UBM and our previous method, the new method obtains identification rate (IR) increases of
1.88% and 0.84%, respectively, on the MASC, which are promising results.
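The polynomial transformation in the PTA can be illustrated with a minimal sketch that fits a polynomial from neutral average pitch to high-pitched average pitch. The abstract does not state the polynomial degree or the fitting criterion, so the least-squares fit, the degree of 1, the function names, and the toy pitch values below are all assumptions made for illustration.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def apply_polynomial(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Toy average-pitch values in Hz, invented for illustration only.
neutral_pitch = [100.0, 150.0, 200.0, 250.0]
high_pitch = [150.0, 210.0, 270.0, 330.0]  # generated as 1.2 * x + 30

coeffs = fit_polynomial(neutral_pitch, high_pitch, degree=1)
predicted = apply_polynomial(coeffs, 180.0)
```

Under these assumptions, the fitted function could be applied to the pitch of a neutral utterance to obtain a target pitch for the converted, virtual high-pitched speech.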
17:00-17:20, Paper TuCT6.5<br />
Role of Synthetically Generated Samples on Speech Recognition in a Resource-Scarce Language<br />
Chakraborty, Rupayan, St. Thomas’ Coll. of Eng. & Tech.<br />
Garain, Utpal, Indian Statistical Inst.<br />
Speech recognition systems that use statistical classifiers require a large number of training samples. However,
collecting real samples has always been difficult because it involves a substantial amount of human intervention
and cost. To address this problem, this paper presents a novel method for generating synthetic samples from
a handful of real samples and investigates the role of these samples in designing a speech recognition system.
Speaker-dependent, limited-vocabulary isolated word recognition in an Indian language (Bengali) is taken as a reference to
demonstrate the potential of the proposed framework. The role of synthetic samples is demonstrated by a significant
improvement in recognition accuracy. A maximum improvement of 10% is achieved using the proposed approach.