Automatic Recognition of Cantonese-English Code ... - Aclclp.org.tw

More documents

Recommendations

Info

282 Joyce Y. C. Chan et al. systems or sub-systems”. Different combinations of languages are found in code-switching, for examples, Spanish-English in United States, German-Italian and French-Italian in Switzerland, and Hebrew-English in Israel (Auer, 1998). In Taiwan, code-switching between Chinese dialects, namely Mandarin and Taiwanese, has become common in recent years (Chen, 2004). Hong Kong is an international city where many people, especially the younger generation, are Cantonese and English bilinguals. English words are frequently embedded into spoken Cantonese. The switching of language tends to be intra-sentential, and it rarely involves linguistic units above the clause level. Hence, the term code-mixing is usually preferred (Li, 2000). In this case, Cantonese is the primary language, also known as the matrix language, and English is the secondary language, usually referred to as the embedded language (Halmari, 1997). Automatic speech recognition (ASR) is one of the key technologies in spoken language processing. An ASR system converts an input speech waveform into a sequence of words. Recently, ASR for multilingual applications has attracted great interest (Schultz & Kirchhoff, 2006). In state-of-the-art ASR systems, the input speech is assumed to contain only one language and the language identity is given. These systems are not able to handle code-mixing speech, which differs significantly from monolingual speech spoken by native speakers. This calls for special consideration in the design of acoustic models, lexical and language models, and in the decoding algorithm. There have been two different approaches to code-switching or code-mixing speech recognition (Lyu et al., 2006; Chan et al., 2006). The first approach involves a language boundary detection (LBD) algorithm that divides the input utterance into language-homogeneous segments. The language identity of each segment is determined, and the respective monolingual speech recognizer is applied. LBD for mixed-language utterances was studied by Wu et al. (2006) and Chan et al. (2004). Language-specific phonological and acoustic properties were used as the primary cues to identify the languages. The second approach aims to develop a cross-lingual speech recognition system, which can handle multiple languages in a single utterance. The acoustic models, language models, and pronunciation dictionary are designed to be multi-lingual and cover all languages concerned. In Lyu et al. (2006), automatic recognition of Mandarin-Taiwanese code-switching speech was investigated. It was found that Mandarin and Taiwanese, both of which are Chinese dialects, share a large percentage of lexicon items. Their grammar was also assumed to be similar. A one-pass recognition algorithm was developed using a character-based search net. It was shown that the one-pass approach outperforms LBD-based multi-pass approaches. In You et al. (2004), a mixed-lingual keyword spotting system was developed for auto-attendant applications. The keywords to be detected could be in either English or Chinese. This paper presents a study on automatic speech recognition of Cantonese-English
Automatic Recognition of Cantonese-English Code-Mixing Speech 283 code-mixing speech. Part of the work was reported in Chan et al. (2006). Our study covers all components of an ASR system, including acoustic models, language models, pronunciation dictionary, and search algorithm. Different approaches to LBD are also investigated. By understanding the linguistic properties of monolingual Cantonese and English, as well as code-mixing speech, the major difficulties in code-mixing speech recognition are revealed and possible solutions are suggested. We propose a two-pass recognition system, in which the acoustic and linguistic knowledge sources are integrated with language boundary information. Simulation experiments are carried out to evaluate the performance of the whole system as well as individual system components. 2. Difficulties in Code-mixing Speech Recognition 2.1 Linguistic Properties of Cantonese and English Cantonese is a Chinese dialect. It is spoken by tens of millions of people in the provinces of Guangdong, Guangxi, Hong Kong, and Macau. A Chinese word in its written form is composed of a sequence of characters. In Cantonese, each Chinese character is pronounced as a monosyllable carrying a specific lexical tone (Ching et al., 2006). English is one of the most popular languages in the world. An English word is written as a sequence of letters. In spoken form, each word may consist of several syllables, some of which are designated to be stressed. Table 1 shows a pair of example words in Cantonese and English. Table 1. Examples of Cantonese and English words in written and spoken format. Written (orthographic transcription) Spoken (phonetic transcription) 產生 /ts a n/ /s / produce /p r ´ »d j u˘ s/ Syllables can be divided into smaller units, namely consonants (C) and vowels (V). Cantonese syllables take the structures of V, CV, CVC, or VC (Ching et al., 1994). If tonal difference is not considered, the number of distinct Cantonese syllables is around 600 (Ching et al., 2006). The syllable structure in English is more complicated than that in Cantonese. Although many English syllables share the same canonical forms as given above, there also exist combinations like CCV, VCC, CCCV, and CCCVCC (Wester, 2003), which are not found in Cantonese. There are 22 consonants and 22 vowels (including diphthongs) in Cantonese, and 24 consonants and 14 vowels in American English (Ching et al., 1994; Ladefoged, 1999). Table 2 lists the IPA (International Phonetic Alphabet) symbols of these phonemes. Some of the phonemes in the two languages are labeled with the same IPA symbols by phoneticians, meaning that they are phonetically very close. Some of the other phonemes are also
Page 1: Computational Linguistics and Chine
Page 5 and 6: Automatic Recognition of Cantonese-

Automatic Recognition of Cantonese-English Code ... - Aclclp.org.tw

Create successful ePaper yourself

Delete template?

Save as template?