
Letter-to-Sound Conversion for Urdu Text-to-Speech System


َِچج ث ٹ ت پ ب ا زڑ ر ذ ڈ د خ ح عظ ط ض ص ش س ژ نم ل گ ك ق ف غ ےى ئ ہ و ْ ّ ً ٰ ُھة ں آ

Table 1: Urdu basic (top) and secondary (middle) letters and aerab (bottom)

Combinations of these characters realize a rich inventory of 44 consonants, 8 long oral vowels, 7 long nasal vowels, 3 short vowels and numerous diphthongs (e.g. Saleem et al. 2002, Hussain 1997; the set of Urdu diphthongs is still under analysis). This phonemic inventory is given in Table 2. The italicized phonemes, whose existence is still not determined, are not considered any further (see Saleem et al. 2002 for further discussion). The mapping of this phonetic inventory to the characters given in Table 1 is discussed later.

(a) p b p b m m t d t d n n k k t d t d q f v s z x h r r j l l
(b) i e æ u o i e æ u o

Table 2: Urdu (a) consonantal and (b) vocalic phonemic inventory

3 NLP for Urdu TTS

As discussed earlier, enabling a text-to-speech system for any language requires a Natural Language Processing (NLP) component. The NLP system may have differing requirements for different languages. However, it always takes raw text as input and outputs a precise phonetic transcription for the language. The system can be divided into two parts, a Text Normalization Component and a Phonological Processing Component. These components may be further divided. A simplified schematic is shown in Figure 1.¹

[Figure 1 box labels: Urdu Raw Text Input; Normalized Urdu Text; Tokenizer; Semantic Tagger; String Generator; Letter to Sound Converter; Syllabifier; Sound Change Manager; Stress Marker; Intonation Marker; Annotated Phonetic Output]

Figure 1: NLP architecture for Urdu TTS system

¹ This diagram is based on the architecture of the Urdu Text to Speech system under development at the Center for Research in Urdu Language Processing (www.crulp.org).

Workshop on Arabic Script Based Languages, COLING 2004, Geneva
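The two-part architecture described in Section 3 can be pictured as a chain of stages that each take and return the same utterance object. The following is only a minimal sketch under stated assumptions, not the CRULP implementation: the stage names are taken from the box labels of Figure 1, but the assignment of Tokenizer, Semantic Tagger and String Generator to text normalization (and of the remaining stages to phonological processing) is inferred from the text, and `Utterance`, `make_stage` and `run_pipeline` are hypothetical names. Every stage except the whitespace tokenizer is a placeholder that only records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Utterance:
    """Hypothetical container passed from stage to stage."""
    text: str                                         # raw input text
    tokens: List[str] = field(default_factory=list)   # filled by the tokenizer
    trace: List[str] = field(default_factory=list)    # names of stages applied


Stage = Callable[[Utterance], Utterance]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its own name."""
    def stage(u: Utterance) -> Utterance:
        u.trace.append(name)
        return u
    return stage


def tokenizer(u: Utterance) -> Utterance:
    """Toy tokenizer: split on whitespace (an assumption, not the real rule)."""
    u.tokens = u.text.split()
    u.trace.append("tokenizer")
    return u


# Assumed grouping of the Figure 1 boxes into the two components.
TEXT_NORMALIZATION: List[Stage] = [
    tokenizer,
    make_stage("semantic_tagger"),
    make_stage("string_generator"),
]
PHONOLOGICAL_PROCESSING: List[Stage] = [
    make_stage("letter_to_sound_converter"),
    make_stage("syllabifier"),
    make_stage("sound_change_manager"),
    make_stage("stress_marker"),
    make_stage("intonation_marker"),
]


def run_pipeline(raw_text: str) -> Utterance:
    """Run text normalization first, then phonological processing."""
    u = Utterance(text=raw_text)
    for stage in TEXT_NORMALIZATION + PHONOLOGICAL_PROCESSING:
        u = stage(u)
    return u
```

Calling `run_pipeline` on a raw string returns an `Utterance` whose `trace` lists the eight stages in order. Keeping each stage as a function with the same signature makes it straightforward to subdivide a component further, as the text notes may be needed.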
