َِچج ث ٹ ت پ ب ا زڑ ر ذ ڈ د خ ح عظ ط ض ص ش س ژ نم ل گ ك ق ف غ ےى ئ ہ و ْ ّ ً ٰ ُھة ں آ Table 1: <strong>Urdu</strong> basic (<strong>to</strong>p) and secondary(middle) letters and aerab (bot<strong>to</strong>m)Combination of these characters realizes a richinven<strong>to</strong>ry of 44 consonants, 8 long oral vowels, 7long nasal vowels, 3 short vowels and numerousdiphthongs (e.g. Saleem et al. 2002, Hussain 1997;set of <strong>Urdu</strong> diphthongs is still under analysis).This phonemic inven<strong>to</strong>ry is given in Table 2.The italicized phonemes, whose existence is stillnot determined, are not considered any further (seeSaleem et al. 2002 <strong>for</strong> further discussion).Mapping of this phonetic inven<strong>to</strong>ry <strong>to</strong> thecharacters given in Table 1 is discussed later.(a)p b p b m mt d t d n n k k t d t d q f v s z x hr r j l l(b)i e æu o i e æu o Table 2: <strong>Urdu</strong> (a) Consonantal and (b) Vocalicphonemic inven<strong>to</strong>ry3 NLP <strong>for</strong> <strong>Urdu</strong> TTSAs discussed earlier, <strong>to</strong> enable text-<strong>to</strong>-speechsystem <strong>for</strong> any language, a Natural LanguageProcessing component is required. The NLPsystem may have differing requirement <strong>for</strong>different languages. However, it always takes rawtext input and always outputs precise phonetictranscription <strong>for</strong> a language. The system can bedivided in<strong>to</strong> two parts, <strong>Text</strong>-NormalizationComponent and Phonological ProcessingComponent. These components may be furtherdivided. A simplified schematic is shown inFigure 1 1 .<strong>Urdu</strong> Raw<strong>Text</strong> InputNormalized<strong>Urdu</strong> <strong>Text</strong>TokenizerSemanticTaggerStringGenera<strong>to</strong>r<strong>Letter</strong> <strong>to</strong> <strong>Sound</strong>ConverterSyllabifier<strong>Sound</strong> ChangeManagerStress MarkerIn<strong>to</strong>nationMarkerAnnotated PhoneticOutputFigure 1: NLP architecture <strong>for</strong> <strong>Urdu</strong> TTS system1This diagram is based on the architecture of <strong>Urdu</strong><strong>Text</strong> <strong>to</strong> <strong>Speech</strong> system under development at Center <strong>for</strong>Research in <strong>Urdu</strong> Language Processing(www.crulp.org).Workshop on Arabic Script Based Languages, COLING2004, Geneva 2
The <strong>Text</strong> Normalization component takes acharacter string as input and converts it in<strong>to</strong> astring of letters. Within it, the Tokenizer uses thepunctuation marks and space between words <strong>to</strong>mark <strong>to</strong>ken boundaries which are then stamped aswords, punctuation, date, time and other relevantcategories by the Semantic Tagger. The StringGenera<strong>to</strong>r takes any non-letter based input (e.g. anumber or a date containing digits) and converts itin<strong>to</strong> a letter string.After the input is converted in<strong>to</strong> a stringcomprising only of letters, the PhonologicalProcessing Component generates thecorresponding phonetic transcription. This is donethrough a series of processes. The first process is<strong>to</strong> use <strong>Letter</strong>-<strong>to</strong>-<strong>Sound</strong> Converter (detailed below)<strong>to</strong> convert the normalized text input <strong>to</strong> a phonemicstring. This process may also be referred <strong>to</strong> asgrapheme-<strong>to</strong>-phoneme conversion. This isfollowed by Syllabifier, which marks syllableboundaries. The intermediate output is then<strong>for</strong>warded <strong>to</strong> a module which applies <strong>Urdu</strong> soundchange rules <strong>to</strong> generate the correspondingphonetic string. Following these modules, StressMarker and In<strong>to</strong>nation Marker modules add stressand in<strong>to</strong>nation <strong>to</strong> the string being processed. Resyllabificationis also per<strong>for</strong>med after soundchange rules are applied, in case phones areepenthesized or deleted and syllable boundariesrequire re-adjustment. <strong>Urdu</strong> shows a reasonablyregular behavior and most of these tasks can beachieved through rule-based systems (e.g. seeHussain 1997 <strong>for</strong> stress assignment algorithm).This paper focuses on <strong>Letter</strong>-<strong>to</strong>-<strong>Sound</strong> rules <strong>for</strong><strong>Urdu</strong>, the first in the series of modules inPhonological Processing Component.4 <strong>Urdu</strong> <strong>Letter</strong> <strong>to</strong> <strong>Sound</strong> Rules<strong>Urdu</strong> shows a very regular mapping fromgraphemes <strong>to</strong> phonemes. However, <strong>to</strong> explain thebehavior, the letters need <strong>to</strong> be further classifiedin<strong>to</strong> the following categories:a. Consonantal charactersb. Dual (consonantal and vocalic) behaviorcharactersc. Vowel modifier characterd. Consonant modifier charactere. Composite (consonantal and vocalic) characterSimilarly, the aerab set can also be divided in<strong>to</strong>the following categories:f. Basic vowel specifierg. Extended vowel specifierh. Consonantal gemination specifieri. Dual (vocalic and consonantal) inser<strong>to</strong>rFinally, there is a third category which may takeshape of an letter and aerab:j. Vowel-aerab placeholderThe Consonantal characters in (a) above alwaysrepresent a consonant of <strong>Urdu</strong>. In <strong>Urdu</strong>, there isalways a single consonant corresponding <strong>to</strong> asingle character of this category, unlike some otherlanguages e.g. English maps “ph” string <strong>to</strong>phoneme /f/. Most of the <strong>Urdu</strong> consonantalcharacters fall in<strong>to</strong> this category. These charactersand corresponding consonantal phonemes aregiven in Table 3 below. A simple mapping rulewould generate the phoneme corresponding <strong>to</strong>these characters.ب پ ت ٹ ث ج چt d s t p bح خ د ڈ ذ ر ڑ r z d x hز ژ س ش ص ض طt z s s zظ ع غ ف ق ك گ k q f zل م ن ہ ةt h n m lTable 3: Consonantal characters and theircorresponding phonemesThree characters of <strong>Urdu</strong> show dual behavior,i.e. in certain contexts they trans<strong>for</strong>m in<strong>to</strong>consonants, but in certain other contexts, theytrans<strong>for</strong>m in<strong>to</strong> vowels. These characters are AlefAlef acts .(ے or ى) and Yay ,(و) vao ,(ا)exceptionally in this category and there<strong>for</strong>e it isdiscussed separately in (j) below. Vao changes <strong>to</strong>/v/ and Yay changes <strong>to</strong> the approximant /j/ whenthey occur in consonantal positions (in onset orcoda of a syllable). However, when they occur asnucleus of a syllable, they <strong>for</strong>m long vowels. Asan example, Yay occurs as a consonant when i<strong>to</strong>ccurs in the onset of single syllable word ر Workshop on Arabic Script Based Languages, COLING2004, Geneva 3