Procesamiento del Lenguaje Natural, núm. 35 (2005), pp. 29-34. Received 02-05-2005; accepted 01-06-2005.

Main issues in grapheme-to-phoneme conversion for TTS

Tatyana Polyakova
Universitat Politècnica de Catalunya
Jordi Girona 1-3, Barcelona, Spain
tatyana@gps.tsc.upc.es

Antonio Bonafonte
Universitat Politècnica de Catalunya
Jordi Girona 1-3, Barcelona, Spain
antonio.bonafonte@upc.es

Resumen: Grapheme-to-phoneme conversion for English is being developed for future integration into a speech synthesis system within the TC-STAR project. This work describes experiments carried out with two different machine-learning techniques. Pronunciation prediction was considered both with and without stress. The influence of different parameters on the grapheme-to-phoneme error rate is analyzed, and the distribution of the error rate as a function of word length is also studied.

Palabras clave: grapheme-to-phoneme conversion, alignment, x-grams, finite-state transducers, CART.

Abstract: A grapheme-to-phoneme conversion system for English is being developed for further integration into a speech synthesis system within the TC-STAR project. In this work we describe experiments performed using two different machine-learning techniques. Pronunciation was predicted for both a stressed and an unstressed lexicon and the results were compared. We analyze the different parameters that may influence the error rate in grapheme-to-phoneme conversion, and study the error rate as a function of word length.

Keywords: grapheme-to-phoneme, alignment, x-grams, finite-state transducers, CART.

1 Introduction

Text-to-speech synthesis and continuous speech recognition are two rapidly developing technologies with applications in many different areas of human life. It is impossible to build a well-performing text-to-speech system without a tool that generates correct pronunciations from the orthographic form.

Because of writing-system irregularities (there is no one-to-one correspondence between the full grapheme and phoneme sets; one letter may correspond to two phonemes, or to no sound at all), grapheme-to-phoneme conversion is not an easy task, especially for languages such as English or French, where the relation between letters and sounds is far from transparent.

To train a grapheme-to-phoneme conversion system, most algorithms require an alignment between the grapheme and phoneme strings.
Many different approaches have been developed to perform grapheme-to-phoneme alignment automatically for translation systems; in grapheme-to-phoneme conversion, however, a dictionary aligned by hand-seeded rules is usually used to train a classifier.

Different methods for building automatic alignments have been proposed, among them one-to-one and many-to-many alignments. In a one-to-one alignment each letter corresponds to exactly one phoneme and vice versa; to match the grapheme and phoneme strings, an "empty" symbol is introduced into either string. In a many-to-many alignment a letter can correspond to more than one phoneme and a phoneme can correspond to more than one letter.

An example of a one-to-one alignment model is the epsilon scattering model (ESM), where an "epsilon" or "empty" phoneme is introduced into both the grapheme and phoneme strings, or only into the phoneme string (Pagel, Lenzo and Black, 1998). The EM algorithm is then applied to optimize the epsilon positions.

An example of many-to-many alignment was proposed by Bisani and Ney (2002). The main idea of their model is that both the orthographic form and the pronunciation of each word are determined by a common sequence of graphones, where a graphone is a pair q = (g, φ) of a letter sequence and a phoneme sequence (q ∈ Q ⊆ G* × Φ*, G being the letter alphabet and Φ the inventory of phonemic symbols). For a given sequence of letters g ∈ G*, the probability of the most likely phoneme sequence φ ∈ Φ* is maximized.
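To make the graphone definition concrete, the short sketch below (ours, not the authors'; the segmentation of the example word is purely illustrative) shows how a word and its pronunciation can be covered by one common graphone sequence:

    # A graphone is a pair (letter sequence, phoneme sequence); a word
    # and its pronunciation are covered by a common graphone sequence.
    # Illustrative segmentation of "fine" -> /f aI n/ (SAMPA); other
    # coverings of the same pair are possible.
    graphones = [
        ("f",  ["f"]),   # one letter  -> one phoneme
        ("i",  ["aI"]),  # one letter  -> one phoneme
        ("ne", ["n"]),   # two letters -> one phoneme (covers silent e)
    ]

    word = "".join(g for g, _ in graphones)          # "fine"
    pron = [p for _, ph in graphones for p in ph]    # ["f", "aI", "n"]
    print(word, pron)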


This paper reports on a data-driven approach to pronunciation modeling. To align graphemes and phonemes we chose automatic epsilon scattering. The experiments were performed on stressed and unstressed English lexicons, and the grapheme-to-phoneme results obtained with CART decision trees and with x-gram-based finite-state transducers were compared for both cases.

2 Grapheme-to-phoneme methods

2.1 Epsilon-scattering based alignment

As the alignment method we use the automatic epsilon-scattering method to produce a one-to-one alignment (Black, Lenzo and Pagel, 1998). When the number of letters is greater than the number of phonemes, we introduce the "empty" phoneme symbol into the phonetic representation to match the lengths of the grapheme and phoneme strings. First we add the necessary number of "empty" symbols, in all possible positions, to the phonetic representation of each word in the training corpus. Table 1 gives two of the possible alignment candidates for the word "meadows".

    Let.   M  E  A  D  O  W  S
    Phon.  m  E  _  d  o  _  z
    Phon.  _  _  m  E  d  o  z

Table 1: Some possible alignment candidates.

This probabilistic initialization gives us the set of all possible imperfect alignments. Our goal is to maximize the probability that letter g matches phoneme φ, and thereby choose the best alignment from the candidates. This is done by applying the expectation-maximization (EM) algorithm (Dempster, Laird and Rubin, 1977) to the joint grapheme/phoneme probabilities. Under certain conditions, EM guarantees an increase of the likelihood function at each iteration until convergence to a local maximum.

2.1.1 Stress assignment

In this work we use the SAMPA phonetic alphabet (http://www.phon.ucl.ac.uk/home/sampa/american.htm). The total number of phoneme symbols in SAMPA for American English is N_p = 41. Since we use different phonemes for stressed and unstressed vowels, to train and test on the stressed dictionary we added N_a = 17 stressed vowel symbols and an "empty" phoneme, "_". As an example, Table 2 shows the word "abilities" together with its two phonetic representations, with and without stress. In the stressed version the digit "1" is appended to identify the stressed vowel. In this example one "empty" symbol was introduced to match the strings.

    Let.            A  B  I   L  I  T  I  E  S
    Phon. (str.)    @  b  I1  l  @  t  i  _  z
    Phon. (unstr.)  @  b  I   l  @  t  i  _  z

Table 2: Alignment of graphemes and phonemes including the "empty" symbol. The digit 1 is the stress marker.
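The candidate-generation step of this initialization can be sketched as follows; this is a minimal illustration of the idea (scattering epsilons into all possible positions of the phoneme string), and it deliberately omits the EM re-estimation that actually scores and selects the best candidate:

    from itertools import combinations

    def alignment_candidates(letters, phones, eps="_"):
        """Enumerate one-to-one alignment candidates by scattering the
        "empty" phoneme into all possible positions of the phoneme
        string so that its length matches the letter string."""
        n_eps = len(letters) - len(phones)
        if n_eps < 0:
            return  # more phonemes than letters: not handled here
        for slots in combinations(range(len(letters)), n_eps):
            candidate, it = [], iter(phones)
            for i in range(len(letters)):
                candidate.append(eps if i in slots else next(it))
            yield candidate

    # e.g. the 21 candidates for "meadows" -> /m E d o z/, including
    # the two shown in Table 1:
    for cand in alignment_candidates("MEADOWS", ["m", "E", "d", "o", "z"]):
        print(list("MEADOWS"), cand)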
2.2 CART

Deriving the pronunciation automatically with decision trees, such as Classification and Regression Trees (CART), is a commonly used technique in grapheme-to-phoneme conversion. Pagel, Lenzo and Black (1998) introduced CART decision trees into pronunciation prediction. As input vectors they used a graphemic sliding window containing three letters to the left and three letters to the right of each center letter. The advantage of the CART method is that it produces compact models; the model size is defined by the total number of questions and leaf nodes in the generated tree.
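For illustration, the sliding-window setup can be sketched with an off-the-shelf decision tree. This is our own sketch: scikit-learn's DecisionTreeClassifier stands in for the CART implementation used in the paper (min_impurity_decrease with the entropy criterion plays the role of the minimum entropy gain), the two-word aligned toy data and the padding symbol are assumptions, and the ordinal letter encoding is a simplification of CART's categorical questions.

    from sklearn.tree import DecisionTreeClassifier
    from sklearn.preprocessing import OrdinalEncoder

    PAD = "#"  # padding for window positions outside the word (assumed)

    def windows(letters, width=3):
        """7-letter sliding windows: 3 letters of context per side."""
        padded = [PAD] * width + list(letters) + [PAD] * width
        return [padded[i:i + 2 * width + 1] for i in range(len(letters))]

    # Toy aligned data: (word, one phoneme or "_" per letter).
    aligned = [("meadows", ["m", "E", "_", "d", "o", "_", "z"]),
               ("aligned", ["a", "l", "aI", "_", "n", "_", "d"])]

    X = [w for word, _ in aligned for w in windows(word)]
    y = [p for _, phones in aligned for p in phones]

    enc = OrdinalEncoder()  # letters -> integer codes, per window slot
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=7,
                                  min_impurity_decrease=0.001)
    tree.fit(enc.fit_transform(X), y)
    # Toy model: real training uses the full aligned dictionary, and a
    # test word's letters must appear in training in the same slot.
    print(tree.predict(enc.transform(windows("meadow"))))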


2.3 Finite-state transducers

In this approach we choose the pronunciation that maximizes

    \hat{\varphi} = \arg\max_{\varphi} \{ p(\varphi \mid g) \},    (1)

where φ is the sequence of phonemes, including the "empty" phoneme, and g is the sequence of letters. To solve the maximization problem we use a finite-state transducer, similar to (Galescu and Allen, 2001). Equation (1) can be rewritten as

    \arg\max_{\varphi} \{ p(\varphi, g) / p(g) \} = \arg\max_{\varphi} \{ p(\varphi, g) \},    (2)

which can be estimated using standard n-gram methods.

In training, the alignment is performed first. Then we define the graphones γ_i as the pairs consisting of one letter and one phoneme (including ε):

    p(g, \varphi) = \prod_{i=1}^{N} p(g_i, \varphi_i \mid g_1^{i-1}, \varphi_1^{i-1}) = \prod_{i=1}^{N} p(\gamma_i \mid \gamma_1^{i-1}),    (3)

where N is the number of letters in the word. This can be estimated using n-grams, but the symbols are not words: they are the graphones, i.e., the (letter, phoneme) pairs found in the aligned dictionary.

N-grams can be represented using a finite-state automaton: for each history h we define a state, and for each new graphone γ_l = (g_l, φ_l) we create an edge labeled (g_l, φ_l) and weighted with the probability p(γ_l | h). For example, for the word "aligned" we create the following graphone sequence: (A,a) (L,l) (I,aI) (G,_) (N,n) (E,_) (D,d). Figure 1 shows some states and edges of the bigram-based FSA (in practice all of the training data is used to estimate the probabilities).

[Figure 1: A finite-state automaton. From the state (L,l), edges labeled (I,aI) and (I,i) carry the probabilities p(I,aI | L,l) and p(I,i | L,l).]

The FSA allows us to compute the probability p(γ), but this requires an alignment. In order to solve equation (2) we derive a finite-state transducer in a straightforward way: the labels attached to the FSA are split, so that the letter becomes an input and the phoneme becomes an output. Figure 2 shows the result of applying this to Figure 1. Note that the FST is not deterministic: for instance, from the state labeled (L,l) there are two possible edges for the same input letter "I".

[Figure 2: A finite-state transducer. The edges of Figure 1 become I/aI and I/i, weighted with p(I,aI | L,l) and p(I,i | L,l).]

To find the pronunciation we have to maximize equation (3). This is equivalent to finding the path of the FST with maximum probability. The input letters limit the number of edges that can be followed; the best path is found using a dynamic programming algorithm.
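A minimal sketch of this training-and-decoding scheme is given below, assuming bigram graphone probabilities with add-one smoothing and a toy aligned dictionary (our assumptions, not the paper's implementation); the real system estimates probabilities from the full training data and uses the flexible-length x-gram model described next.

    from collections import defaultdict
    from math import log

    BOS = ("<s>", "<s>")  # start-of-word graphone (assumed convention)

    def train_bigrams(aligned):
        """Count graphone bigrams from a one-to-one aligned lexicon."""
        big, uni = defaultdict(int), defaultdict(int)
        for word, phones in aligned:
            prev = BOS
            for g in zip(word, phones):  # graphone = (letter, phoneme)
                big[(prev, g)] += 1
                uni[prev] += 1
                prev = g
        return big, uni

    def decode(word, big, uni, phone_inv):
        """Best phoneme string for `word` by dynamic programming over
        graphones (Viterbi over the bigram FST), add-one smoothing."""
        V = len(phone_inv)
        states = {BOS: (0.0, [])}
        for letter in word:
            nxt = {}
            for prev, (lp, path) in states.items():
                for ph in phone_inv:
                    g = (letter, ph)
                    p = (big[(prev, g)] + 1) / (uni[prev] + V)
                    cand = (lp + log(p), path + [ph])
                    if g not in nxt or cand[0] > nxt[g][0]:
                        nxt[g] = cand
            states = nxt
        return max(states.values())[1]

    aligned = [("meadows", ["m", "E", "_", "d", "o", "_", "z"]),
               ("aligned", ["a", "l", "aI", "_", "n", "_", "d"])]
    big, uni = train_bigrams(aligned)
    phones = sorted({p for _, ps in aligned for p in ps})
    # Drop the "empty" symbols to obtain the output pronunciation:
    print([p for p in decode("meadow", big, uni, phones) if p != "_"])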
In order to derive the correct pronunciation we use the flexible-length x-gram model (Bonafonte and Mariño, 1996). The x-gram model assumes that the number of conditioning grapheme-phoneme pairs depends on each particular case. Its main idea is to apply a state-merging algorithm that reduces the number of states. Two merging criteria are used. First, a history is merged if the number of times it appears in the training data is less than a threshold k_min (where q_i is a grapheme-phoneme pair); the probability of such grapheme-phoneme pairs is derived by smoothing. Second, two states are merged if their distributions p = p(q | q_{i-m}, ..., q_i) and m = p(q | q_{i-m+1}, ..., q_i) differ little in information. The measure of the difference is the divergence D, defined over the lexicon of size J as

    D(p \| m) = \sum_{j=1}^{J} p(j) \log \frac{p(j)}{m(j)}.    (4)

By choosing proper values of the thresholds k_min and D, one can significantly reduce the number of states without degrading the model.
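The merging test of equation (4) reduces to a few lines; this is a sketch under our own conventions (the zero guard, the dict representation and the threshold values are illustrative assumptions, not the paper's settings):

    from math import log

    def divergence(p, m, eps=1e-12):
        """Kullback-Leibler divergence D(p || m) of equation (4);
        `p` and `m` map graphone outcomes to probabilities."""
        return sum(pj * log(pj / max(m.get(j, 0.0), eps))
                   for j, pj in p.items() if pj > 0.0)

    def should_merge(p, m, count, k_min=3, d_max=0.1):
        """Merge a state into its shorter-history state if the history
        is rare or the two distributions are close enough."""
        return count < k_min or divergence(p, m) <= d_max

    # A long history whose distribution barely differs from the
    # shorter one gets merged:
    p = {"s/z": 0.62, "s/s": 0.38}
    m = {"s/z": 0.60, "s/s": 0.40}
    print(divergence(p, m), should_merge(p, m, count=25))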


3 Experimental results

LC-STAR (www.lc-star.com) has created dictionaries in 13 languages. In this work we used the LC-STAR dictionary of American English, kindly provided by NSC (Natural Speech Communications) for the development of the speech-to-speech translation demonstrator within the LC-STAR project.

We performed experiments using the epsilon-scattering method to align the lexicon, combined with CART and FST to predict the correct pronunciation.

First, the system was trained and tested using CART tree-based phoneme prediction. The LC-STAR dictionary contains 50 thousand words; it was randomly split into ten equal parts. Ninety per cent of the dictionary was used to train the system (ten per cent of which was used for evaluation), and the remaining ten per cent was used to test the system.

3.1 Conversion using CART and the influence of the decision-tree parameters on the error rate

The phoneme and word error rates for different parameters of the CART decision tree are plotted in Figures 3a and 3b. Figure 3a shows the error rates as a function of the minimum entropy gain necessary to justify further tree expansion. Figure 3b shows the error rates for different values of the maximum tree depth; if this parameter is set very small, the word error rate tends to be very high.

[Figure 3a: Phoneme and word error rates (%) as a function of the minimum entropy gain needed for further expansion of the tree.]

[Figure 3b: Phoneme and word error rates (%) as a function of the maximum tree depth.]

The parameters that gave the best results were found to be 0.001 for the entropy gain and 7 for the tree depth. These values coincide with those obtained by Black, Lenzo and Pagel (1998).

Table 3 presents the results obtained for the stressed and unstressed dictionaries. The data in the first row were obtained for the phonetic transcription including stress marks; a misplaced stress mark was counted as an error. The second row shows the percentage of correct phonemes and words compared with the correct phonetic transcription for the unstressed lexicon.

                  Phon.   Words
    with stress   90.13   51.04
    w/o stress    93.02   65.48

Table 3: Percentage of correct phonemes and words for the stressed and unstressed dictionary using CART.

The results can be improved by removing ambiguous abbreviations and non-standard words; see Table 4.
                  Phon.   Words
    with stress   91.29   57.80
    w/o stress    93.93   68.32

Table 4: Percentage of correct phonemes and words for the stressed and unstressed dictionary using CART, after the removal of long non-standard words and abbreviations from the corpus.

As Tables 3 and 4 show, the system performs significantly better on the unstressed lexicon, especially if we take into consideration the percentage of words correct.

3.2 Grapheme-to-phoneme conversion using FST

It is important to analyze how the parameters of the x-gram models affect the error rates. To clarify this, we performed experiments for different values of the parameters n and D in the x-gram model, where n is the maximum possible length of the conditional-probability history.


Figure 4a shows the error rate as a function of n.

[Figure 4a: Phoneme and word error rates (%) as a function of n.]

Both error rates decrease monotonically as n increases and remain practically unchanged from n = 5 onwards, which implies that n = 5 is the optimal parameter of the n-gram model for the given corpus. In Figure 4b we plot the error rate as a function of the divergence threshold in the x-gram model.

[Figure 4b: Phoneme and word error rates (%) as a function of the divergence threshold D.]

The results presented in Figure 4 imply that, while allowing a significant decrease in the number of states, the x-gram model gives good results even for fairly high values of the divergence threshold. The results in Figures 4a and 4b give the total percentage of errors in the grapheme-to-phoneme conversion. The error rates practically do not change for n ≥ 5 and for a wide range of divergence thresholds D ≥ 0.
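For reference, the phoneme and word error rates reported throughout this section can be computed as sketched below; the conventions (phoneme errors via edit distance against the reference transcription, a word counted as an error on any mismatch, including a stress mark) follow the description above, but the code itself is ours, not the paper's scoring script.

    def edit_distance(a, b):
        """Levenshtein distance between two phoneme sequences."""
        d = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, y in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                       prev + (x != y))
        return d[len(b)]

    def error_rates(pairs):
        """Phoneme and word error rates (%) over (reference,
        hypothesis) phoneme-sequence pairs."""
        phone_err = sum(edit_distance(r, h) for r, h in pairs)
        n_phones = sum(len(r) for r, _ in pairs)
        word_err = sum(r != h for r, h in pairs)
        return 100 * phone_err / n_phones, 100 * word_err / len(pairs)

    pairs = [(["m", "E", "d", "o", "z"], ["m", "E", "d", "o", "z"]),
             (["@", "b", "I1", "l", "@", "t", "i", "z"],
              ["@", "b", "I", "l", "@", "t", "i", "z"])]  # stress error
    print(error_rates(pairs))  # -> (7.69..., 50.0)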


[Figure 5: Probability distribution function of errors versus the number of graphemes per word (open squares). The line with filled circles is the distribution for running words; the line with filled triangles is the distribution for words in an English lexicon.]

5 Conclusions

The experiments performed show that grapheme-to-phoneme results depend on many parameters, as well as on the choice of the machine-learning technique used to perform the conversion. Results were obtained for both a stressed and an unstressed lexicon, using the epsilon-scattering method to perform the alignment and combining it with CART and FST classifiers.

The FST performed better on both the stressed and the unstressed dictionary. We analyzed different factors that may influence the error rate in grapheme-to-phoneme conversion. The obtained error distribution (see Figure 5) indicates that 9-letter words contribute the most to the total error rate when the optimal model parameters are chosen for training the system. An optimal grapheme-to-phoneme alignment method is as important as the choice of classification method, and it could give a significant performance improvement in speech synthesis and speech recognition tasks.

The problem with the currently used one-to-one alignment and the insertion of "empty" symbols is that some of the alignments produced are completely artificial and do not specify the pronunciation; moreover, a reasonable alignment may not be unique (Damper and Marchand, 2005). Black, Lenzo and Pagel (1998) concluded that the choice of the decision-tree learning technique does not influence the results as much as the alignment algorithm used to align the lexicon. In the future we plan to compare different alignment methods.

6 Acknowledgments

This work has been funded by the European Union under the integrated project TC-STAR - Technology and Corpora for Speech to Speech Translation (IST-2002-FP6-506738, http://www.tc-star.org).

7 References

Bisani, M. and H. Ney. 2002. Investigations on joint-multigram models for grapheme-to-phoneme conversion. In Proc. of the 7th Int. Conf. on Spoken Language Processing, Denver, CO, vol. 1, 105-108.

Black, A., K. Lenzo and V. Pagel. 1998. Issues in building general letter to sound rules. In Proc. of the 3rd ESCA Workshop on Speech Synthesis, 77-80, Jenolan Caves, Australia.
Bonafonte, A. and J. B. Mariño. 1996. Language modeling using x-grams. In Proc. of ICSLP-96, vol. 1, 394-397, Philadelphia, October 1996.

Damper, R. I., Y. Marchand, J.-D. Marsterns and A. Bazin. 2004. Aligning letters and phonemes for speech synthesis. In Proc. of the 5th ISCA Speech Synthesis Workshop, Pittsburgh, 209-214.

Damper, R. I. and Y. Marchand. 2005. Information fusion approaches to the automatic pronunciation of print by analogy. Information Fusion. To appear.

Dempster, A. P., N. M. Laird and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Ser. B, 39, 1-38.

Galescu, L. and J. Allen. 2001. Bi-directional conversion between graphemes and phonemes using a joint n-gram model. In Proc. of the 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire, Scotland.

Pagel, V., K. Lenzo and A. Black. 1998. Letter-to-sound rules for accented lexicon compression. In Proc. of ICSLP98, vol. 5, 2015-2020, Sydney, Australia.

Sigurd, B., M. Eeg-Olofsson and J. van de Weijer. 2004. Word length, sentence length and frequency - Zipf revisited. Studia Linguistica 58(1), 37-52.
