13.07.2015 Views

Proceedings Fonetik 2009 - Institutionen för lingvistik


Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

The number of out-of-vocabulary (OOV) words is likely to be high, as new terms and names frequently appear in the textbooks, requiring sophisticated tools for automatic generation of pronunciations. The Filibuster system distinguishes between four word types: proper names, compounds and simplex words in the target language, and English words.

In order to reach the goal of making the textbooks available for studies, all text (plain Norwegian text and English text passages, OOV words and proper names) needs to be intelligible, raising the demands for a distinct and pragmatic voice.

The development of the Norwegian voice

The development of the Norwegian voice can be divided into four stages: (1) adjustments and completion of the pronunciation dictionary and the text corpus, and the development of the recording manuscripts, (2) recordings of the Norwegian speaker, (3) segmentation and building the speech database, and (4) quality assurance.

Pronunciation dictionaries

The Norwegian HLT Resource Collection has been made available for research and commercial use by the Language Council for Norwegian (http://www.sprakbanken.uib.no/). The resources include a pronunciation dictionary for Norwegian bokmål with about 780,000 entries, which were used in the Filibuster Norwegian TTS. The pronunciations are transcribed in a somewhat revised SAMPA, and mainly follow the transcription conventions in Øverland (2000). Some changes were made to the pronunciations, mainly consistent adaptations to the Norwegian speaker's pronunciation and removal of inconsistencies, but a number of true errors were also corrected, and a few changes were made due to revisions of the transcription conventions.

To cover the need for English pronunciations, the English dictionary used by the Swedish voice, consisting of about 16,000 entries, was used. The pronunciations in this dictionary are ‘Swedish-style’ English.
Accordingly, they were adapted into ‘Norwegian-style’ English pronunciations. A total of 24 xenophones were implemented in the phoneme set, of which about 15 have a sufficient number of representations in the speech database, and will be used by the TTS system. The remaining xenophones will be mapped into phonemes that are more frequent in the speech database.

In addition, some proper names from the Swedish pronunciation dictionary were adapted to Norwegian pronunciations, resulting in a proper name dictionary of about 50,000 entries.

Text corpus

The text corpus used for manuscript construction and word frequency statistics consists of about 10.8 million words from news and magazine text, university level textbooks on different topics, and Official Norwegian Reports (http://www.regjeringen.no/nb/dok/NOUer.html?id=1767). The text corpus has been cleaned and sentence-chunked.

Recording manuscripts

The Norwegian recording manuscript was constructed by searching iteratively for phonetically rich utterances. While diphones were used as the main search unit, searches also included high-frequency triphones and syllables.

As mentioned above, university level textbooks include a vast range of different domains and text types, and demand larger recording manuscripts than most TTS systems in order to cover the search units for different text types and languages. Bibliographic references, for example, can have a very complex construction, with authors of different nationalities, name initials of different formats, titles in other languages, page intervals and so on. To maintain a high performance of the TTS system for more complex text structures, the recording manuscript must contain a lot of these kinds of utterances.

To cover the need for English phone sequences, a separate English manuscript was recorded. The CMU ARCTIC database for speech synthesis with nearly 1,150 English utterances (Kominek and Black, 2003) was used for this purpose.
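
The iterative search for phonetically rich utterances described under "Recording manuscripts" can be illustrated with a small sketch. The text does not specify the search procedure, so the greedy coverage strategy below is an assumption, and the function names, the toy utterance labels and the phone symbols are hypothetical. Only diphones are counted here; the triphone and syllable units mentioned in the text are left out for brevity.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs in one transcribed utterance."""
    return set(zip(phones, phones[1:]))


def select_utterances(candidates: Dict[str, List[str]],
                      max_utterances: int) -> List[str]:
    """Greedily pick utterances that add the most not-yet-covered diphones.

    Assumption: a plain greedy coverage search, not the authors' procedure.
    """
    units = {text: diphones(phones) for text, phones in candidates.items()}
    covered: Set[Diphone] = set()
    selected: List[str] = []
    while len(selected) < max_utterances:
        best, best_gain = None, 0
        for text, unit_set in units.items():
            if text in selected:
                continue
            gain = len(unit_set - covered)  # diphones this utterance would add
            if gain > best_gain:
                best, best_gain = text, gain
        if best is None:  # no remaining utterance adds new coverage
            break
        selected.append(best)
        covered |= units[best]
    return selected


# Hypothetical toy corpus: utterance label -> phone sequence.
toy_corpus = {
    "utt_a": ["h", "e", "i"],
    "utt_b": ["h", "e"],
    "utt_c": ["t", "A", "k"],
}
manuscript = select_utterances(toy_corpus, max_utterances=3)
```

In this toy corpus the search stops after two utterances, because `utt_b` contributes no diphone that `utt_a` has not already covered. A fuller version would presumably weight units by corpus frequency so that common diphones are covered first.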
In addition, the Norwegian manuscript contained many utterances with mixed Norwegian and English, as well as email addresses, acronyms, spelling, numerals, lists, announcements of DAISY-specific structures such as page numbers, tables, parallel text and so on.

Recordings

The speech was recorded in NLB’s recording studio. An experienced male textbook speaker was recorded by a native supervisor. The re-
