RLAT – Rapid Language Adaptation Toolkit



Tim Schlippe 

24 June 2010




Outline 

• Introduction 

• Text Collection and Selection 

Crawling 

Snapshot function 

Recrawling 

Text management 

Prompt Selection 

• Language Model Building 

Language model management 

• Phoneme selection 

• Lexicon Pronunciation Creation 

G2P Rules 

Lex Learner 


• Audio Collection 

• Acoustic Model Building 

Configuration 

Training 

• Building Synthesis Voice 

• Testing Automatic Speech Recognition / Text To Speech 

• Ongoing work at CSL 




• Bridge the gap between technology experts and language experts 

Automatic Speech Recognition (ASR) 

Machine Translation (MT) 

Text-to-Speech (TTS) 

• Develop web-based intelligent systems 

Interactive Learning with user in the loop 

Rapid Adaptation of universal models to unseen languages 

• RLAT webpage: http://csl.ira.uka.de/rlat-dev 




• Collects: 

Appropriate text data 

Appropriate audio data 

• Defines: 

Phoneme set 

Rich prompt set 

Lexical pronunciations 


• Produces: 

Pronunciation model 

ASR acoustic model 

ASR language model 

TTS voice 

• Maintains: 

Project and user logins 

Data and models


RLAT – Project 



Overview – Automatic Speech Recognition 

[Diagram: Front End (Preprocessing) → Decoder (Search) → Text; the Decoder draws on the Acoustic Model, the Lexicon / Dictionary, and the Language Model]


RLAT – Text Collection and Selection 

Goal: 

• Get as much relevant text data as possible 

• Use the text data for 

Generating recording prompts 

Generating vocabulary lists 

Building language models for ASR 

• Possibilities to obtain text data: 

Web crawler (web pages and search engines) 

Upload (local) / download texts (remote) 

• Language encoding 

Internally, all text is stored as UTF-8 in order to handle most common alphabets 



RLAT – Text Collection and Selection – Crawling 

• Interface for automatic text collection 

Upload function, Web crawler, Recrawl: boost data from similar sites 

• Filtering: 

Language Identification 

Remove HTML tags, XML, JavaScript, … 

Convert from different character encodings (ISO-8859, UTF8, …) and 

file formats into raw text in UTF-8 

Text Normalization 
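The filtering steps above can be sketched as follows. This is a minimal illustration, not RLAT's actual implementation: the class `TextExtractor`, the function `html_bytes_to_text`, and the default source encoding are assumptions made for the example, and language identification is left out.

```python
import re
import unicodedata
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_bytes_to_text(raw, encoding="iso-8859-1"):
    """Decode raw page bytes, drop markup, and return normalized text."""
    text = raw.decode(encoding, errors="replace")
    parser = TextExtractor()
    parser.feed(text)
    parser.close()
    # Joining with a space keeps block elements apart; it can also split
    # a word that is interrupted by an inline tag.
    joined = " ".join(parser.parts)
    joined = unicodedata.normalize("NFC", joined)
    return re.sub(r"\s+", " ", joined).strip()


# Hypothetical ISO-8859-1 page used only for illustration.
page = ("<html><script>var x = 1;</script><body>"
        "<p>Caf\xe9 au <b>lait</b></p></body></html>").encode("iso-8859-1")
clean = html_bytes_to_text(page)
utf8_bytes = clean.encode("utf-8")  # raw text stored as UTF-8
```

In practice the source encoding would have to be detected per page (for example from the HTTP header or a meta tag) rather than passed as a fixed default.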




• Link Level Depth, Upload Test File to evaluate text being crawled, Upload File with URLs and Search Terms, Clock … 



RLAT – Text Collection and Selection – Snapshot function 

• Snapshot functionality 

Informative visual feedback about the quality of the text being crawled 

Quality indicators (perplexity (PP) and OOV rate) of the collected text are computed and displayed periodically during crawling, at an interval defined by the user 
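Two of the snapshot measures, OOV rate and n-gram coverage, can be computed as in the sketch below. The function names and the toy French sentences are assumptions for illustration. Perplexity is omitted because it requires a trained language model.

```python
def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def oov_rate(collected_tokens, dev_tokens):
    """Percentage of dev-set tokens not in the collected vocabulary."""
    vocab = set(collected_tokens)
    if not dev_tokens:
        return 0.0
    unknown = sum(1 for token in dev_tokens if token not in vocab)
    return 100.0 * unknown / len(dev_tokens)


def ngram_coverage(collected_tokens, dev_tokens, n):
    """Percentage of dev-set n-grams that also occur in the collected text."""
    seen = set(ngrams(collected_tokens, n))
    dev = ngrams(dev_tokens, n)
    if not dev:
        return 0.0
    covered = sum(1 for gram in dev if gram in seen)
    return 100.0 * covered / len(dev)


# Toy data: the dev set contains one word ("lit") missing from the crawl.
collected = "le chat dort sur le tapis".split()
dev = "le chat dort sur le lit".split()
snapshot = {
    "oov_rate": oov_rate(collected, dev),
    "bigram_coverage": ngram_coverage(collected, dev, 2),
}
```

Recomputing these values at each snapshot interval shows whether further crawling still lowers the OOV rate or the collected text has saturated.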




[Diagram: Crawl → every 10 mins: normalize → every selected period: copy, cat and build LM]



• Example: 18 days of crawling French data: www.leparisien.fr 

[Plots: OOV rate (%) and n-gram coverages (%) of the collected data on a French dev set, plotted over days of crawling]



• Crawling of French text data: www.leparisien.fr 

[Plots: vocabulary size of collected data (axis 50 K to 300 K) and total words of collected data (axis 50 Mio to 400 Mio)]


RLAT – Text Collection and Selection – Recrawling 

• Boost data from similar sites 

Approach 

1. User supplies a URL to RLAT for crawling 

2. Crawler retrieves N documents (web pages) 

3. Compute the statistics (TF-IDF) from the N documents 

4. Terms with highest TF-IDF score form query terms 

5. Query search engine (Google) to get the URLs for the query terms 

6. Crawl the URLs for the data 
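Steps 3 and 4 of the approach can be sketched as follows. The slides do not say how TF-IDF scores are aggregated across the N documents. Taking each term's maximum score over all documents is an assumption of this example, as are the function name `top_tfidf_terms` and the toy documents. The search-engine query itself is not shown.

```python
import math
from collections import Counter


def top_tfidf_terms(documents, k=3):
    """Return the k terms with the highest TF-IDF score.

    documents: list of token lists, one per crawled page.
    A term's score is its maximum TF-IDF over all documents
    (an assumption; other aggregations are possible).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    best = {}
    for doc in documents:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            score = tf * idf
            if score > best.get(term, 0.0):
                best[term] = score

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy documents: "paris" occurs everywhere, so its IDF is zero.
docs = [
    ["election", "paris", "vote", "election"],
    ["football", "paris", "match"],
    ["paris", "election", "budget"],
]
query_terms = top_tfidf_terms(docs, k=3)
query = " ".join(query_terms)  # string that would be sent to the search engine
```

Terms that appear in every document get a score of zero and never become query terms, which keeps site-wide boilerplate words out of the query.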



RLAT – Text Collection and Selection – Text Management 

• Allows storing and handling multiple text files 

Example entry: www.tagesschau.de_2010-03-18_15.33.info 


RLAT – Prompt Selection 

• Prompts for recording: 

Collection without transcription 

• Prompts should be: 

Easy to say (no hard words, numerals, etc.) 

Rich in variability 
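One way to satisfy both prompt criteria is a greedy selection, sketched below. This is not necessarily how RLAT selects prompts. Character bigrams stand in for phonetic variability, and the thresholds `max_words` and `max_word_len` are hypothetical values chosen for the example.

```python
def is_easy(sentence, max_words=12, max_word_len=12):
    """Heuristic for "easy to say": short, no digits, no very long words."""
    words = sentence.split()
    if not words or len(words) > max_words:
        return False
    if any(ch.isdigit() for ch in sentence):
        return False
    return all(len(word) <= max_word_len for word in words)


def char_bigrams(sentence):
    """Set of character bigrams, used as a rough proxy for variability."""
    text = sentence.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def select_prompts(sentences, n_prompts):
    """Greedily pick easy sentences that add the most unseen bigrams."""
    candidates = [s for s in sentences if is_easy(s)]
    covered = set()
    selected = []
    while candidates and len(selected) < n_prompts:
        best = max(candidates, key=lambda s: len(char_bigrams(s) - covered))
        if not char_bigrams(best) - covered:
            break  # nothing new left to cover
        selected.append(best)
        covered |= char_bigrams(best)
        candidates.remove(best)
    return selected


# Toy input: one duplicate and one sentence containing numerals.
sentences = ["the cat sat", "the cat sat", "a dog ran fast", "call 911 now"]
prompts = select_prompts(sentences, 3)
```

With a pronunciation lexicon available, phoneme bigrams or triphones could replace the character bigrams without changing the selection loop.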



RLAT – Language Model Building 

• Three LM building options 

Interpolate base LMs 

Add additional data: concatenate text corpora and build one LM 

Do not use additional data 

• Evaluation: 

Perplexity, OOV rate, n-gram coverage, vocabulary size 

5-fold cross-validation, test file 




• Interpolate LMs, LM Evaluation … 




• More transparency for the LM building process 



RLAT – Build Language Model – LM Management 

• Build LMs from the texts selected in “Text Management” and interpolate them 
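Linear interpolation with weights estimated on a dev set can be sketched as below. The example assumes add-one-smoothed unigram models as stand-ins for the real n-gram LMs and estimates the weights with a few EM iterations. The function names and toy texts are illustrative only and do not reflect RLAT's internals.

```python
from collections import Counter


def unigram_lm(tokens, vocab):
    """Add-one-smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {word: (counts[word] + 1) / total for word in vocab}


def em_weights(lms, dev_tokens, iterations=20):
    """Estimate interpolation weights that maximize dev-set likelihood."""
    weights = [1.0 / len(lms)] * len(lms)
    for _ in range(iterations):
        expected = [0.0] * len(lms)
        for word in dev_tokens:
            parts = [lam * lm[word] for lam, lm in zip(weights, lms)]
            total = sum(parts)
            for i, part in enumerate(parts):
                expected[i] += part / total  # posterior of LM i for this word
        weights = [value / len(dev_tokens) for value in expected]
    return weights


def interpolate(lms, weights, vocab):
    """P(w) = sum_i weight_i * P_i(w)."""
    return {w: sum(lam * lm[w] for lam, lm in zip(weights, lms)) for w in vocab}


# Toy texts: the dev set resembles text1 much more than text2.
text1 = "news sport news politics".split()
text2 = "recipe cooking recipe food".split()
dev = "news politics sport news".split()
vocab = set(text1) | set(text2) | set(dev)

lms = [unigram_lm(text1, vocab), unigram_lm(text2, vocab)]
weights = em_weights(lms, dev)
mixed = interpolate(lms, weights, vocab)
```

Because the dev set determines the weights, it should match the target domain of the recognizer.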

[Diagram: text1, text2, … → LM1, LM2, … → interpolate → LM; the interpolation weights are determined on a manually selected dev set]


RLAT – Phoneme Selection 

• Selection from standard IPA chart 

• User-defined names for phonemes 

Can match their lexicon (if one exists) 

Can match their familiarity 

• Audio feedback 

Click to hear recording of each phone 



RLAT – Lexicon Pronunciation Creation – G2P Rules 



RLAT – Lexicon Pronunciation Creation – Lex Learner 



RLAT – Audio Recording Tool 

• Online recording tools 

Collaboratively record large number of speakers 

Speakers may be separate from developers 

• Java based for portability 

Works with “many” browsers 

• Visual feedback during recording 

• Automatic upload on completion 



RLAT – Audio Collection 



RLAT – Acoustic Model Building 

• Acoustic Model Building requires: 

Recorded Speech Data 

Phone set definition 

Pronunciation Lexicon 

• Two step process: 

1. Configuration 

2. Model Training 



RLAT – Acoustic Model Building – Configuration 

• Checks dependencies and errors 

Lexicon and phone set correspond 

Words in recorded prompts are covered by the lexicon 

• Divides the recorded data into training and test sets 

• Performance evaluation 

Little data: K-fold cross-validation with K = number of speakers 

More data: split into 90% (train) and 10% (test) 
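The two evaluation setups can be sketched as follows. With K equal to the number of speakers, each fold is assumed here to hold out exactly one speaker. The 90/10 split is shown at utterance level in list order, which is also an assumption (a speaker-disjoint split may be preferable). All names are illustrative.

```python
def leave_one_speaker_out(utterances):
    """Yield (held_out_speaker, train, test) with one fold per speaker.

    utterances: list of (speaker_id, utterance_id) pairs.
    """
    speakers = sorted({speaker for speaker, _ in utterances})
    for held_out in speakers:
        train = [u for u in utterances if u[0] != held_out]
        test = [u for u in utterances if u[0] == held_out]
        yield held_out, train, test


def split_90_10(utterances):
    """Split a list into the first 90% (train) and the last 10% (test)."""
    cut = (len(utterances) * 9) // 10
    return utterances[:cut], utterances[cut:]


# Toy recordings from three hypothetical speakers.
recordings = [("spk1", "u1"), ("spk1", "u2"), ("spk2", "u3"), ("spk3", "u4")]
folds = list(leave_one_speaker_out(recordings))
train, test = split_90_10(list(range(20)))
```

Holding out whole speakers keeps the test speaker unseen during training, so the measured error reflects speaker-independent performance.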


• Select speakers for test and training set 



RLAT – Acoustic Model Building – Training 

• Requires successful configuration 

• Creates log files and displays all steps of training to the user 

• EM Training for Context Independent Models 

3-state HMM 

Number of Gaussians per Model depends on data 

• EM Training for Context Dependent Models 

Number of models depends on data 

• MFCC front-end (Mel Frequency Cepstral Coefficients), 

LDA (linear discriminant analysis) 

• Progress of training procedure 

• Results of performance evaluation 
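The slides do not name the metric used for performance evaluation. Assuming it is the word error rate (a WER label appears on a later slide), it can be computed with a word-level edit distance as sketched below. The function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) against 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.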



RLAT – Building Synthesis Voice 



RLAT – Testing Automatic Speech Recognition / Text To Speech 

Simple echo back testing function 



Automatic Pronunciation Dictionary Generation from the WWW 


Derive pronunciations using IPA fields in different websites 



Text Normalization based on Statistical Machine Translation and Internet User Support 


Web-based user interface for language-specific text normalization 

Hybrid approach (rules + SMT) 


Student research project / Bachelor's thesis: 

Error Blaming in the Rapid Language Adaptation Toolkit 

• What are the weak points of the speech recognizer, and how can they be fixed? 

Language model (high OOV rate, perplexity) 

Low signal-to-noise ratio 

Accents 

Hesitations (uhm, ehm) and fragments (ha-, hallo) 

. . . 




[Plot: Difficulty (axis 0 to 9) and WER (axis 0 to 100)]

Required skills: 

• Basic knowledge of speech recognition and statistics 

• Basic Linux skills 

• Programming skills, e.g. in Perl or PHP 

• Enjoyment of computer science and linguistics 

Available immediately from: 

Tim Schlippe (tim.schlippe@kit.edu)


Student research project / Bachelor's thesis: Concept of a Game for Collecting Resources for Building Speech Recognizers 

• Research and analyze concepts of games for collecting speech resources 

• Implement a solution 

• Evaluate the solution 

Required skills: 

• Basic knowledge of speech recognition 

• Basic Linux skills 

• Programming skills, e.g. in Perl or PHP 

• Enjoyment of computer science and linguistics 

Available immediately from: 

Tim Schlippe (tim.schlippe@kit.edu) 


Thanks for your interest!


