
RLAT – Rapid Language Adaptation Toolkit


Tim Schlippe<br />

24 June 2010



Outline<br />

• Introduction<br />

• Text Collection and Selection<br />

Crawling<br />

Snapshot function<br />

Recrawling<br />

Text management<br />

Prompt Selection<br />

• <strong>Language</strong> Model Building<br />

<strong>Language</strong> model management<br />

• Phoneme selection<br />

• Lexicon Pronunciation Creation<br />

G2P Rules<br />

Lex Learner<br />


• Audio Collection<br />

• Acoustic Model Building<br />

Configuration<br />

Training<br />

• Building Synthesis Voice<br />

• Testing Automatic Speech Recognition / Text To Speech<br />

• Ongoing work at CSL<br />


<strong>RLAT</strong> <strong>–</strong> <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />

• Bridge the gap between technology experts → language experts<br />

Automatic Speech Recognition (ASR)<br />

Machine Translation (MT)<br />

Text-to-Speech (TTS)<br />

• Develop web-based intelligent systems<br />

Interactive Learning with user in the loop<br />

<strong>Rapid</strong> <strong>Adaptation</strong> of universal models to unseen languages<br />

• <strong>RLAT</strong> webpage: http://csl.ira.uka.de/rlat-dev<br />


• Collects:<br />

Appropriate text data<br />

Appropriate audio data<br />

• Defines:<br />

Phoneme set<br />

Rich prompt set<br />

Lexical pronunciations<br />


• Produces:<br />

Pronunciation model<br />

ASR acoustic model<br />

ASR language model<br />

TTS voice<br />

• Maintains:<br />

Projects and users login<br />

Data and models



<strong>RLAT</strong> <strong>–</strong> Project<br />


Overview <strong>–</strong> Automatic Speech Recognition<br />

[Diagram of ASR components: Front End (Preprocessing), Decoder (Search), Acoustic Model, Lexicon / Dictionary, <strong>Language</strong> Model; output: Text]



<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection<br />

Goal:<br />

• Get as much relevant text data as possible<br />

• Use the text data for<br />

Generating recording prompts<br />

Generating vocabulary lists<br />

Building <strong>Language</strong> Models for ASR<br />

• Possibilities to obtain text data:<br />

Web crawler (web pages and search engines)<br />

Upload (local) / download texts (remote)<br />

• <strong>Language</strong> encoding<br />

Internally, all text is stored as UTF-8 so that most common alphabets can be handled


<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Crawling<br />

• Interface for automatic text collection<br />

Upload function, Web crawler, Recrawl: boost data from similar sites<br />

• Filtering:<br />

<strong>Language</strong> Identification<br />

Remove HTML tags, XML, JavaScript, …<br />

Convert from different character encodings (ISO-8859, UTF-8, …) and file formats into raw text in UTF-8<br />

Text Normalization<br />
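The filtering steps above (markup removal and conversion to UTF-8) could be sketched as follows. This is a minimal illustration using only the Python standard library, not RLAT's actual implementation; the function name `page_to_utf8_text` and the default source encoding are assumptions, and language identification and text normalization are left out.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script and style elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def page_to_utf8_text(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode a crawled page, drop markup, and return plain text as UTF-8 bytes.

    Hypothetical helper: the source encoding is assumed to be known here,
    whereas a real crawler would have to detect it.
    """
    html = raw.decode(source_encoding, errors="replace")
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    text = " ".join(" ".join(parser.parts).split())
    return text.encode("utf-8")
```

A page fetched as ISO-8859-1 bytes would be passed in as `page_to_utf8_text(raw_bytes)`, and the result can be written directly to a UTF-8 text file.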


• Link level depth, upload of a test file to evaluate the text being crawled, upload of a file with URLs and search terms, clock, …<br />


<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Snapshot function<br />

• Snapshot functionality<br />

Informative and visual feedback about the quality of the text being crawled<br />

Results that indicate the quality of the collected texts (perplexity (PP), OOV rate) are computed and displayed periodically during the crawling process, at an interval defined by the user
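As a rough illustration of the quality figures a snapshot reports, the sketch below computes an OOV rate and an n-gram coverage of a dev set against the text collected so far. The function names and the toy sentences are assumptions for illustration; the slides do not state how RLAT computes these values internally.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def oov_rate(collected_tokens, dev_tokens):
    """Percentage of dev-set tokens whose word is missing from the collected vocabulary."""
    vocab = set(collected_tokens)
    unknown = sum(1 for w in dev_tokens if w not in vocab)
    return 100.0 * unknown / len(dev_tokens)


def ngram_coverage(collected_tokens, dev_tokens, n):
    """Percentage of dev-set n-grams that also occur in the collected text."""
    seen = set(ngrams(collected_tokens, n))
    dev = ngrams(dev_tokens, n)
    covered = sum(1 for g in dev if g in seen)
    return 100.0 * covered / len(dev)


# Toy data: text collected so far versus a small dev set.
collected = "le chat dort sur le canapé".split()
dev = "le chien dort sur le tapis".split()
print(oov_rate(collected, dev))
print(ngram_coverage(collected, dev, 2))
```

Recomputing both numbers after each crawling interval gives the kind of curve over time that the snapshot function displays.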


[Diagram: crawl → normalize (every 10 mins) → copy, cat and build LM (every selected period)]



• Example: 18 days of Crawling French Data: www.leparisien.fr<br />

[Figure: two plots over crawling days, showing the OOV rate (%) and the n-gram coverages (%) of the collected data on a French dev set]



<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Crawling<br />

• Crawling of French Text Data: www.leparisien.fr<br />

[Figure: vocabulary size of the collected data (axis from 50 K to 300 K) and total words of the collected data (axis from 50 Mio to 400 Mio)]



<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Recrawling<br />

• Boost data from similar sites<br />

Approach<br />

1. User supplies a URL to <strong>RLAT</strong> for crawling<br />

2. Crawler retrieves N documents (web pages)<br />

3. Compute the statistics (TF-IDF) from the N documents<br />

4. Terms with highest TF-IDF score form query terms<br />

5. Query search engine (Google) to get the URLs for the query terms<br />

6. Crawl the URLs for the data<br />
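Steps 3 and 4 of the approach can be sketched as below. This is a minimal illustration under assumptions: the exact TF-IDF variant RLAT uses is not given on the slide, so the sketch uses relative term frequency times the logarithm of inverse document frequency, and the function name `top_tfidf_terms` is hypothetical. Querying the search engine and crawling the returned URLs (steps 5 and 6) are not shown.

```python
import math
from collections import Counter


def top_tfidf_terms(documents, k):
    """Return the k terms with the highest TF-IDF score across tokenized documents."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    best = {}
    for doc in documents:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            score = tf * idf
            # Keep each term's best score over all documents.
            if score > best.get(term, 0.0):
                best[term] = score
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy "crawled" documents, already tokenized.
docs = [
    "the election results the election".split(),
    "the football match results".split(),
    "the weather forecast".split(),
]
print(top_tfidf_terms(docs, 2))
```

Terms that occur in every document (here "the") get a score of zero and never become query terms, which is the property that makes TF-IDF useful for picking site-specific search terms.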


<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Text Management<br />

• Allows storing and handling multiple text files<br />


www.tagesschau.de_2010-03-18_15.33.info<br />


<strong>RLAT</strong> <strong>–</strong> Prompt Selection<br />

• Prompts for recording:<br />

Collection without transcription<br />

• Prompts should be:<br />

Easy to say (no hard words, numerals, etc.)<br />

Rich in variability<br />
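One way to turn the two prompt criteria into a selection procedure is sketched below. It is an assumption-laden illustration, not RLAT's documented algorithm: "easy to say" is approximated by rejecting sentences with digits, punctuation, or very long words, and "rich in variability" by greedily maximizing newly covered character bigrams. The thresholds and function names are hypothetical, and the input is assumed to be normalized text.

```python
def is_easy(sentence, max_word_len=12):
    """Heuristic filter: reject non-alphabetic tokens and very long words."""
    words = sentence.split()
    return bool(words) and all(w.isalpha() and len(w) <= max_word_len for w in words)


def char_bigrams(sentence):
    """Set of character bigrams, a crude stand-in for phonetic variety."""
    text = sentence.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def select_prompts(sentences, n_prompts):
    """Greedily pick easy sentences that each add the most unseen character bigrams."""
    candidates = [s for s in sentences if is_easy(s)]
    covered, chosen = set(), []
    while candidates and len(chosen) < n_prompts:
        best = max(candidates, key=lambda s: len(char_bigrams(s) - covered))
        if not char_bigrams(best) - covered:
            break  # no remaining candidate adds anything new
        chosen.append(best)
        covered |= char_bigrams(best)
        candidates.remove(best)
    return chosen


print(select_prompts(["the cat sat", "the cat sat", "call 911 now", "a dog ran"], 2))
```

With a pronunciation lexicon available, the character bigrams could be replaced by phone bigrams or triphones, which matches the goal of variability more directly.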


<strong>RLAT</strong> <strong>–</strong> <strong>Language</strong> Model Building<br />

• Three LM building options<br />

Interpolate base LMs<br />

Add additional data: concatenate text corpora and build one LM<br />

Do not use additional data<br />

• Evaluation:<br />

Perplexity, OOV rate, n-gram coverage, vocabulary size<br />

5-fold Cross Validation, Test File<br />
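The first building option and the perplexity measure can be illustrated with a deliberately small sketch. It assumes add-one smoothed unigram models over a shared vocabulary, which is far simpler than the n-gram LMs an ASR system would use; all names and the toy corpora are hypothetical.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one smoothed unigram LM over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(lm_a, lm_b, weight_a):
    """Linear interpolation: P(w) = weight_a * P_a(w) + (1 - weight_a) * P_b(w)."""
    return {w: weight_a * lm_a[w] + (1.0 - weight_a) * lm_b[w] for w in lm_a}


def perplexity(lm, tokens):
    """Perplexity of the model on a token sequence (tokens must be in the vocabulary)."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Two toy base corpora and a test text.
news = "the minister said the vote is today".split()
sport = "the team won the match today".split()
test = "the team said the vote is today".split()
vocab = set(news) | set(sport) | set(test)

lm_news = train_unigram(news, vocab)
lm_sport = train_unigram(sport, vocab)
lm_mix = interpolate(lm_news, lm_sport, 0.5)
for name, lm in [("news", lm_news), ("sport", lm_sport), ("mix", lm_mix)]:
    print(name, round(perplexity(lm, test), 2))
```

The second option on the slide (concatenating the corpora and building one LM) would correspond to calling `train_unigram(news + sport, vocab)` instead of interpolating.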


• Interpolate LMs, LM Evaluation …<br />


• More transparency for the LM building process<br />


<strong>RLAT</strong> <strong>–</strong> Build <strong>Language</strong> Model <strong>–</strong> LM Management<br />

• Build LMs from texts selected in “Text Management” and interpolate them<br />


[Diagram: text1, text2, … → LM1, LM2, …; manually selected dev set → weights; interpolate → LM]
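The slides do not say how the interpolation weights are obtained from the dev set. A common choice is expectation-maximization (EM) on the dev-set probabilities, sketched here as an assumption rather than as RLAT's method; the function name and input layout are hypothetical.

```python
def em_interpolation_weights(component_probs, iterations=50):
    """Estimate mixture weights by EM.

    component_probs[i][j] is the probability that LM j assigns to dev-set
    token i (given its history). Returns one weight per LM, summing to 1.
    """
    n_models = len(component_probs[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(iterations):
        expected = [0.0] * n_models
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            # E-step: posterior responsibility of each LM for this token.
            for j in range(n_models):
                expected[j] += weights[j] * probs[j] / mix
        # M-step: the new weight is the average responsibility.
        weights = [e / len(component_probs) for e in expected]
    return weights


# Toy dev set of four tokens scored by two LMs.
dev_probs = [[0.5, 0.1], [0.4, 0.1], [0.3, 0.2], [0.05, 0.3]]
print(em_interpolation_weights(dev_probs))
```

Each EM iteration cannot lower the dev-set likelihood of the interpolated model, so the resulting weights are at least as good on the dev set as the uniform starting point.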



<strong>RLAT</strong> <strong>–</strong> Phoneme Selection<br />

• Selection from standard IPA chart<br />

• User-defined names for phonemes<br />

Can match their lexicon (if one exists)<br />

Can match their familiarity<br />

• Audio feedback<br />

Click to hear recording of each phone<br />


<strong>RLAT</strong> <strong>–</strong> Lexicon Pronunciation Creation <strong>–</strong> G2P Rules<br />


<strong>RLAT</strong> <strong>–</strong> Lexicon Pronunciation Creation <strong>–</strong> Lex Learner<br />


<strong>RLAT</strong> <strong>–</strong> Audio Recording Tool<br />

• Online recording tools<br />

Collaboratively record a large number of speakers<br />

Speakers may be separate from developers<br />

• Java based for portability<br />

Works with “many” browsers<br />

• Visual feedback during recording<br />

• Automatic upload on completion<br />


<strong>RLAT</strong> <strong>–</strong> Audio Collection<br />


<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building<br />

• Acoustic Model Building requires:<br />

Recorded Speech Data<br />

Phone set definition<br />

Pronunciation Lexicon<br />

• Two-step process:<br />

1. Configuration<br />

2. Model Training<br />


<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building <strong>–</strong> Configuration<br />

• Checks dependencies and errors<br />

Lexicon and phone set correspond<br />

Words in recorded prompts are covered by the lexicon<br />

• Divides the recorded data into training and test sets<br />

• Performance evaluation<br />

Little data: K-fold cross-validation, with K = number of speakers<br />

More data: Data split into 90% (train) and 10% (test)<br />
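The two evaluation setups can be sketched as follows. With K equal to the number of speakers, K-fold cross-validation amounts to holding out one speaker per fold. For the 90%/10% case the slide does not say whether the split is made by speaker or by utterance; splitting by speaker is an assumption here, as are the function names and the `(speaker_id, utterance_id)` input format.

```python
import random
from collections import defaultdict


def leave_one_speaker_out(utterances):
    """K-fold cross-validation with K = number of speakers.

    utterances: list of (speaker_id, utterance_id) pairs.
    Yields (held_out_speaker, train, test); test holds one speaker's data.
    """
    by_speaker = defaultdict(list)
    for speaker, utt in utterances:
        by_speaker[speaker].append(utt)
    for held_out in sorted(by_speaker):
        train = [u for s in sorted(by_speaker) if s != held_out for u in by_speaker[s]]
        yield held_out, train, list(by_speaker[held_out])


def split_90_10(utterances, seed=0):
    """Shuffle speakers and assign about 10% of them (at least one) to the test set."""
    speakers = sorted({s for s, _ in utterances})
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(0.1 * len(speakers)))
    test_speakers = set(speakers[:n_test])
    train = [u for s, u in utterances if s not in test_speakers]
    test = [u for s, u in utterances if s in test_speakers]
    return train, test


data = [("s1", "a"), ("s1", "b"), ("s2", "c"), ("s3", "d")]
for held_out, train, test in leave_one_speaker_out(data):
    print(held_out, train, test)
```

Keeping each speaker entirely in either the training or the test set avoids evaluating on voices the model has already seen.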


<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building<br />

• Select speakers for test and training set<br />


<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building <strong>–</strong> Training<br />

• Requires successful configuration<br />

• Creates log files and displays to the user:<br />

All steps of training<br />

• EM Training for Context Independent Models<br />

3-state HMM<br />

Number of Gaussians per Model depends on data<br />

• EM Training for Context Dependent Models<br />

Number of models depends on data<br />

• MFCC front-end (Mel Frequency Cepstral Coefficients),<br />

LDA (linear discriminant analysis)<br />

• Progress of training procedure<br />

• Results of performance evaluation<br />


<strong>RLAT</strong> <strong>–</strong> Building Synthesis Voice<br />


<strong>RLAT</strong> <strong>–</strong> Testing Automatic Speech Recognition / Text To Speech<br />

Simple echo back testing function<br />


Automatic Pronunciation Dictionary Generation from the WWW<br />

• Ongoing work at CSL<br />

Derive pronunciations using IPA fields in different websites<br />


Text Normalization based on Statistical Machine Translation and Internet User Support<br />

• Ongoing work at CSL<br />

Web-based user interface for language-specific text normalization<br />

Hybrid approach (rules + SMT)<br />


Student research project / Bachelor's thesis:<br />

Error Blaming in the <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />

• What are the weak points of the speech recognizer, and how can they be fixed?<br />

Language model (high OOV rate, perplexity)<br />

Low signal-to-noise ratio<br />

Accents<br />

Hesitations (uhm, ehm) and fragments (ha-, hallo)<br />

. . .<br />


Student research project / Bachelor's thesis:<br />

Error Blaming in the <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />

[Figure: difficulty (axis from 0 to 9) plotted against WER (axis from 0 to 100)]<br />

Required knowledge:<br />

• Basic knowledge of speech recognition and statistics<br />

• Basic Linux skills<br />

• Programming skills, e.g. in Perl or PHP<br />

• Enjoyment of computer science and linguistics<br />

Available immediately from:<br />

Tim Schlippe (tim.schlippe@kit.edu)



Student research project / Bachelor's thesis: Concept of a game for collecting resources for building speech recognizers<br />

• Research and analyze concepts of games for collecting speech and language resources<br />

• Implement a solution<br />

• Evaluate the solution<br />


Student research project / Bachelor's thesis:<br />

Error Blaming in the <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />

Required knowledge:<br />

• Basic knowledge of speech recognition<br />

• Basic Linux skills<br />

• Programming skills, e.g. in Perl or PHP<br />

• Enjoyment of computer science and linguistics<br />

Available immediately from:<br />

Tim Schlippe (tim.schlippe@kit.edu)<br />


Thanks for your interest!



