RLAT – Rapid Language Adaptation Toolkit
Tim Schlippe<br />
24 June 2010
Outline<br />
• Introduction<br />
• Text Collection and Selection<br />
Crawling<br />
Snapshot function<br />
Recrawling<br />
Text management<br />
Prompt Selection<br />
• <strong>Language</strong> Model Building<br />
<strong>Language</strong> model management<br />
• Phoneme selection<br />
• Lexicon Pronunciation Creation<br />
G2P Rules<br />
Lex Learner<br />
• Audio Collection<br />
• Acoustic Model Building<br />
Configuration<br />
Training<br />
• Building Synthesis Voice<br />
• Testing Automatic Speech Recognition / Text To Speech<br />
• Ongoing work at CSL<br />
<strong>RLAT</strong> <strong>–</strong> <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />
• Bridge the gap between technology experts → language experts<br />
Automatic Speech Recognition (ASR)<br />
Machine Translation (MT)<br />
Text-to-Speech (TTS)<br />
• Develop web-based intelligent systems<br />
Interactive Learning with user in the loop<br />
<strong>Rapid</strong> <strong>Adaptation</strong> of universal models to unseen languages<br />
• <strong>RLAT</strong> webpage: http://csl.ira.uka.de/rlat-dev<br />
<strong>RLAT</strong> <strong>–</strong> <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />
• Collects:<br />
Appropriate text data<br />
Appropriate audio data<br />
• Defines:<br />
Phoneme set<br />
Rich prompt set<br />
Lexical pronunciations<br />
• Produces:<br />
Pronunciation model<br />
ASR acoustic model<br />
ASR language model<br />
TTS voice<br />
• Maintains:<br />
Projects and users login<br />
Data and models
<strong>RLAT</strong> <strong>–</strong> Project<br />
Overview <strong>–</strong> Automatic Speech Recognition<br />
[Figure: ASR block diagram with Front End (Preprocessing), Acoustic Model, Decoder (Search), Lexicon / Dictionary, and Language Model; the output is text]
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection<br />
Goal:<br />
• Get as much relevant text data as possible<br />
• Use the text data for<br />
Generating recording prompts<br />
Generating vocabulary lists<br />
Building <strong>Language</strong> Models for ASR<br />
• Possibilities to obtain text data:<br />
Web crawler (web pages and search engines)<br />
Upload (local) / download texts (remote)<br />
• <strong>Language</strong> encoding<br />
All text is stored internally as UTF-8 to handle the most common alphabets
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Crawling<br />
• Interface for automatic text collection<br />
Upload function, Web crawler, Recrawl: boost data from similar sites<br />
• Filtering:<br />
<strong>Language</strong> Identification<br />
Remove HTML tags, XML, JavaScript, …<br />
Convert from different character encodings (ISO-8859, UTF-8, …) and<br />
file formats into raw text in UTF-8<br />
Text Normalization<br />
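The conversion and markup-removal steps listed above can be sketched as follows. This is a minimal illustration using only the Python standard library, not RLAT's actual implementation: the names `decode_to_text`, `strip_markup` and `to_raw_utf8` and the list of candidate encodings are assumptions made for this example, and language identification and text normalization are left out.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def decode_to_text(raw, candidates=("utf-8", "iso-8859-1")):
    """Decode bytes by trying each candidate encoding in order (assumed list)."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going and mark undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def strip_markup(html):
    """Drop tags and scripts, then collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def to_raw_utf8(raw):
    """Bytes in an unknown encoding -> plain text encoded as UTF-8."""
    return strip_markup(decode_to_text(raw)).encode("utf-8")
```

Trying UTF-8 first and falling back to ISO-8859-1 is a simple heuristic; a production crawler would more likely rely on the encoding declared by the page or on a detection library.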
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Crawling<br />
• Link Level Depth, Upload Test File to evaluate text being<br />
crawled, Upload File with URLs and Search Terms, Clock …<br />
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Snapshot function<br />
• Snapshot functionality<br />
Informative and visual feedback about the quality of the text being crawled<br />
Quality measures of the collected texts (perplexity (PP), OOV rate) are computed and<br />
displayed periodically during crawling, at an interval defined by the user<br />
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Snapshot function<br />
[Figure: snapshot workflow. Crawl; normalize every 10 minutes; copy, concatenate (cat) and build an LM every selected period]
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Snapshot function<br />
• Example: 18 days of Crawling French Data: www.leparisien.fr<br />
[Figure: OOV rate (%) and n-gram coverages (%) of the collected data on a French dev set, plotted over crawling days; the OOV-rate axis runs from 0.0 to 7.0]
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Crawling<br />
• Crawling of French Text Data: www.leparisien.fr<br />
[Figure: vocabulary size of the collected data (axis 50 K to 300 K) and total words of the collected data (axis 50 Mio to 400 Mio)]
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Recrawling<br />
• Boost data from similar sites<br />
Approach<br />
1. User supplies a URL to <strong>RLAT</strong> for crawling<br />
2. Crawler retrieves N documents (web pages)<br />
3. Compute the statistics (TF-IDF) from the N documents<br />
4. Terms with highest TF-IDF score form query terms<br />
5. Query search engine (Google) to get the URLs for the query terms<br />
6. Crawl the URLs for the data<br />
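Steps 3 and 4 of the approach above can be sketched as follows. This is a minimal illustration under stated assumptions, not RLAT's code: the tokenizer, the TF-IDF variant (relative term frequency times log(N/df), summed over documents) and the function name `top_tfidf_terms` are choices made for this example. Querying the search engine (step 5) is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; RLAT's tokenizer may differ)."""
    return re.findall(r"\w+", text.lower())


def top_tfidf_terms(documents, k=5):
    """Return up to k terms with the highest TF-IDF score summed over documents."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf

    # Terms that occur in every document get a score of 0 and are dropped.
    ranked = sorted(
        (item for item in scores.items() if item[1] > 0),
        key=lambda item: (-item[1], item[0]),
    )
    return [term for term, _ in ranked[:k]]
```

The returned terms would then be joined into a query string for the search engine, whose result URLs feed the next crawl.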
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Text Management<br />
• Stores and manages multiple text files<br />
<strong>RLAT</strong> <strong>–</strong> Text Collection and Selection <strong>–</strong> Text Management<br />
www.tagesschau.de_2010-03-18_15.33.info<br />
<strong>RLAT</strong> <strong>–</strong> Prompt Selection<br />
• Prompts for recording:<br />
Speakers read prompts, so audio is collected without manual transcription<br />
• Prompts should be:<br />
Easy to say (no hard words, numerals, etc.)<br />
Rich in variability<br />
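One way to turn the two criteria above into a selection procedure is a greedy search, sketched below. The concrete proxies are assumptions for this example and are not taken from RLAT: "easy to say" is approximated by rejecting sentences with digits or very long words, and "rich in variability" by the number of new character bigrams a sentence adds. A phoneme-based measure could be substituted once a lexicon exists.

```python
def char_bigrams(sentence):
    """Set of character bigrams over letters and single spaces."""
    text = "".join(ch for ch in sentence.lower() if ch.isalpha() or ch == " ")
    text = " ".join(text.split())
    return {text[i:i + 2] for i in range(len(text) - 1)}


def is_easy(sentence, max_word_len=12):
    """Heuristic filter: no digits and no very long words (assumed threshold)."""
    if any(ch.isdigit() for ch in sentence):
        return False
    return all(len(word) <= max_word_len for word in sentence.split())


def select_prompts(sentences, n_prompts, max_word_len=12):
    """Greedily pick sentences that add the most unseen character bigrams."""
    candidates = [s for s in sentences if is_easy(s, max_word_len)]
    covered = set()
    selected = []
    while candidates and len(selected) < n_prompts:
        best = max(candidates, key=lambda s: len(char_bigrams(s) - covered))
        if not char_bigrams(best) - covered:
            break  # no remaining sentence adds anything new
        selected.append(best)
        covered |= char_bigrams(best)
        candidates.remove(best)
    return selected
```

The loop stops early when no candidate contributes new bigrams, so exact duplicates are never selected twice.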
<strong>RLAT</strong> <strong>–</strong> <strong>Language</strong> Model Building<br />
• Three LM building options<br />
Interpolate base LMs<br />
Add additional data: concatenate text corpora and build one LM<br />
Do not use additional data<br />
• Evaluation:<br />
Perplexity, OOV rate, n-gram coverage, vocabulary size<br />
5-fold Cross Validation, Test File<br />
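Two of the evaluation measures above, OOV rate and n-gram coverage, can be computed as in the following sketch. It assumes token-based definitions (percentages over running words and running n-grams of the test text) and whitespace-tokenized input; the function names are chosen for this example and RLAT's exact definitions may differ.

```python
def ngrams(tokens, n):
    """All consecutive n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def oov_rate(train_tokens, test_tokens):
    """Percentage of test tokens that do not occur in the training vocabulary."""
    if not test_tokens:
        return 0.0
    vocabulary = set(train_tokens)
    unseen = sum(1 for token in test_tokens if token not in vocabulary)
    return 100.0 * unseen / len(test_tokens)


def ngram_coverage(train_tokens, test_tokens, n):
    """Percentage of test n-grams that also occur in the training text."""
    test_ngrams = ngrams(test_tokens, n)
    if not test_ngrams:
        return 0.0
    seen = set(ngrams(train_tokens, n))
    hits = sum(1 for gram in test_ngrams if gram in seen)
    return 100.0 * hits / len(test_ngrams)
```

For example, with the training text "le chat dort sur le lit" and the test text "le chien dort sur le sofa", two of six test tokens are unseen and two of five test bigrams are covered.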
<strong>RLAT</strong> <strong>–</strong> <strong>Language</strong> Model Building<br />
• Interpolate LMs, LM Evaluation …<br />
<strong>RLAT</strong> <strong>–</strong> <strong>Language</strong> Model Building<br />
• More transparency for the LM building process<br />
<strong>RLAT</strong> <strong>–</strong> Build <strong>Language</strong> Model <strong>–</strong> LM Management<br />
• Build LMs from the texts selected in "Text Management" and interpolate them<br />
<strong>RLAT</strong> <strong>–</strong> Build <strong>Language</strong> Model <strong>–</strong> LM Management<br />
[Figure: LM1, LM2, … are built from text1, text2, … and interpolated into one LM, with weights obtained on a manually selected dev set]
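A common way to obtain interpolation weights from a dev set is expectation-maximization (EM) over the per-token probabilities of each component LM. The sketch below illustrates that general technique; whether RLAT uses exactly this procedure is not stated in the slides. The function name and the input layout (`prob_streams[i][t]` is the probability that LM i assigns to dev token t) are assumptions, and every dev token is assumed to receive a non-zero probability from at least one LM.

```python
def em_interpolation_weights(prob_streams, iterations=50):
    """Estimate linear interpolation weights that maximize dev-set likelihood.

    prob_streams[i][t] is the probability LM i assigns to dev token t.
    """
    n_models = len(prob_streams)
    n_tokens = len(prob_streams[0])
    weights = [1.0 / n_models] * n_models  # start from uniform weights

    for _ in range(iterations):
        expected = [0.0] * n_models
        for t in range(n_tokens):
            # Probability of token t under the current mixture.
            mix = sum(w * stream[t] for w, stream in zip(weights, prob_streams))
            # E-step: share of token t attributed to each LM.
            for i in range(n_models):
                expected[i] += weights[i] * prob_streams[i][t] / mix
        # M-step: new weight is the average share over the dev set.
        weights = [value / n_tokens for value in expected]
    return weights
```

The interpolated probability of a token is then the weighted sum of the component probabilities; an LM that fits the dev set better ends up with a larger weight.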
<strong>RLAT</strong> <strong>–</strong> Phoneme Selection<br />
• Selection from standard IPA chart<br />
• User-defined names for phonemes<br />
Can match the user's lexicon (if one exists)<br />
Can match the notation the user is familiar with<br />
• Audio feedback<br />
Click to hear recording of each phone<br />
<strong>RLAT</strong> <strong>–</strong> Lexicon Pronunciation Creation <strong>–</strong> G2P Rules<br />
<strong>RLAT</strong> <strong>–</strong> Lexicon Pronunciation Creation <strong>–</strong> Lex Learner<br />
<strong>RLAT</strong> <strong>–</strong> Audio Recording Tool<br />
• Online recording tools<br />
Collaboratively record large number of speakers<br />
Speakers may be separate from developers<br />
• Java based for portability<br />
Works with “many” browsers<br />
• Visual feedback during recording<br />
• Automatic upload on completion<br />
<strong>RLAT</strong> <strong>–</strong> Audio Collection<br />
<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building<br />
• Acoustic Model Building requires:<br />
Recorded Speech Data<br />
Phone set definition<br />
Pronunciation Lexicon<br />
• Two step process:<br />
1. Configuration<br />
2. Model Training<br />
<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building - Configuration<br />
• Checks dependencies and errors<br />
Lexicon and phone set correspond<br />
Words in recorded prompts are covered by the lexicon<br />
• Divides the recorded data into training and test sets<br />
• Performance evaluation<br />
Little data: K-fold cross-validation, with K = number of speakers<br />
More data: split into 90% (train) and 10% (test)<br />
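The two evaluation setups above can be sketched as follows. This is an illustration under assumptions, not RLAT's implementation: utterances are represented as `(speaker_id, utterance_id)` pairs, the 90/10 split is done at the utterance level with a fixed random seed, and the function names are invented for this example. With K equal to the number of speakers, K-fold cross-validation amounts to holding out one speaker per fold.

```python
import random


def leave_one_speaker_out(utterances):
    """K-fold cross-validation with K = number of speakers.

    utterances: list of (speaker_id, utterance_id) pairs.
    Returns one (train, test) pair per speaker.
    """
    speakers = sorted({speaker for speaker, _ in utterances})
    folds = []
    for held_out in speakers:
        train = [u for u in utterances if u[0] != held_out]
        test = [u for u in utterances if u[0] == held_out]
        folds.append((train, test))
    return folds


def split_90_10(utterances, seed=0):
    """Shuffle and split into 90% training and 10% test data."""
    shuffled = list(utterances)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.9 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

If test speakers must not appear in the training data, the same 90/10 split can be applied to the list of speakers instead of the list of utterances.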
<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building<br />
• Select speakers for test and training set<br />
<strong>RLAT</strong> <strong>–</strong> Acoustic Model Building <strong>–</strong> Training<br />
• Requires successful configuration<br />
• Creates log files and displays all steps of training to the user<br />
• EM Training for Context Independent Models<br />
3-state HMM<br />
Number of Gaussians per Model depends on data<br />
• EM Training for Context Dependent Models<br />
Number of models depends on data<br />
• MFCC front-end (Mel Frequency Cepstral Coefficients),<br />
LDA (linear discriminant analysis)<br />
• Progress of training procedure<br />
• Results of performance evaluation<br />
<strong>RLAT</strong> <strong>–</strong> Building Synthesis Voice<br />
<strong>RLAT</strong> <strong>–</strong> Testing Automatic Speech Recognition / Text To Speech<br />
Simple echo back testing function<br />
Automatic Pronunciation Dictionary Generation from the WWW<br />
• Ongoing work at CSL<br />
Derive pronunciations using IPA fields in different websites<br />
Text Normalization based on Statistical Machine Translation<br />
and Internet User Support<br />
• Ongoing work at CSL<br />
Web-based user interface for language-specific text normalization<br />
Hybrid approach (rules + SMT)<br />
Student research project / Bachelor's thesis:<br />
Error Blaming in the <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />
• What are the weak points of the speech recognizer<br />
and how can they be fixed?<br />
Language model (high OOV rate, perplexity)<br />
Low signal-to-noise ratio<br />
Accents<br />
Hesitations (uhm, ehm) and fragments (ha-, hallo)<br />
. . .<br />
Student research project / Bachelor's thesis:<br />
Error Blaming in the <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />
[Figure: difficulty (0 to 9) plotted against WER (0 to 100)]<br />
Required skills:<br />
• Basic knowledge of speech recognition and statistics<br />
• Basic Linux skills<br />
• Programming skills, e.g. in Perl or PHP<br />
• Enthusiasm for computer science and linguistics<br />
Available immediately; contact:<br />
Tim Schlippe (tim.schlippe@kit.edu)
Student research project / Bachelor's thesis: Concept of a game for<br />
collecting resources for building speech recognizers<br />
• Research and analyze concepts of games for collecting speech resources<br />
• Implement a solution<br />
• Evaluate the solution<br />
Student research project / Bachelor's thesis:<br />
Error Blaming in the <strong>Rapid</strong> <strong>Language</strong> <strong>Adaptation</strong> <strong>Toolkit</strong><br />
Required skills:<br />
• Basic knowledge of speech recognition<br />
• Basic Linux skills<br />
• Programming skills, e.g. in Perl or PHP<br />
• Enthusiasm for computer science and linguistics<br />
Available immediately; contact:<br />
Tim Schlippe (tim.schlippe@kit.edu)<br />
Thanks for your interest!
References<br />
Tanja Schultz and Alan Black. 2008. Tutorial on <strong>Rapid</strong> <strong>Language</strong><br />
<strong>Adaptation</strong> Tools for Multilingual Speech Processing Systems.<br />
ICASSP 2008, Las Vegas, USA, 30 March 2008.