30.01.2014

Annual Report 2010 - Fachgruppe Informatik an der RWTH Aachen ...

the most probable sentence. These modules of an automatic speech recognition system (cf. Figure above) are characterized as follows:

• The acoustic model captures the acoustic properties of speech and provides the probability of the observed acoustic signal given a hypothesized word sequence. The acoustic model includes:

  • The acoustic analysis, which parameterizes the speech input into a sequence of acoustic vectors.

  • Acoustic models for the smallest sub-word units, i.e. phonemes, which are usually modeled in a context-dependent way.

  • The pronunciation lexicon, which defines the decomposition of the words into the sub-word units.

• The language model captures the linguistic properties of the language and provides the a priori probability of a word sequence. From an information-theoretic point of view, the syntax, semantics, and pragmatics of a language can also be viewed as redundancies. Statistical methods provide a general framework for modeling such redundancies robustly, which is why state-of-the-art language models are usually based on statistical concepts.

• The search realizes the Bayes decision criterion on the basis of the acoustic model and the language model. This requires generating and scoring competing sentence hypotheses. The final recognition result is the sentence hypothesis with the best score, which is found efficiently using dynamic programming. The search is made more efficient by pruning unlikely hypotheses as early as possible during dynamic programming, without affecting the recognition performance.
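The decomposition of words into sub-word units, and the context-dependent modeling of those units, can be illustrated with a few lines of Python. This is a minimal sketch, assuming a hypothetical toy lexicon (the phoneme symbols are illustrative only, not a verified transcription) and the common `left-center+right` triphone notation, with `#` marking the utterance boundary. The function names are likewise hypothetical. Practical lexica are far larger, often list pronunciation variants, and the resulting context-dependent units are typically tied to keep the number of parameters manageable.

```python
# Hypothetical toy lexicon: word -> phoneme sequence (illustrative symbols only).
LEXICON = {
    "nach": ["n", "a", "x"],
    "Berlin": ["b", "E", "r", "l", "i", "n"],
}


def words_to_phonemes(words, lexicon):
    """Concatenate the lexicon pronunciations of a word sequence."""
    phonemes = []
    for word in words:
        # Raises KeyError for words missing from the lexicon (out-of-vocabulary).
        phonemes.extend(lexicon[word])
    return phonemes


def to_triphones(phonemes, boundary="#"):
    """Replace each phoneme by a context-dependent unit 'left-center+right'."""
    padded = [boundary] + phonemes + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]
```

For the word sequence “nach Berlin”, the final `x` of “nach” receives `b` from “Berlin” as its right context, producing the unit `a-x+b`; in this sketch the phonetic context therefore extends across the word boundary.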
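To make the idea of a statistical language model concrete, the following Python sketch estimates the a priori probability of a word sequence with a bigram model. It is only an illustration under simplifying assumptions: the function names are hypothetical, the sentence markers `<s>` and `</s>` are a common convention chosen here, and add-alpha smoothing stands in for the more refined smoothing methods (e.g. Kneser-Ney) that practical systems typically use.

```python
from collections import Counter
from math import log


def train_bigram(sentences, alpha=1.0):
    """Train an add-alpha smoothed bigram model; return (prob, vocab)."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])             # every predictable token, incl. </s>
        history_counts.update(tokens[:-1])   # how often each token is a history
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(history, word):
        # Add-alpha smoothing reserves probability mass for unseen bigrams,
        # so that P(. | history) sums to 1 over the vocabulary.
        return (bigram_counts[(history, word)] + alpha) / (
            history_counts[history] + alpha * len(vocab)
        )

    return prob, vocab


def sentence_log_prob(prob, sentence):
    """log Pr(w_1^N) as a sum of bigram log-probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(log(prob(h, w)) for h, w in zip(tokens[:-1], tokens[1:]))
```

For example, after `prob, vocab = train_bigram(["wir fahren nach Berlin", "wir fahren am Sonntag"])`, the call `sentence_log_prob(prob, "wir fahren nach Berlin")` returns a higher value than the same call on the scrambled “Berlin nach fahren wir”: the model has captured some of the word-order regularity that the text describes as redundancy.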
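In formula form, the Bayes decision criterion realized by the search can be written as follows. The notation is common in the statistical speech recognition literature but is assumed here rather than taken from this report: $x_1^T$ denotes the sequence of $T$ acoustic vectors and $w_1^N$ a hypothesized sequence of $N$ words.

```latex
\hat{w}_1^N
  = \operatorname*{arg\,max}_{w_1^N} \Pr(w_1^N \mid x_1^T)
  = \operatorname*{arg\,max}_{w_1^N} \frac{\Pr(w_1^N)\,\Pr(x_1^T \mid w_1^N)}{\Pr(x_1^T)}
  = \operatorname*{arg\,max}_{w_1^N} \bigl\{ \Pr(w_1^N) \cdot \Pr(x_1^T \mid w_1^N) \bigr\}
```

Here $\Pr(w_1^N)$ is the a priori probability supplied by the language model and $\Pr(x_1^T \mid w_1^N)$ is supplied by the acoustic model. The denominator $\Pr(x_1^T)$ does not depend on the word sequence and therefore drops out of the maximization.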
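The interplay of dynamic programming and pruning can be sketched on a toy hidden Markov model. The Python below is a minimal, hypothetical illustration: the function and parameter names are assumptions, and a real decoder searches jointly over lexicon, acoustic model, and language model rather than over a handful of states. It performs time-synchronous Viterbi search in the log domain and, before each new frame is expanded, discards every hypothesis whose score is more than `beam` below the current best.

```python
import math


def viterbi_beam(observations, states, log_init, log_trans, log_emit, beam=math.inf):
    """Time-synchronous Viterbi search with beam pruning (toy sketch).

    log_init[s], log_trans[(s, t)] and log_emit[(s, o)] are log-probabilities;
    a missing log_trans entry means the transition is not allowed.
    Returns (best_log_score, best_state_sequence).
    """
    # Active hypotheses: state -> (score, partial state sequence ending in state).
    active = {
        s: (log_init[s] + log_emit[(s, observations[0])], [s])
        for s in states
        if s in log_init
    }
    for obs in observations[1:]:
        best = max(score for score, _ in active.values())
        # Beam pruning: drop hypotheses that are far below the current best.
        active = {s: v for s, v in active.items() if v[0] >= best - beam}
        new_active = {}
        for t in states:
            candidates = [
                (score + log_trans[(s, t)], path)
                for s, (score, path) in active.items()
                if (s, t) in log_trans
            ]
            if not candidates:
                continue
            # Dynamic programming: keep only the best predecessor of state t.
            score, path = max(candidates, key=lambda c: c[0])
            new_active[t] = (score + log_emit[(t, obs)], path + [t])
        active = new_active
    return max(active.values(), key=lambda v: v[0])
```

Because pruning only removes hypotheses that are far behind the best one, a sufficiently wide beam returns the same best path as the unpruned search while expanding fewer hypotheses; a beam that is too narrow can discard the eventual winner early, which is why the beam width is treated as a tuning parameter.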

(a) Speech waveform of the utterance “Sollen wir am Sonntag nach Berlin fahren” (“Shall we go to Berlin on Sunday?”), (b) the corresponding FFT spectrum
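A spectrum like the one in panel (b) is obtained by short-time analysis: the waveform is cut into overlapping frames, each frame is windowed, and its magnitude spectrum is computed. The Python sketch below illustrates this under stated assumptions: the frame length, frame shift, and Hamming window are typical but arbitrary choices, the function names are hypothetical, and a naive O(N²) discrete Fourier transform is used to keep the example self-contained, where a real front end would use an FFT, which computes the same values faster.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Hamming-windowed magnitude spectrum of one frame (bins 0 .. N/2).

    Uses a naive DFT for clarity; an FFT yields the same values in O(N log N).
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    return [
        abs(sum(windowed[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
        for m in range(n // 2 + 1)
    ]
```

For a 1000 Hz sine sampled at 8000 Hz and frames of 64 samples, each frequency bin spans 8000 / 64 = 125 Hz, so the magnitude spectrum of every frame peaks in bin 1000 / 125 = 8.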

At ‘Lehrstuhl für Informatik 6’ (Chair of Computer Science 6), the following research directions, covering all main areas of automatic speech recognition (ASR), were pursued in 2008/09:

The generation of the European Parliament Plenary Session (EPPS) corpus for speech recognition and speech-to-speech translation was continued for the main European languages. This corpus consists of transcribed speech and parallel texts in the languages
