13.06.2013 Views

haga click aquí - Amprae


EasyAlign Spanish: a (semi-)automatic segmentation tool under Praat

Goldman, Jean-Philippe; Schwab, Sandra

Université de Genève

The purpose of phonetic alignment is to determine the time position of phone boundaries in a speech corpus on the basis of the audio recording and its orthographic transcription. Aligned corpora are widely used in speech applications such as automatic speech recognition and speech synthesis, as well as in prosodic and phonetic research. Although manual segmentation is the most accurate method, it requires a large amount of the human labeller's time. Automatic methods are therefore now widely used: they are not only much quicker, but their results are also reproducible and consistent throughout a large corpus.

EasyAlign has been developed within Praat to provide an ergonomic automatic segmentation tool that is easy to use for non-specialists in computer science. It is freely distributed as a self-installable plug-in and is available for French, English, Brazilian Portuguese and, recently, Castilian Spanish. EasyAlign consists of a group of tools that successively perform three automatic steps, with some minor manual verifications and adjustments to ensure better quality: utterance segmentation, grapheme-to-phoneme conversion and phonetic segmentation. From the orthographic transcription, the utterance segmentation process, which is language-independent, generates a TextGrid with a single tier in which each interval encloses one utterance. Then, the grapheme-to-phoneme conversion step creates a second tier with the phonetic transcription of the utterances. For this language-specific process, we used the SAGA phonetizer (Moreno & Mariño 1998) for Castilian Spanish, which provides a detailed phonetic transcription in SAMPA. Finally, in the phonetic segmentation step, the Viterbi-based HVite tool (part of HTK) is called to align each utterance with its phonetic sequence. For Castilian Spanish, the acoustic models were trained on about 360 minutes of unaligned multi-speaker speech for which a verified phonetic transcription was provided. The phonetic segmentation process simultaneously generates a phone tier and a word tier. Additionally, a syllable tier is created using sonority-based rules for syllable segmentation.
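
The idea behind the Viterbi-based phonetic segmentation step can be illustrated with a toy forced alignment. The sketch below is not HVite and not EasyAlign's code. It assumes a hypothetical table `log_likes` of per-frame log-likelihoods with one column per phone of the known sequence, a single state per phone, and no transition probabilities, whereas HTK models are multi-state HMMs. The function name `forced_align` and the example scores are likewise illustrative assumptions.

```python
import math


def forced_align(log_likes):
    """Toy Viterbi forced alignment over a left-to-right phone chain.

    log_likes[t][p] is the (hypothetical) log-likelihood of frame t under
    the p-th phone of the known phone sequence. Every phone must occupy at
    least one frame and phones must appear in order.
    Returns one (start_frame, end_frame_exclusive) pair per phone.
    """
    n_frames = len(log_likes)
    n_phones = len(log_likes[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")

    neg_inf = -math.inf
    # best[t][p]: best score of any path ending in phone p at frame t.
    best = [[neg_inf] * n_phones for _ in range(n_frames)]
    # entered[t][p]: True if the best path enters phone p exactly at frame t.
    entered = [[False] * n_phones for _ in range(n_frames)]
    best[0][0] = log_likes[0][0]

    for t in range(1, n_frames):
        for p in range(n_phones):
            stay = best[t - 1][p]
            advance = best[t - 1][p - 1] if p > 0 else neg_inf
            if advance > stay:
                best[t][p] = advance + log_likes[t][p]
                entered[t][p] = True
            else:
                best[t][p] = stay + log_likes[t][p]

    # Backtrace from the last frame of the last phone to find phone onsets.
    starts = [0] * n_phones
    p = n_phones - 1
    for t in range(n_frames - 1, 0, -1):
        if entered[t][p]:
            starts[p] = t
            p -= 1
    ends = starts[1:] + [n_frames]
    return list(zip(starts, ends))


# Made-up scores: 6 frames, 3 phones; 0.0 marks a good match, -5.0 a poor one.
GOOD, BAD = 0.0, -5.0
example_scores = [
    [GOOD, BAD, BAD],
    [GOOD, BAD, BAD],
    [BAD, GOOD, BAD],
    [BAD, GOOD, BAD],
    [BAD, GOOD, BAD],
    [BAD, BAD, GOOD],
]
example_alignment = forced_align(example_scores)
print(example_alignment)
```

Multiplying the returned frame indices by the frame shift of the acoustic front end (a value not given here) would turn them into boundary times for a phone tier.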

EasyAlign's performance was evaluated on a 12-minute corpus (one minute from each of 12 speakers: 6 "internal" speakers from the training corpus and 6 new "external" speakers), which was manually annotated by phonetic experts and compared with the automatic alignment. Evaluation was performed according to three approaches. Firstly, a boundary-based evaluation showed that 60% of the differences between automatic and manual boundaries lie within 10 ms (and 86% within 20 ms). Little difference was observed between internal and external speakers, which indicates that the training generalizes well. Secondly, a duration-based evaluation revealed that the difference between automatic and manual phone durations, which has a standard deviation of 20 ms, is similar for internal and external speakers. Finally, a segment-based evaluation showed that the median of the so-called "overlapping rate", a measure independent of speech rate, reaches 0.74, with little difference between internal (0.75) and external (0.72) speakers.
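
The three evaluation approaches can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation script. The abstract does not define the overlapping rate, so the code assumes a common definition: the duration shared by the automatic and manual segments divided by the total duration covered by either of them. It also assumes that "within 10 ms" means an absolute difference of at most 10 ms. The function names and the example segments (in milliseconds) are made up.

```python
from statistics import median, stdev


def fraction_within(auto_bounds, manual_bounds, tol_ms):
    """Boundary-based: share of boundaries whose |auto - manual| <= tol_ms."""
    diffs = [abs(a - m) for a, m in zip(auto_bounds, manual_bounds)]
    return sum(d <= tol_ms for d in diffs) / len(diffs)


def duration_diff_std(auto_segs, manual_segs):
    """Duration-based: sample standard deviation of (auto - manual) durations."""
    diffs = [
        (a_end - a_start) - (m_end - m_start)
        for (a_start, a_end), (m_start, m_end) in zip(auto_segs, manual_segs)
    ]
    return stdev(diffs)


def overlap_rate(auto_seg, manual_seg):
    """Segment-based: shared duration / duration covered by either segment.

    Assumed definition; 1.0 means identical segments, 0.0 means no overlap.
    """
    a_start, a_end = auto_seg
    m_start, m_end = manual_seg
    common = max(0, min(a_end, m_end) - max(a_start, m_start))
    covered = (a_end - a_start) + (m_end - m_start) - common
    return common / covered if covered > 0 else 0.0


# Made-up phone segments as (start_ms, end_ms) pairs.
manual = [(0, 100), (100, 250), (250, 400)]
auto = [(0, 105), (105, 240), (240, 400)]

# Compare internal boundaries only (the end of every segment but the last).
manual_bounds = [end for _, end in manual[:-1]]
auto_bounds = [end for _, end in auto[:-1]]

within_10 = fraction_within(auto_bounds, manual_bounds, 10)
within_5 = fraction_within(auto_bounds, manual_bounds, 5)
dur_std = duration_diff_std(auto, manual)
median_overlap = median(overlap_rate(a, m) for a, m in zip(auto, manual))
print(within_10, within_5, dur_std, median_overlap)
```

Because the overlap is normalized by segment duration, a fixed boundary error weighs more on short phones than on long ones, which is why such a measure is less sensitive to speech rate than a fixed millisecond threshold.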
