13.06.2013 Views


An algorithm for automatic segmentation of speech signals into syllables: a comparison between Italian and Castilian results

Cutugno, Francesco; Origlia, Antonio

LUSI-Lab@Dipartimento di Scienze Fisiche - Università di Napoli "Federico II" - Italia

In this paper we present an algorithm for speech syllabification based on a purely deterministic approach that is independent of language and content. Continuing a line of research on the role of temporal intensity patterns (Petrillo & Cutugno 2003), we present a rule-based system that uses the following features: a) the position of the most relevant energy maxima, b) harmonicity measurements, c) syllable nuclei detection, d) fine marker positioning strategies.

Central to the algorithm is the analysis of the temporal energy profile, which is obtained by shifting a window along the speech signal in 10 ms steps and smoothing the resulting pattern with a 45 ms Gaussian window. The smoothed pattern contains a series of maxima, and a decision process selects the proper syllable nucleus candidates among them. The selection is based on a combined analysis of the slope of the energy rise, the height of the following energy valley, and the data derived from the harmonicity analysis.
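The energy-profile step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Praat script: the 20 ms analysis window, the reading of "45 ms" as the total width of the Gaussian kernel, the choice of sigma as one sixth of that width, and all function names are hypothetical. The nucleus selection rules (slope, valley height, harmonicity) are not reproduced, because the abstract does not specify them.

```python
import math


def short_time_energy(samples, sample_rate, hop_s=0.010, win_s=0.020):
    """Mean squared amplitude of windows shifted by hop_s seconds.

    hop_s = 10 ms comes from the abstract; win_s = 20 ms is an assumption.
    """
    if not samples:
        return []
    hop = max(1, int(round(hop_s * sample_rate)))
    win = max(1, int(round(win_s * sample_rate)))
    energy = []
    for start in range(0, max(1, len(samples) - win + 1), hop):
        frame = samples[start:start + win]
        energy.append(sum(x * x for x in frame) / len(frame))
    return energy


def gaussian_kernel(width_s=0.045, hop_s=0.010):
    """Normalized Gaussian weights spanning roughly width_s seconds.

    Assumption: 45 ms is the total kernel width and sigma = width / 6.
    """
    n = max(1, int(round(width_s / hop_s)))
    if n % 2 == 0:
        n += 1  # use an odd length so the kernel has a center frame
    half = n // 2
    sigma = n / 6.0
    weights = [math.exp(-0.5 * ((i - half) / sigma) ** 2) for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, kernel):
    """Weighted moving average; weights are renormalized at the edges."""
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def energy_profile(samples, sample_rate):
    """Smoothed energy contour with one value per 10 ms frame."""
    return smooth(short_time_energy(samples, sample_rate), gaussian_kernel())


def local_maxima(values):
    """Indices of frames that are higher than their left neighbor
    and not lower than their right neighbor."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]
```

Calling `local_maxima(energy_profile(samples, sample_rate))` yields the frame indices of the raw maxima; in the described system these would still have to pass the decision process before counting as nucleus candidates.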

Syllable boundaries are then placed at the energy minima between consecutive syllable nucleus candidates that remain at the end of the selection process. In several cases, however, this placement requires more care: to cover all cases, the boundary is set at the minimum of the energy derivative between the previous syllable nucleus and the current energy minimum.
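The two placement rules in this paragraph can be illustrated as follows. This is a minimal sketch, assuming the energy contour is a list with one value per frame and the nuclei are given as sorted frame indices. Approximating the derivative by a backward difference, and reporting the frame at which that difference is most negative, are assumptions made here for illustration; the function names are hypothetical.

```python
def energy_valleys(energy, nuclei):
    """Frame index of the energy minimum between each pair of
    consecutive nuclei (the basic boundary placement)."""
    valleys = []
    for left, right in zip(nuclei, nuclei[1:]):
        valleys.append(min(range(left, right + 1), key=lambda i: energy[i]))
    return valleys


def derivative_boundaries(energy, nuclei):
    """Refined placement: the frame with the minimum energy derivative
    between the previous nucleus and the following energy minimum.

    The derivative at frame i is approximated as energy[i] - energy[i - 1]
    (an assumption; the abstract does not say how it is computed).
    """
    boundaries = []
    for left, valley in zip(nuclei, energy_valleys(energy, nuclei)):
        if valley <= left:
            # Degenerate case (flat contour): fall back to the valley itself.
            boundaries.append(valley)
            continue
        steepest = min(range(left + 1, valley + 1),
                       key=lambda i: energy[i] - energy[i - 1])
        boundaries.append(steepest)
    return boundaries
```

For the toy contour `[1, 5, 4, 1, 0.5, 2, 6, 3]` with nuclei at frames 1 and 6, the energy minimum lies at frame 4, while the steepest energy drop after the first nucleus is at frame 3, so the refined rule places the boundary one frame earlier than the basic rule.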

The algorithm is implemented as a Praat script and can be run in batch mode to process large speech corpora. The script is freely available for research purposes.

The tool has been tested on two corpora: one for Italian, for which an accurate manual labeling at phone level is available, and one for Castilian Spanish. The Italian speech corpus is CLIPS (Corpora e Lessici di Italiano Parlato e Scritto, Savy & Cutugno 2008), while the Spanish one is SES (Spanish Emotional Speech, Montero et al. 1998).

The two automatic segmentations were evaluated using a modified version of Petek's algorithm (Petek 1996), which compares the series of temporal markers produced by manual segmentation with the automatically obtained one. Errors are expressed in terms of misplacement, insertion and deletion. A misplacement is counted as the substitution of a correct marker with a wrong one if the absolute distance between the two is greater than a fixed threshold.
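To make the three error types concrete, the sketch below aligns a manual and an automatic marker sequence with a plain edit-distance alignment. It is not Petek's algorithm, nor the modified version used by the authors, whose details the abstract does not give; it is a stand-in that only shows how substitutions, insertions and deletions can be counted. The 20 ms threshold and the function name are hypothetical.

```python
def align_markers(reference, automatic, threshold_s=0.020):
    """Count correct markers, substitutions, deletions and insertions.

    reference, automatic: sorted marker times in seconds.
    threshold_s: maximum distance for two markers to count as the same
    boundary (hypothetical value; the abstract only says "fixed").
    """
    n, m = len(reference), len(automatic)

    def pair_cost(i, j):
        # 0 if the paired markers agree within the threshold, else 1.
        return 0 if abs(reference[i - 1] - automatic[j - 1]) <= threshold_s else 1

    # cost[i][j]: fewest errors aligning the first i reference markers
    # with the first j automatic markers.
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + pair_cost(i, j),
                             cost[i - 1][j] + 1,   # deletion
                             cost[i][j - 1] + 1)   # insertion

    # Walk back through the table to label each step of one best alignment.
    counts = {"correct": 0, "substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + pair_cost(i, j):
            counts["correct" if pair_cost(i, j) == 0 else "substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts
```

With manual markers at 0.10, 0.30, 0.50 and 0.70 s and automatic markers at 0.11, 0.36, 0.50, 0.60 and 0.70 s, this alignment reports three correct markers, one substitution (0.36 s is 60 ms away from 0.30 s) and one insertion (0.60 s).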

Results on CLIPS for Italian are the following: substitutions 2.9%, deletions 10.2%, insertions 7.4%, with a final accuracy of 79.5%.
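The reported figures are consistent with an accuracy computed as 100% minus the sum of the three error rates. The abstract does not state this definition, so the check below is an inference from the numbers rather than the authors' formula.

```python
# Error rates reported for CLIPS (Italian), in percent.
substitutions, deletions, insertions = 2.9, 10.2, 7.4

# Assumed definition: accuracy = 100 - (substitutions + deletions + insertions).
total_errors = substitutions + deletions + insertions
accuracy = 100.0 - total_errors

print(f"total errors: {total_errors:.1f}%, accuracy: {accuracy:.1f}%")
```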

Insertion and deletion errors are connected, respectively, with syllable Split, which occurs in the presence of particularly long nuclei, and with Coupling, which occurs when the consonantal onset of a syllable is lenited, so that the usual features indicating a sonority valley are missing. An accurate description of these phenomena will be offered in the paper. In
