An algorithm for automatic segmentation of speech
signals into syllables: a comparison between Italian
and Castillan results
Cutugno, Francesco; Origlia, Antonio
LUSI-Lab@Dipartimento di Scienze Fisiche - Università di Napoli "Federico II" - Italia
In this paper we present an algorithm for speech syllabification based on an
approach that is purely deterministic and independent of language and content.
Continuing a line of research on the role of intensity temporal patterns
(Petrillo & Cutugno 2003), we describe a rule-based system that uses the
following features: a) the position of the most relevant energy maxima,
b) harmonicity measurements, c) syllable nuclei detection, d) fine marker
positioning strategies.
Central to the algorithm is the analysis of the energy temporal profile, which is
obtained by shifting a window along the speech signal every 10 ms and smoothing
the derived pattern with a 45 ms Gaussian window. The resulting pattern
includes a series of maxima, to which a decision process is applied in order to
select proper nuclei candidates. This choice is based on the combined analysis
of the slope of the energy rise, the height of the following energy valley, and
the data derived from the harmonicity analysis.
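The first stage described above (energy profile, Gaussian smoothing, candidate maxima) can be sketched as follows. This is only an illustration in plain Python, not the authors' Praat script. The 20 ms analysis frame, the use of mean-square energy, the reading of "45 ms" as the total window length, and the Gaussian width are all assumptions, and the helper `two_bump_signal` is a hypothetical synthetic input. The slope, valley and harmonicity rules that select nuclei among the maxima are not reproduced, since the abstract gives no thresholds for them.

```python
import math


def energy_profile(samples, sample_rate, hop_s=0.010, frame_s=0.020):
    """Mean-square energy of a window shifted along the signal every hop_s seconds.

    The 10 ms shift comes from the text; the 20 ms frame length is an assumption.
    """
    hop = max(1, round(hop_s * sample_rate))
    frame = max(1, round(frame_s * sample_rate))
    profile = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        profile.append(sum(x * x for x in chunk) / frame)
    return profile


def gaussian_smooth(values, hop_s=0.010, window_s=0.045):
    """Smooth the profile with a Gaussian window of roughly window_s seconds.

    The window is rounded to a whole number of frames on each side, and the
    standard deviation (half the one-sided width) is an assumed value.
    """
    half = max(1, round(window_s / hop_s / 2))
    sigma = half / 2.0
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # renormalise near the edges
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def local_maxima(values):
    """Indices of frames that are higher than the previous frame and not lower than the next."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


def two_bump_signal(sample_rate=8000):
    """Hypothetical test input: a 200 Hz tone with two energy bumps at 0.15 s and 0.45 s."""
    n = sample_rate * 6 // 10  # 0.6 s of signal
    signal = []
    for k in range(n):
        t = k / sample_rate
        envelope = (math.exp(-((t - 0.15) / 0.05) ** 2)
                    + math.exp(-((t - 0.45) / 0.05) ** 2))
        signal.append(envelope * math.sin(2 * math.pi * 200 * t))
    return signal


if __name__ == "__main__":
    rate = 8000
    profile = gaussian_smooth(energy_profile(two_bump_signal(rate), rate))
    print("candidate maxima (frame indices):", local_maxima(profile))
```

On real speech the list of maxima would be longer than the list of syllable nuclei, which is why the text applies a further decision process before accepting a maximum as a nucleus.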
Syllable boundaries are then placed at the energy minima between consecutive
syllable nuclei candidates available at the end of the selection process.
However, there are several cases in which this process requires some attention:
in order to cover all cases, the syllable boundary is set at the
minimum value of the energy derivative between the previous syllable nucleus
and the current energy minimum.
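A minimal sketch of this boundary rule is given below, assuming the energy profile is a list of frame values and the nuclei are frame indices. The function name `place_boundaries`, the use of a backward first difference as the "energy derivative", and the choice to report the frame at the end of the steepest drop are assumptions made for illustration.

```python
def place_boundaries(energy, nuclei):
    """Return one boundary frame index per pair of consecutive nuclei.

    For each pair, the energy minimum (valley) between the two nuclei is
    located first. The boundary is then the frame where the backward
    difference energy[i] - energy[i - 1] is smallest (steepest fall)
    between the previous nucleus and that valley.
    """
    boundaries = []
    for prev, nxt in zip(nuclei, nuclei[1:]):
        valley = min(range(prev, nxt + 1), key=lambda i: energy[i])
        if valley == prev:
            # Degenerate case: no fall after the nucleus, fall back to the valley.
            boundaries.append(valley)
            continue
        steepest = min(range(prev + 1, valley + 1),
                       key=lambda i: energy[i] - energy[i - 1])
        boundaries.append(steepest)
    return boundaries


if __name__ == "__main__":
    # Toy profile with nuclei at frames 1 and 6 and a valley at frame 4.
    profile = [1.0, 5.0, 4.0, 1.0, 0.5, 3.0, 6.0, 2.0]
    print(place_boundaries(profile, [1, 6]))
```

Placing the marker on the steepest fall rather than on the valley itself moves the boundary closer to the end of the preceding nucleus, which matters when the valley is long or flat.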
The algorithm is implemented as a Praat script and can be used in a
batch procedure to process large speech corpora. The script is freely
available for research purposes.
The tool has been tested on two corpora: one for Italian, for which an accurate
manual labeling at phone level is available, and one for Castilian Spanish. The
Italian speech corpus is CLIPS (Corpora e Lessici di Italiano Parlato e Scritto,
Savy & Cutugno 2008), while the Spanish one is SES (Spanish Emotional
Speech, Montero et al. 1998).
The two automatic segmentations were evaluated using a
modified version of Petek's algorithm (Petek 1996), which compares a series of
temporal markers produced by a manual segmentation process with the
automatically obtained ones. Errors are expressed in terms of misplacement,
insertion and deletion. A misplacement is counted as the substitution of a
correct marker with a wrong one if the absolute distance between the two is
greater than a fixed threshold.
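The kind of comparison described above can be illustrated with a generic edit-distance alignment of two sorted marker lists. This is not Petek's algorithm or the authors' modified version of it, whose details are not given here. The function name `score_markers` and the 20 ms threshold are assumptions, since the abstract only says that the threshold is fixed.

```python
def score_markers(reference, automatic, threshold=0.020):
    """Align manual and automatic markers (times in seconds) and count errors.

    A paired marker is correct if it lies within `threshold` of its reference
    marker and a substitution otherwise. Unpaired automatic markers are
    insertions and unpaired reference markers are deletions.
    """
    n, m = len(reference), len(automatic)
    # cost[i][j]: fewest errors when aligning reference[:i] with automatic[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j

    def miss(i, j):
        return 0 if abs(reference[i - 1] - automatic[j - 1]) <= threshold else 1

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + miss(i, j),
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)

    # Trace the optimal alignment back from the end to classify each step.
    counts = {"correct": 0, "substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + miss(i, j):
            counts["substitutions" if miss(i, j) else "correct"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts


if __name__ == "__main__":
    manual = [0.10, 0.30, 0.50, 0.70]
    auto = [0.11, 0.36, 0.50, 0.60, 0.70]
    print(score_markers(manual, auto))
```

Error rates such as those reported for CLIPS would then be obtained by dividing each count by the number of reference markers, again under the assumptions stated above.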
Results on CLIPS for Italian are the following: substitutions 2.9%, deletions
10.2%, insertions 7.4%, with a final accuracy of 79.5%.
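The reported accuracy is numerically consistent with subtracting the three error rates from 100%. The abstract does not state this formula, so the reading below is an assumption, with S, D and I standing for the substitution, deletion and insertion rates:

```latex
\[
\text{accuracy} = 100\% - (S + D + I)
                = 100\% - (2.9\% + 10.2\% + 7.4\%)
                = 100\% - 20.5\%
                = 79.5\%
\]
```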
Insertion and deletion errors are connected, respectively, with the
phenomenon of syllable Split in the presence of particularly long nuclei, and
with Coupling, which occurs when a consonantal onset in a given syllable is lenited
and consequently the usual features indicating a sonority valley are missing. An
accurate description of these phenomena will be offered in the paper. In