Have a break! Modelling pauses in German Speech. - ResearchGate
Have a break! Modelling pauses in German Speech. - ResearchGate
Have a break! Modelling pauses in German Speech. - ResearchGate
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
syllables from the beg<strong>in</strong>n<strong>in</strong>g of the sentence<br />
until the pause produce the best results (cor =<br />
0.46002). The feature “Motivation for the<br />
MARY system to set a <strong>break</strong> <strong>in</strong>dex” has the<br />
values ‘pause before conjunction’, ‘pause after<br />
Vorfeld’ and seven different punctuation signs<br />
[i.e. , - ) ( : ‘ “ ]. The feature « numbers of<br />
syllables » was pooled <strong>in</strong>to <strong>pauses</strong> before the<br />
10 th syllable and <strong>pauses</strong> after the 10 th syllable.<br />
This number was also suggested by E. Zvonik et<br />
al.(2002), as a border for the production of short<br />
(before the 10 th syllable) and longer <strong>pauses</strong> (after<br />
the 10 th syllable).<br />
In the second part of the learn<strong>in</strong>g procedure<br />
(i.e. determ<strong>in</strong>ation of the duration of a pause) we<br />
used the follow<strong>in</strong>g features (for subfeatures see<br />
Table 2):<br />
- G2p-method of the word before the pause<br />
- The syntactic attachment of the word<br />
preced<strong>in</strong>g the pause<br />
- The syntactic phrase preced<strong>in</strong>g the pause<br />
- The number of words preced<strong>in</strong>g the pause<br />
- The motivation for the MARY system to set<br />
a <strong>break</strong> <strong>in</strong>dex.<br />
This assortment of features leads to the best<br />
results with cor = 0.42. 5 Evaluat<strong>in</strong>g the results<br />
shows still a rather low correlation. Two<br />
strategies, us<strong>in</strong>g a larger corpus for the mach<strong>in</strong>e<br />
learn<strong>in</strong>g technique and reth<strong>in</strong>k<strong>in</strong>g of the features<br />
used for tra<strong>in</strong><strong>in</strong>g may be applied <strong>in</strong> future<br />
research <strong>in</strong> order to arrive at higher correlation<br />
values.<br />
5. Conclusion<br />
In the present study we evaluated the occurrence<br />
of <strong>pauses</strong> and the length of <strong>pauses</strong> made by a<br />
natural speaker aga<strong>in</strong>st a current speechsynthesizer<br />
(MARY). In comparison to the<br />
MARY system, the natural speaker realises<br />
significantly fewer <strong>pauses</strong> at prosodic phrase<br />
5<br />
When compar<strong>in</strong>g these results with correlation<br />
coefficents usually achieved when us<strong>in</strong>g similar<br />
techniques to the prediction of the duration of<br />
phonetic segments the result seems rather poor. But it<br />
has to be kept <strong>in</strong> m<strong>in</strong>d, that there are much fewer<br />
tra<strong>in</strong>ung-samples <strong>in</strong> this case, and that the durarion of<br />
<strong>pauses</strong> displays a much higher variance than a<br />
speech-sound with its clear limitations on m<strong>in</strong>imal<br />
and maximal <strong>in</strong>herent duration.<br />
boundaries, and rather relies on prosodic means<br />
like f<strong>in</strong>al lengthen<strong>in</strong>g and boundary tones.<br />
Though these results <strong>in</strong>dicate that the<br />
speech synthesis system uses <strong>pauses</strong> to a much<br />
greater extent than the natural speaker, it will be<br />
necessary to perform perception experiments <strong>in</strong><br />
order to evaluate if and <strong>in</strong> which contexts this<br />
additional redundancy <strong>in</strong> MARY’s current<br />
sett<strong>in</strong>g is to be avoided.<br />
Furthermore, we have shown that the<br />
natural speaker makes significant differences <strong>in</strong><br />
pause duration depend<strong>in</strong>g on the surround<strong>in</strong>g<br />
environment. We used this data for tra<strong>in</strong><strong>in</strong>g a<br />
CART-based predictor for pause durations.<br />
Us<strong>in</strong>g this method, we were able to set very f<strong>in</strong>e<br />
graded <strong>pauses</strong> accord<strong>in</strong>g to the structural<br />
environment the pause would occur <strong>in</strong>.<br />
References<br />
Beckman, M. E., Ayers, G. M. Guidel<strong>in</strong>es for ToBI<br />
Labell<strong>in</strong>g. Onl<strong>in</strong>e MS and accompany<strong>in</strong>g files,<br />
1994. Available at:<br />
http://www.l<strong>in</strong>g.ohio-state.edu/phonetics/E_ToBI<br />
Breiman, L., Friedman, J., Olshen, R., and Stone,<br />
C. J., Classification and Regression Trees,<br />
Chapman and Hall, New York, 1984.<br />
Goldman-Eisler, Psychol<strong>in</strong>guistics: Experiments <strong>in</strong><br />
spontaneous speech. NY: Academic Press. 1968.<br />
Grice, Mart<strong>in</strong>e, Baumann, S., “Deutsche Intonation<br />
und GToBI.” L<strong>in</strong>guistische Berichte 191, pp. 267-<br />
298, 2002.<br />
Neubarth F., Alter K., Pirker H., Rieder E., Trost H.:<br />
“The Vienna Prosodic <strong>Speech</strong> Corpus: Purpose,<br />
Content and Encod<strong>in</strong>g,” <strong>in</strong> Zuehlke W. et al. (eds.),<br />
Konvens 2000 – Sprachkommunikation, VDE<br />
Verlag, Berl<strong>in</strong>, pp.191-96, 2000.<br />
Schröder, Mark, Trouva<strong>in</strong>, Jürgen: “The <strong>German</strong><br />
Text-to-<strong>Speech</strong> Synthesis System MARY: A Tool<br />
for Research, Development and Teach<strong>in</strong>g,”<br />
International Journal of <strong>Speech</strong> Technology, (6)<br />
pp. 365-377, 2003.<br />
Schiller, A., Teufel, S., Stöckert, C., Thielen, C.<br />
Guidel<strong>in</strong>es für das Tagg<strong>in</strong>g deutscher Textcorpora.<br />
University of Stuttgart / University of Tüb<strong>in</strong>gen,<br />
1999.<br />
Zellner, B., “Pauses and the temporal structure of<br />
speech,” <strong>in</strong> E.Kellner (Ed.) Fundamentals of<br />
speech synthesis and speech recognition, pp.41-62.<br />
1994.<br />
E. Zvonik and F. Cumm<strong>in</strong>s, “Pause Duration and<br />
Variability <strong>in</strong> Read Texts,” <strong>in</strong> Proceed<strong>in</strong>gs of<br />
ICSLP’02, pp.1109-1112. 2002.