Have a break! Modelling pauses in German Speech. - ResearchGate
Figure 6: Boxplot of the feature «syntactic attachment of the word preceding the pause» (axis categories: A0, A1, A2, A3).
Figure 7: Boxplot of the feature «number of words preceding the pause» (groups: pauses before the 5th word; pauses between the 5th and the 9th word; pauses after the 9th word).
Figure 8: Boxplot of the feature «number of syllables preceding the pause» (groups: pauses before the 13th syllable; pauses after the 13th syllable).
Figure 9: Boxplot of the feature «motivation for the MARY system to set a break index». KON = before conjunctions; P1 = { , : ) - }; P2 = { ( ‘ “ }; VORFELD = ‘Vorfeld’.
5.3 Modelling pauses
To determine whether a pause occurs in a given position, and to predict the duration of that pause, we used the well-known machine learning technique CART (classification and regression trees; Breiman, L., Friedman, J., Olshen, R., and Stone, C. J., 1984). As training data we used those sentences of the SpeeDurCont corpus in which the pauses occurring in natural speech are also predicted by the MARY system. These data were divided randomly into 10 equally sized subsets, and each subset was used once as the test sample in a 10-fold cross-validation routine. To evaluate the performance of the learning algorithm we used the correlation value (cor) between the predicted and the original values of the test sample. For training and evaluation we used the CART tree package provided by the statistical tool R⁴. In a preliminary step, the candidates for a pause (each position between two words) were filtered against the criterion that the MARY algorithm provides a break index of 3 or 4. Then the occurrence of a pause and its duration were learned.
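The procedure described above (filter the candidates, split them into 10 folds, score by correlation) can be sketched roughly as follows. This is only an illustration in Python, not the R code used for the study. The toy data, the field names (`break_index`, `motivation`, `duration_ms`) and all function names are assumptions. A per-category mean predictor stands in for the CART learner. The text does not say whether cor was computed per fold or over all held-out predictions; the sketch pools the held-out predictions.

```python
import random
from statistics import mean

MOTIVATIONS = ["KON", "P1", "P2", "VORFELD"]
# Invented toy durations in ms; none of these numbers come from the paper.
BASE_MS = {"KON": 200, "P1": 400, "P2": 300, "VORFELD": 150}


def make_toy_candidates(n=80):
    """Build invented pause candidates (one per position between two words)."""
    return [
        {
            "break_index": 1 + (i // 4) % 4,
            "motivation": MOTIVATIONS[i % 4],
            "duration_ms": BASE_MS[MOTIVATIONS[i % 4]] + 10 * (i % 5),
        }
        for i in range(n)
    ]


def filter_candidates(rows):
    """Preliminary step: keep only positions with a break index of 3 or 4."""
    return [r for r in rows if r["break_index"] in (3, 4)]


def make_folds(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_group_means(rows):
    """Stand-in for a regression tree: mean duration per motivation value."""
    fallback = mean(r["duration_ms"] for r in rows)
    groups = {}
    for r in rows:
        groups.setdefault(r["motivation"], []).append(r["duration_ms"])
    means = {key: mean(vals) for key, vals in groups.items()}
    # Categories unseen in training fall back to the overall training mean.
    return lambda r: means.get(r["motivation"], fallback)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def cross_validate(rows, k=10, seed=0):
    """Correlate pooled held-out predictions with the observed durations."""
    predicted, observed = [], []
    for fold in make_folds(len(rows), k, seed):
        held_out = set(fold)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        predict = fit_group_means(train)
        for i in fold:
            predicted.append(predict(rows[i]))
            observed.append(rows[i]["duration_ms"])
    return pearson(predicted, observed)


candidates = filter_candidates(make_toy_candidates())
cor = cross_validate(candidates)
print(f"{len(candidates)} candidates, cor = {cor:.3f}")
```

Replacing `fit_group_means` with a real tree learner would leave the fold construction and the correlation scoring unchanged. The same loop could score the pause-occurrence step if occurrence is coded as 0 or 1.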
For the first part of the learning procedure (i.e. determining whether a pause occurs or not) the two features (1) motivation for the MARY system to set a break index and (2) number of
⁴ R is available under the terms of the Free Software Foundation’s GNU General Public License, see: http://www.r-project.org