03.05.2014 Views

Have a break! Modelling pauses in German Speech. - ResearchGate

Have a break! Modelling pauses in German Speech. - ResearchGate

Have a break! Modelling pauses in German Speech. - ResearchGate

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

A 0 A 1 A 2 A 3<br />

KON P1 P2 VORFELD<br />

Figure 6 : Boxplot of the feature « syntactic<br />

attachment of the word preced<strong>in</strong>g the pause ».<br />

Pauses<br />

before the<br />

5 th word<br />

Pauses<br />

b etween<br />

the 5 th<br />

and the<br />

9 th word<br />

Pauses<br />

after the<br />

9 th word<br />

Figure 7 : Boxplot of the feature « number of<br />

words preced<strong>in</strong>g the pause ».<br />

Pauses<br />

before the<br />

13 syllable<br />

Pauses<br />

after the<br />

13 syllable<br />

Figure 8 : Boxplot of the feature « number of<br />

syllables preced<strong>in</strong>g the pause ».<br />

Figure 9 : Boxplot of the feature « motivation<br />

for the MARY system to set a <strong>break</strong> <strong>in</strong>dex »<br />

KON = Before conjunctions; P1 = { , : ) - };<br />

P2 = { ( ‘ “ }; VORFELD = ‘Vorfeld’<br />

5.3 <strong>Modell<strong>in</strong>g</strong> <strong>pauses</strong><br />

For the determ<strong>in</strong>ation whether a pause occurs <strong>in</strong><br />

a given position and the prediction of the<br />

duration of this pause we used the well known<br />

mach<strong>in</strong>e learn<strong>in</strong>g technique CART<br />

[classification and regression trees (Breiman, L.,<br />

Friedman, J., Olshen, R., and Stone, C. J.,<br />

1984)]. As tra<strong>in</strong><strong>in</strong>g data we use those sentences<br />

of the SpeeDurCont-corpus, where <strong>pauses</strong><br />

occur<strong>in</strong>g <strong>in</strong> the natural speech are also predicted<br />

by the MARY system. This data is divided<br />

randomly <strong>in</strong>to 10 equally sized subsets. Each of<br />

these subsets was used as a test sample <strong>in</strong> a 10-<br />

fold cross validation rout<strong>in</strong>e. For the evaluation<br />

of the performance of this mach<strong>in</strong>e learn<strong>in</strong>g<br />

algorithm we used the correlation value (cor)<br />

between the predicted and the orig<strong>in</strong>al values of<br />

the test sample. For tra<strong>in</strong><strong>in</strong>g and evaluation we<br />

used the CART tree package provided by the<br />

statistical tool R 4 . In a prelim<strong>in</strong>ary step the<br />

candidates for a pause (each position between<br />

two words) are filtered aga<strong>in</strong>st the criterion that<br />

a <strong>break</strong> <strong>in</strong>dex 3 or 4 is provided by the MARY<br />

algorithm. Then, the occurrence of a pause and<br />

its duration are learned.<br />

For the first part of the learn<strong>in</strong>g procedure<br />

(i.e. determ<strong>in</strong><strong>in</strong>g whether a pause occurs or not)<br />

the two features (1) Motivation for the MARY<br />

system to set a <strong>break</strong> <strong>in</strong>dex and (2) number of<br />

4<br />

R is available under the terms of the Free Software<br />

Foundation’s GNU General Public License, see:<br />

http://www.r-project.org

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!