Have a break! Modelling pauses in German Speech. - ResearchGate
Have a break! Modelling pauses in German Speech. - ResearchGate
Have a break! Modelling pauses in German Speech. - ResearchGate
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
has to perform speech <strong>in</strong> a noisy environment,<br />
she will <strong>in</strong>sert more and more significant =<br />
longer <strong>pauses</strong> and hesitations than if she has to<br />
read a text under more quiescent conditions.<br />
As just mentioned above, <strong>pauses</strong> are an<br />
imortant prosodic cue for mark<strong>in</strong>g boundaries <strong>in</strong><br />
order to improve comprehension. However, it<br />
has to be emphasised that not only <strong>pauses</strong> but<br />
also pitch changes, pre-boundary lengthen<strong>in</strong>g,<br />
decl<strong>in</strong>ation reset or <strong>in</strong>tensity shap<strong>in</strong>g are used as<br />
prosodic means for the structur<strong>in</strong>g of utterances.<br />
In a ToBI oriented approach (cf. Beckmann &<br />
Ayers 1994) all these factors are considered as<br />
physical manifestations of abstract <strong>break</strong><br />
<strong>in</strong>dices. If we th<strong>in</strong>k about prosodic structur<strong>in</strong>g,<br />
the concept of hav<strong>in</strong>g a <strong>break</strong> appears to be the<br />
more fundamental one. Therefore, beside the<br />
realisation of <strong>pauses</strong>, we will also consider the<br />
occurrence of boundary tones <strong>in</strong> the present<br />
study.<br />
2. Aims of the present study<br />
In the present study we <strong>in</strong>vestigate how <strong>in</strong>terlexical,<br />
silent <strong>pauses</strong> are distributed <strong>in</strong> natural<br />
speech. Intra-segmental <strong>pauses</strong> as a result of<br />
plosive formation as well as <strong>pauses</strong> occurr<strong>in</strong>g at<br />
the end of an utterance, which may have some<br />
significance for the higher structur<strong>in</strong>g at the<br />
textual level, are not considered here. The basis<br />
or our model is a corpus of read speech, which is<br />
compared to the <strong>pauses</strong> predicted by the MARY<br />
text-to-speech system. Apart from determ<strong>in</strong><strong>in</strong>g<br />
whether a certa<strong>in</strong> structurally determ<strong>in</strong>ed <strong>break</strong><br />
<strong>in</strong>dex is to be actually realised as a pause,<br />
special focus is put on the duration of these<br />
<strong>pauses</strong>. The crucial question is how the length of<br />
a pause depends on the surround<strong>in</strong>g structural<br />
properties, such as Part-of-<strong>Speech</strong> of the word<br />
before the pause, different punctuation signs or<br />
position of the pause <strong>in</strong> the sentence. This<br />
<strong>in</strong>formation is then used to guidel<strong>in</strong>e the<br />
automatic tra<strong>in</strong><strong>in</strong>g of a prediction-model. The<br />
ma<strong>in</strong> goal beh<strong>in</strong>d is to develop a method to<br />
improve the performance of pause control <strong>in</strong> a<br />
text-to-speech synthesizer. We <strong>in</strong>vestigate which<br />
features should be used for such a model and<br />
how it should be tra<strong>in</strong>ed to reach a good<br />
performance. We will use modell<strong>in</strong>g by the<br />
CART method on both tasks, first tra<strong>in</strong><strong>in</strong>g on a<br />
Boolean value <strong>in</strong> order to decide whether a<br />
pause has to be realised, and <strong>in</strong> a second step<br />
determ<strong>in</strong><strong>in</strong>g the duration of the pause. In the<br />
follow<strong>in</strong>g we will describe the specific<br />
components used <strong>in</strong> the present study.<br />
3. Components<br />
3.1 The Vienna prosodic speech corpus<br />
The corpus was established dur<strong>in</strong>g a project<br />
primarily concerned with the modell<strong>in</strong>g of the<br />
duration of phonetic segments. 1 It comprises<br />
approx. 50.000 phones (1.2 hours of actual<br />
speech) of Standard Austrian <strong>German</strong> spoken by<br />
a s<strong>in</strong>gle non-professional speaker. The corpus<br />
consists of 3 different types of read material for<br />
different purposes of analysis:<br />
phonetically balanced text: the classical<br />
“Nordw<strong>in</strong>d und Sonne” and<br />
“Buttergeschichte” and 300 isolated<br />
sentences<br />
material where the <strong>in</strong>formation-structure is<br />
controlled for the analysis of the relation<br />
between focus structure and prosody.<br />
22 contiguous texts from newspaper.<br />
The segmentation and annotation of phonetic<br />
labels was done semi-automatically and was also<br />
controlled manually. The corpus is also<br />
annotated for <strong>in</strong>tonation (ToBI-labels, Fujisaki<br />
accent and phrase commands), prom<strong>in</strong>ence<br />
(manual labels for each syllable) Information on<br />
l<strong>in</strong>guistic structure (feet, syllable structure, etc.)<br />
is dynamically derived and stored <strong>in</strong> the database.<br />
S<strong>in</strong>ce the corpus was designed for a slightly<br />
different purpose we could not use the full<br />
corpus for our experiments. Due to the short<br />
length of many utterances there are not always<br />
chances to have a rest, and also read<strong>in</strong>g style<br />
<strong>in</strong>hibits the <strong>in</strong>sertion of <strong>pauses</strong> as compared to<br />
for example free dialogues. After exclud<strong>in</strong>g<br />
those irrelevant samples we ended up with a<br />
subcorpus of 347 sentences with potential pause<br />
candidates actually used to tra<strong>in</strong> our models.<br />
1<br />
Segmental Duration <strong>in</strong> <strong>German</strong> <strong>Speech</strong><br />
(SpeeDurCont). This project has been sponsored by<br />
the FWF, Grant No. P13224. The Austrian Research<br />
Institute for Artificial Intelligence is supported by the<br />
Austrian Federal M<strong>in</strong>istry of Education, Science and<br />
Culture.