03.05.2014 Views

Have a break! Modelling pauses in German Speech. - ResearchGate

Have a break! Modelling pauses in German Speech. - ResearchGate

Have a break! Modelling pauses in German Speech. - ResearchGate

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

has to perform speech <strong>in</strong> a noisy environment,<br />

she will <strong>in</strong>sert more and more significant =<br />

longer <strong>pauses</strong> and hesitations than if she has to<br />

read a text under more quiescent conditions.<br />

As just mentioned above, <strong>pauses</strong> are an<br />

imortant prosodic cue for mark<strong>in</strong>g boundaries <strong>in</strong><br />

order to improve comprehension. However, it<br />

has to be emphasised that not only <strong>pauses</strong> but<br />

also pitch changes, pre-boundary lengthen<strong>in</strong>g,<br />

decl<strong>in</strong>ation reset or <strong>in</strong>tensity shap<strong>in</strong>g are used as<br />

prosodic means for the structur<strong>in</strong>g of utterances.<br />

In a ToBI oriented approach (cf. Beckmann &<br />

Ayers 1994) all these factors are considered as<br />

physical manifestations of abstract <strong>break</strong><br />

<strong>in</strong>dices. If we th<strong>in</strong>k about prosodic structur<strong>in</strong>g,<br />

the concept of hav<strong>in</strong>g a <strong>break</strong> appears to be the<br />

more fundamental one. Therefore, beside the<br />

realisation of <strong>pauses</strong>, we will also consider the<br />

occurrence of boundary tones <strong>in</strong> the present<br />

study.<br />

2. Aims of the present study<br />

In the present study we <strong>in</strong>vestigate how <strong>in</strong>terlexical,<br />

silent <strong>pauses</strong> are distributed <strong>in</strong> natural<br />

speech. Intra-segmental <strong>pauses</strong> as a result of<br />

plosive formation as well as <strong>pauses</strong> occurr<strong>in</strong>g at<br />

the end of an utterance, which may have some<br />

significance for the higher structur<strong>in</strong>g at the<br />

textual level, are not considered here. The basis<br />

or our model is a corpus of read speech, which is<br />

compared to the <strong>pauses</strong> predicted by the MARY<br />

text-to-speech system. Apart from determ<strong>in</strong><strong>in</strong>g<br />

whether a certa<strong>in</strong> structurally determ<strong>in</strong>ed <strong>break</strong><br />

<strong>in</strong>dex is to be actually realised as a pause,<br />

special focus is put on the duration of these<br />

<strong>pauses</strong>. The crucial question is how the length of<br />

a pause depends on the surround<strong>in</strong>g structural<br />

properties, such as Part-of-<strong>Speech</strong> of the word<br />

before the pause, different punctuation signs or<br />

position of the pause <strong>in</strong> the sentence. This<br />

<strong>in</strong>formation is then used to guidel<strong>in</strong>e the<br />

automatic tra<strong>in</strong><strong>in</strong>g of a prediction-model. The<br />

ma<strong>in</strong> goal beh<strong>in</strong>d is to develop a method to<br />

improve the performance of pause control <strong>in</strong> a<br />

text-to-speech synthesizer. We <strong>in</strong>vestigate which<br />

features should be used for such a model and<br />

how it should be tra<strong>in</strong>ed to reach a good<br />

performance. We will use modell<strong>in</strong>g by the<br />

CART method on both tasks, first tra<strong>in</strong><strong>in</strong>g on a<br />

Boolean value <strong>in</strong> order to decide whether a<br />

pause has to be realised, and <strong>in</strong> a second step<br />

determ<strong>in</strong><strong>in</strong>g the duration of the pause. In the<br />

follow<strong>in</strong>g we will describe the specific<br />

components used <strong>in</strong> the present study.<br />

3. Components<br />

3.1 The Vienna prosodic speech corpus<br />

The corpus was established dur<strong>in</strong>g a project<br />

primarily concerned with the modell<strong>in</strong>g of the<br />

duration of phonetic segments. 1 It comprises<br />

approx. 50.000 phones (1.2 hours of actual<br />

speech) of Standard Austrian <strong>German</strong> spoken by<br />

a s<strong>in</strong>gle non-professional speaker. The corpus<br />

consists of 3 different types of read material for<br />

different purposes of analysis:<br />

phonetically balanced text: the classical<br />

“Nordw<strong>in</strong>d und Sonne” and<br />

“Buttergeschichte” and 300 isolated<br />

sentences<br />

material where the <strong>in</strong>formation-structure is<br />

controlled for the analysis of the relation<br />

between focus structure and prosody.<br />

22 contiguous texts from newspaper.<br />

The segmentation and annotation of phonetic<br />

labels was done semi-automatically and was also<br />

controlled manually. The corpus is also<br />

annotated for <strong>in</strong>tonation (ToBI-labels, Fujisaki<br />

accent and phrase commands), prom<strong>in</strong>ence<br />

(manual labels for each syllable) Information on<br />

l<strong>in</strong>guistic structure (feet, syllable structure, etc.)<br />

is dynamically derived and stored <strong>in</strong> the database.<br />

S<strong>in</strong>ce the corpus was designed for a slightly<br />

different purpose we could not use the full<br />

corpus for our experiments. Due to the short<br />

length of many utterances there are not always<br />

chances to have a rest, and also read<strong>in</strong>g style<br />

<strong>in</strong>hibits the <strong>in</strong>sertion of <strong>pauses</strong> as compared to<br />

for example free dialogues. After exclud<strong>in</strong>g<br />

those irrelevant samples we ended up with a<br />

subcorpus of 347 sentences with potential pause<br />

candidates actually used to tra<strong>in</strong> our models.<br />

1<br />

Segmental Duration <strong>in</strong> <strong>German</strong> <strong>Speech</strong><br />

(SpeeDurCont). This project has been sponsored by<br />

the FWF, Grant No. P13224. The Austrian Research<br />

Institute for Artificial Intelligence is supported by the<br />

Austrian Federal M<strong>in</strong>istry of Education, Science and<br />

Culture.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!