Best Practices for Speech Corpora in Linguistic Research Workshop ...

• comparative linguistic research of strategies for reading and speaking the same texts by different speakers;
• research of the variability of prosodic realization in expressing syntactic structures and semantic concepts;
• comparative research of the cues for detecting expressive and emotional speech;
• research of morphonological variability.

Recently we applied this framework to the task of automatic prosodic modeling of an utterance and identification of its prosodic type (Skrelin and Kocharov, 2009). To solve that task we used all information available in the corpus except the canonical phonetic transcription. We automatically processed the speech and annotation data, obtaining various melodic features. Features essential for our analysis were added to the original annotation scheme (see figure 2):

• smoothed and interpolated fundamental frequency values (level 8);
• extreme values of fundamental frequency (level 9);
• boundaries of melodic movements (level 11);
• main melodic movements within the utterance corresponding to the largest drop, the largest rise, the movement reaching the global minimum, and the movement reaching the global maximum (level 10).
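The derivation of these melodic features can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: frames with value 0.0 stand for unvoiced regions, a simple moving average plays the role of the smoother, and melodic movements are taken as the spans between successive F0 extrema. All function names are our own.

```python
# Sketch of melodic feature extraction from a raw F0 contour (Hz per frame).
# Assumptions: 0.0 marks an unvoiced frame; smoothing is a moving average;
# movements are spans between successive local extrema.

def interpolate_unvoiced(f0):
    """Linearly interpolate zero (unvoiced) frames between voiced ones."""
    f0 = list(f0)
    voiced = [i for i, v in enumerate(f0) if v > 0]
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            f0[i] = f0[a] + t * (f0[b] - f0[a])
    return f0

def smooth(f0, width=3):
    """Moving-average smoothing (cf. level 8 of the annotation scheme)."""
    half = width // 2
    return [sum(f0[max(0, i - half):i + half + 1])
            / len(f0[max(0, i - half):i + half + 1])
            for i in range(len(f0))]

def extrema(f0):
    """Indices of local F0 maxima and minima (cf. level 9)."""
    return [i for i in range(1, len(f0) - 1)
            if (f0[i] - f0[i - 1]) * (f0[i + 1] - f0[i]) < 0]

def main_movements(f0, ext):
    """Largest rise and largest drop between successive extrema (cf. level 10)."""
    points = [0] + ext + [len(f0) - 1]
    moves = [(f0[b] - f0[a], a, b) for a, b in zip(points, points[1:])]
    return max(moves), min(moves)  # (delta, start_frame, end_frame)
```

Movement boundaries (level 11) then fall out directly as the start and end frames of each span between extrema.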

This year we launched a project to create a corpus of articulatory data for Russian speech. The speech data will include the speech signal and articulatory data captured by both EMA and video. The scalable framework allows the use of multimedia data, since the annotation follows the general principles described above: we are able to combine annotations of different media in one annotation scheme.
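A single scheme holding time-aligned tiers from different media could be organized along these lines. This is a minimal sketch, not the corpus's actual format; the class names, level numbers, and labels are all illustrative assumptions.

```python
# Sketch: one hierarchical annotation scheme combining tiers from
# different media (audio, EMA, video), queryable by time overlap.
# All names and level numbers are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Segment:
    start: float                  # seconds
    end: float
    label: str
    attributes: dict = field(default_factory=dict)  # any number of attributes

@dataclass
class Tier:
    level: int                    # position in the hierarchical scheme
    medium: str                   # "audio", "ema", "video", ...
    segments: list

class AnnotationScheme:
    """Hierarchical, scalable annotation: tiers stay sorted by level,
    and segments from any medium can be extracted by time overlap."""
    def __init__(self):
        self.tiers = []

    def add_tier(self, tier):
        self.tiers.append(tier)
        self.tiers.sort(key=lambda t: t.level)

    def slice(self, start, end):
        """Extract every segment (from any medium) overlapping [start, end]."""
        return [(t.level, t.medium, s)
                for t in self.tiers for s in t.segments
                if s.start < end and s.end > start]
```

A time slice then returns acoustic and articulatory segments together, ordered by annotation level, which mirrors the "precise relevant slices" extraction described in the conclusion.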

4. Conclusion

This paper has presented a comprehensive framework for linguistic research. Its major features are as follows. The annotation is strictly hierarchical and scalable, and allows the assignment of any number of annotation attributes to segmental units. This makes it possible to easily extend the speech corpus with individual, automatically produced annotations. Complex search is supported, allowing the extraction of precise, relevant slices of speech data. The output of processing is linguistically meaningful and can be configured individually for different use cases.

The speech corpora framework has been used successfully for a wide variety of linguistic tasks, including those that require simultaneous processing of different levels of language.

5. Acknowledgements

The authors acknowledge Saint Petersburg State University for research grant # 31.37.106.2011.


6. References

Paul Boersma and David Weenink. 2012. Praat: doing phonetics by computer (version 5.3.04) [computer program].

Liya Bondarko. 2009. Short description of Russian sound system. In Viola de Silva and Riikka Ullakonoja, editors, Phonetics of Russian and Finnish. General Introduction. Spontaneous and Read-aloud Speech, pages 23–37. Peter Lang GmbH.

N. Grønnum. 2009. A Danish phonetically annotated spontaneous speech corpus (DanPASS). Speech Communication, 51:594–603.

Pavel Skrelin and Daniil Kocharov. 2009. Avtomaticheskaya obrabotka prosodicheskogo oformleniya viskazivaniya: relevantnye priznaki dlya avtomaticheskoj interpretatsii intonatsionnoj modeli [Automatic processing of the prosodic realization of an utterance: relevant features for automatic interpretation of the intonation model]. In Trudy tretiego mezhdistsiplinarnogo seminara Analiz razgovornoj russkoj rechi (AR3-2009), pages 41–46, Saint Petersburg.

Pavel Skrelin, Nina Volskaya, Daniil Kocharov, Karina Evgrafova, Olga Glotova, and Vera Evdokimova. 2010. CORPRES – corpus of Russian professionally read speech. In Proceedings of the 13th International Conference on Text, Speech and Dialogue, pages 386–393, Brno, Czech Republic. Springer Verlag.

Svetlana Tananayko, Daniil Kocharov, and Ksenia Sadurtinova. 2011. Programma statisticheskoj obrabotki korpusa rechevih dannih [A program for statistical processing of a speech data corpus]. In Proceedings of the 14th International Conference on Speech and Computer, pages 457–462, Kazan, Russia. Moscow State Linguistic University.
