
Best Practices for Speech Corpora in Linguistic Research Workshop ...


transcriptions of each hit, but also the video with audio.

Figure 5: More of a results window

There are several other ways of showing the results. A particularly useful one is the map view (using Google Maps technology). It shows the phonetic variations of one orthographic search. This is a very easy and enlightening way of viewing isoglosses.

We illustrate with Figure 6. Here we have simply looked up the word ikke 'not' in Norwegian. There are more than 30 000 hits in the corpus, and it is obviously impossible to quickly make a manual overview of them. But the map approach helps us. We have chosen to display three types of phonetic variants on the map:

1) the versions pronounced with a fricative or affricate /ç,ʃ/ instead of the word-internal stop (red markers), for example: /iʃe/.

2) those that have a fricative or affricate preceded by a nasal (yellow markers), for example: /inçe/.

3) those that are pronounced with the stop (black markers), for example: /ike/.
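The three-way grouping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the corpus's actual implementation: the function name `marker_colour`, the symbol sets, and the rule order are hypothetical, and type 2 is modelled on the example /inçe/, i.e. a nasal directly before the fricative.

```python
import re
from typing import Optional

# Assumed symbol sets for this sketch (not taken from the corpus itself).
FRICATIVES = "çʃ"   # fricative symbols, also the final element of affricates
NASALS = "nmŋ"      # nasal consonants

def marker_colour(transcription: str) -> Optional[str]:
    """Map a phonetic variant of 'ikke' to a hypothetical marker colour."""
    form = transcription.strip("/")
    # Type 2: nasal immediately before the fricative/affricate, as in /inçe/.
    if re.search(f"[{NASALS}][{FRICATIVES}]", form):
        return "yellow"
    # Type 1: fricative/affricate instead of the word-internal stop, as in /iʃe/.
    if re.search(f"[{FRICATIVES}]", form):
        return "red"
    # Type 3: the word-internal stop is kept, as in /ike/.
    if "k" in form:
        return "black"
    # Any other variant is not displayed in this sketch.
    return None

# The three example forms from the text.
colours = {form: marker_colour(form) for form in ["/iʃe/", "/inçe/", "/ike/"]}
```

The nasal rule is tested first because every type 2 form also contains a fricative and would otherwise be labelled as type 1.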

There are many more possibilities with regard to presentation of results, as well as results handling, and downloading.


Figure 6: Map that shows results for three different kinds of pronunciations of ikke 'not'.

8. Conclusion<br />

The Nordic Dialect Corpus shows the importance of involving the end users in the development of a corpus. In our case, many of the linguists involved were not experienced corpus users beforehand, but could still deliver highly valuable input regarding which features of the corpus would be desirable. In the end, that has led to a highly advanced tool for linguistic research.

