
Best Practices for Speech Corpora in Linguistic Research Workshop ...


finished their thesis. The number of external people who requested access to ‘level 3’ resources over recent years was not that high. We need to see in the future whether the regulations that are currently in place can and should be maintained as explained. Access regulations remain a highly sensitive area, where the technical possibilities opened up by using web-based technologies need to be carefully balanced against the ethical and legal responsibilities which archivists and depositors have towards the speech communities. Despite almost 10 years of on-going discussions and debate, no simple solution to this problem has yet been found.

6. Conclusion

Speech corpora involve many intricate questions on various levels, from corpus design via annotation, metadata and organization, to data preservation and dissemination, and include legal and ethical issues. This paper addressed some of them from the technical point of view of an archive and software development team also engaged in building a federated infrastructure for language resources.

7. Acknowledgements

The Language Archive is a unit of the Max Planck Institute for Psycholinguistics (MPI-PL), funded by the Max Planck Society (MPG), the Berlin-Brandenburg Academy of Sciences (BBAW) and the Royal Netherlands Academy of Sciences (KNAW). The DOBES project is funded by the Volkswagen Foundation. The CLARIN pilot phase was financed by the European Commission; the national CLARIN projects are funded by the individual member states.

8. References

AMS: Archive Management System. Online at: tla.mpi.nl/tools/tla-tools/ams.

ARBIL: Metadata Editor for IMDI and CMDI. Online at: tla.mpi.nl/tools/tla-tools/arbil.

Broeder, D., Claus, A., Offenga, F., Skiba, R., Trilsbeek, P., & Wittenburg, P. (2006). LAMUS: The Language Archive Management and Upload System. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2291-2294).

CGN Design: Page “Corpusopbouw” on the pages of the Corpus Gesproken Nederlands (CGN). (2006). Online: tst.inl.nl/cgndocs/doc_Dutch/topics/design/index.htm. Last visited 2.4.2012.

BBAW: Berlin-Brandenburgische Akademie der Wissenschaften. www.bbaw.de. Last visited 23.3.2012.

CLARIN: Common Language Resources and Technology Infrastructure. www.clarin.eu. Last visited 23.3.2012.

DARIAH: Digital Research Infrastructure for the Arts and Humanities. Online at: www.dariah.eu.

DOBES: Dokumentation Bedrohter Sprachen. www.mpi.nl/dobes. Last visited 23.3.2012.

Haig, G. and Schnell, S. (2011). Annotations using GRAID (Grammatical Relations and Animacy in Discourse). Introduction and guidelines for annotators. Version 6.0. Online at www.linguistik.uni-kiel.de/GRAID_manual6.0_08sept.pdf. Last visited 4.4.2012.

ISOcat: ISO Data Category Registry: www.isocat.org<br />

Kemps-Snijders, M., Windhouwer, M. A., Wittenburg, P., Wright, S.E. (2008). ISOcat: Corralling Data Categories in the Wild. In European Language Resources Association (ELRA) (ed), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, May 28-30, 2008.

KNAW: Koninklijke Nederlandse Akademie van Wetenschappen. www.knaw.nl. Last visited 23.3.2012.

LAMUS: Language Archive Management and Upload System. Online at: tla.mpi.nl/tools/tla-tools/lamus.

Lieb, H., Drude, S. (2001). Advanced Glossing – A Language Documentation Format. DOBES Working Papers 1. Online at www.mpi.nl/DOBES/documents/Advanced-Glossing1.pdf. Last visited 2012-04-02.

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. 3rd Edition. Mahwah, NJ: Lawrence Erlbaum Associates. Newest version online at: childes.psy.cmu.edu/manuals/chat.pdf. Last visited 2012-04-01. (CHAT manual)

MPG: Max-Planck-Gesellschaft. www.mpg.de. Last visited 23.3.2012.

Max Planck Institute for Psycholinguistics (2005). DOBES Code of Conduct. Compiled by Peter Wittenburg with the assistance of several experts. www.mpi.nl/DOBES/ethical_legal_aspects/DOBES-coc-v2.pdf, updated on 6/01/2006. Last visited 2012-04-01.

MPI-PL: Max Planck Institute for Psycholinguistics. www.mpi.nl/. Last visited 23.3.2012.

RELcat: a Relation Registry for linguistics concepts: http://lux13.mpi.nl/relcat/site/index.html

TLA: The Language Archive at the Max-Planck Institute for Psycholinguistics. tla.mpi.nl. Last visited 23.3.2012.

Windhouwer, M.A. (2012). RELcat: a Relation Registry for ISOcat data categories. Accepted for a poster and demonstration at the Eighth International Conference on Language Resources and Evaluation LREC 2012, Istanbul, May 2012.

Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., Sloetjes, H. (2006). ELAN: a Professional Framework for Multimodality Research. In: Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation.
