Mehrsprachigkeit in Europa: Plurilinguismo in Europa ... - EURAC


Multilingual Corpus for Terminology Work: The LexALP Corpus of Legal Texts

• text type: legal texts

Experts for each legal system selected the latest versions and amendments of legal texts relevant to spatial planning and sustainable development at the time of corpus collection (summer 2005). As these issues are regulated at different levels, the collected documents also represent different levels of legislation: for instance, whereas landscape protection issues are regulated at the level of the individual Länder in Austria, no such strong delegation of competences is implemented in France. As a consequence, the Austrian corpus section contains no federal legislation, while the French section is composed only of French national law codes.

Table 1 shows the number of documents for each legal system. In addition, it specifies the legal level by which each document is classified: regional (R), national (N) or supranational (SUP). The three hierarchical levels are not represented equally among the different legal systems. As explained above, this is because different legal systems exhibit a different intrinsic organisation.

legal system    number of docs   levels [10]
AT              612              R+N
CH              119              N
DE              62               N+R (Bavaria)
FR              612              N
IT              490              N+R (Friuli V. G.)
SI              213              N
Alp. Conv.      40               SUP
INT             149              SUP
EU              795              SUP
total           3092

Table 1: Distribution of corpus documents over legal systems

Altogether, the number of texts collected amounts to 3,095, comprising a total of 18,597,942 words in four languages and for 9 legal systems.
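The distribution in Table 1 can be tallied with a few lines of Python. This is only an illustrative sketch: the names `TABLE_1`, `total_documents` and `documents_by_level_combination` are hypothetical, and the level assigned to each legal system is an assumption read off the table in column order. Note that the per-system counts as printed in Table 1 sum to 3,092.

```python
from collections import defaultdict

# (legal system, number of documents, levels), as read off Table 1.
# The level assignment per system is an assumption based on column order.
TABLE_1 = [
    ("AT", 612, {"R", "N"}),
    ("CH", 119, {"N"}),
    ("DE", 62, {"N", "R"}),   # regional part: Bavaria
    ("FR", 612, {"N"}),
    ("IT", 490, {"N", "R"}),  # regional part: Friuli V. G.
    ("SI", 213, {"N"}),
    ("Alp. Conv.", 40, {"SUP"}),
    ("INT", 149, {"SUP"}),
    ("EU", 795, {"SUP"}),
]


def total_documents(rows):
    """Sum the document counts over all legal systems."""
    return sum(count for _, count, _ in rows)


def documents_by_level_combination(rows):
    """Group document counts by the combination of legal levels covered."""
    totals = defaultdict(int)
    for _, count, levels in rows:
        key = "+".join(sorted(levels))  # e.g. {"R", "N"} -> "N+R"
        totals[key] += count
    return dict(totals)


if __name__ == "__main__":
    print(total_documents(TABLE_1))
    print(documents_by_level_combination(TABLE_1))
```

Grouping by level combination rather than by single level avoids guessing how the documents of a mixed section (e.g. N+R) split between the national and the regional level, since the table does not give that breakdown.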

3. Corpus architecture<br />

Just as the content of a corpus has to be selected carefully (see section 2), decisions on corpus architecture are strongly influenced by the requirements of terminology work. Language material used in terminology retrieval and analysis must be transparent as to its origin, type and status. It has to be guaranteed that the (usually large amounts of) data can be searched exhaustively and that all information is accessible in a structured manner. In addition to working with sentence-like text units (as provided by keyword in context searches, so-called KWIC searches), the terminologist should at any point be able to look at a wider context or even go back to the original full-text document. Whenever multilingual versions of one document are available, it is desirable to allow for parallel searches in two or more languages.
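To make the idea of a KWIC search with access to a wider context concrete, here is a minimal Python sketch. It is not the LexALP implementation: the `Document` and `KwicHit` classes, the functions `kwic` and `wider_context`, and the window sizes are all hypothetical and chosen only for illustration. Each hit keeps the document identifier and character offset, so that a wider context (or the full text) can be recovered later.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str    # hypothetical identifier of the legal text
    language: str  # e.g. "de", "fr", "it", "sl"
    text: str      # full text of the document


@dataclass
class KwicHit:
    doc_id: str
    left: str      # context to the left of the keyword
    keyword: str   # the matched word form as it appears in the text
    right: str     # context to the right of the keyword
    start: int     # character offset of the match in the full text


def kwic(docs, term, width=30):
    """Return every whole-word, case-insensitive match of `term`
    with `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for doc in docs:
        for m in pattern.finditer(doc.text):
            left = doc.text[max(0, m.start() - width):m.start()]
            right = doc.text[m.end():m.end() + width]
            hits.append(KwicHit(doc.doc_id, left, m.group(0), right, m.start()))
    return hits


def wider_context(doc, hit, width=200):
    """Return a larger window around a hit, taken from the full text."""
    end = hit.start + len(hit.keyword)
    return doc.text[max(0, hit.start - width):end + width]


if __name__ == "__main__":
    sample = Document("d1", "en",
                      "Spatial planning is regulated by law. Planning law differs.")
    for hit in kwic([sample], "planning", width=10):
        print(f"{hit.left:>10} [{hit.keyword}] {hit.right}")
```

A parallel search over aligned language versions could build on the same structure, for example by running `kwic` over the versions of one document and pairing hits through an alignment table; that alignment step is not sketched here.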

To comply with these demands, the corpus is annotated in different ways.

3.1. Levels of annotation<br />

Currently the LexALP corpus implements two levels of annotation: meta information for each document and information about text structure. The sets of metadata hold information for every single document, e.g. the title or passing date of the law. This information facilitates correct citation and thus ensures the scientific reliability of the data. Furthermore, it allows for retrieval of documents at a later date. In addition, each document is classified with respect to a

10 R = regional/Länder level, N = national/Bund level, SUP = supranational level<br />
