18.07.2013 Views

The Corpus Thread - Det Danske Sprog- og Litteraturselskab



3.2. Header structure

projectIdentifier

In the case of new texts captured by WP 2.1 of the DK-CLARIN project, the value of projectIdentifier is “DK-CLARIN-WP2.1”. Similar fixed values are defined for the other relevant DK-CLARIN projects and for completed projects such as DDOC or KORPUS 2000; see Section 3.3.

3.2.2.3 Application information

The element gives information about all applications or other (manual) procedures by which the text sample has been enriched with markup. The header itself may also be manipulated by such applications or procedures, but this is not registered in the element; it may, however, be recorded under the element described in Section 3.2.4. The application information helps determine whether texts are structurally comparable: texts that have been processed by the same bundle of applications and procedures should be structurally identical.
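The comparability check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's own tooling: it assumes that each application is recorded as an `application` element carrying `ident` and `version` attributes, as in TEI P5, whereas the element and attribute names are not given in the text above. The sample headers and the names `application_bundle` and `same_bundle` are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a namespace prefix such as '{uri}application' from a tag."""
    return tag.rsplit("}", 1)[-1]


def application_bundle(header_xml):
    """Return the set of (ident, version) pairs recorded in a header.

    ASSUMPTION: applications appear as <application ident="..." version="...">
    elements (TEI P5 style); the source text does not name these elements.
    """
    root = ET.fromstring(header_xml)
    return {
        (el.get("ident"), el.get("version"))
        for el in root.iter()
        if local_name(el.tag) == "application"
    }


def same_bundle(header_a, header_b):
    """True if both headers record the same bundle of applications."""
    return application_bundle(header_a) == application_bundle(header_b)


# Hypothetical sample headers; the ident and version values are invented.
header_a = """<teiHeader><appInfo>
  <application ident="tokenizer" version="1.0"/>
  <application ident="tagger" version="2.1"/>
</appInfo></teiHeader>"""

header_b = """<teiHeader><appInfo>
  <application ident="tagger" version="2.1"/>
  <application ident="tokenizer" version="1.0"/>
</appInfo></teiHeader>"""

header_c = """<teiHeader><appInfo>
  <application ident="tokenizer" version="1.0"/>
</appInfo></teiHeader>"""

print(same_bundle(header_a, header_b))
print(same_bundle(header_a, header_c))
```

Comparing sets rather than lists makes the check independent of the order in which applications are listed. Whether order should matter is a design decision the text above does not settle.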

The element should be filled in with one empty dummy application if the file contains only the default-segmented (i.e. pretokenized) version of the text, the so-called base version. Alternatively, the whole structure may be left out in this case (see footnote 18). The following example shows such an element with one empty dummy application. The values given are explained further in Section 3.3.2.

nil

Otherwise, there is one element for each annotation layer belonging to the text in the file; see 4. The general structure is as follows:

18 Leaving the structure out is recommended by DK-CLARIN WP 5.
