The Corpus Thread - Det Danske Sprog- og Litteraturselskab
The Corpus Thread - Det Danske Sprog- og Litteraturselskab
The Corpus Thread - Det Danske Sprog- og Litteraturselskab
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
3.2. Header structure 25<br />
great verbosity. In PAROLE-DK, this information instead is virtually part<br />
of the assuming that the distributor is always the<br />
same as the organization responsible for data capture (and is the sponsor).<br />
Here, it is assumed that the sponsor, the collector, and the distributor are<br />
of central importance and that it cannot be taken for granted that these<br />
decisive roles are played by one organization only. However, it is assumed<br />
that these roles are fully sufficient to describe the institutional background<br />
of a potential corpus text. Additional roles may come into play for a whole<br />
corpus or text collection and must be part of the headers of these resources.<br />
OBS! Author and editor information for the source from which a text is<br />
derived (e.g. the author of a book) is not included in the element<br />
but in the element discussed below in Section 3.2.1.5.<br />
3.2.1.2 <strong>The</strong> extent statement<br />
<strong>The</strong> element is used in each text header to specify the size of the<br />
text to which it is attached. <strong>The</strong> size is given as the number of words in the<br />
element, the n attribute is set to “words”. In another element<br />
with the n attribute set to “paragraphs” the number of paragraphs is stated. 8<br />
Other elements measuring extent in other units may be added, but<br />
must be registered as part of the legal inventory described in Section 3.3:<br />
<br />
numberOfWords<br />
numberOfParagraphs<br />
<br />
<strong>The</strong> count given does not include the size of the header itself. <strong>The</strong> number<br />
of words and paragraphs must be mechanically computed prior to insertion<br />
of the text into the text bank.<br />
3.2.1.3 <strong>The</strong> publication statement<br />
<strong>The</strong>element is used to specify publication and availability<br />
information for an electronic text. It contains the following three elements:<br />
supplies the name of a person or agency responsible for<br />
the distribution of a text.<br />
supplies information about the availability of a text, for<br />
example any restrictions on its use or distribution, its copyright status,<br />
etc.<br />
8 This is a necessary extent information particularly for texts which are to be included in<br />
parallel corpora.