The Corpus Thread - Det Danske Sprog- og Litteraturselskab
7.2. The ePOS tag set for Danish
7.2.2.1 Nouns
Nouns are sub-classified into common nouns (C) and proper nouns (P).
As in PAROLE, gender is treated as an inflectional category; in ONC it is
treated as inherent information. The ePOS marker pattern for common
nouns is
CAT  NUM  DEF  CAS  GEN  TMP  VOC  DEG  PER  RFL  POS
NC   +    +    +    +    −    −    −    −    −    −
Number [NUM]: singular, plural.<br />
Definiteness [DEF]: indefinite, definite.<br />
Case [CAS]: unmarked, genitive, fossilized. Fossilized case is found in examples<br />
like til huse, i aftes.<br />
Gender [GEN]: common, neuter.<br />
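The marker pattern can be read as a fixed sequence of ten slots after the category. The following is a minimal sketch, not the author's implementation, of decoding a common-noun tag written in the form used in Table 7.3 (e.g. NC:siuc/--/----). The function name `decode_epos` is hypothetical. The letter codes s, i, u, g, c and n are the ones attested in Table 7.3; p (plural), d (definite) and f (fossilized) are assumptions made only for illustration.

```python
from typing import Dict

# Slot order after the category, as given in the marker pattern above.
SLOTS = ["NUM", "DEF", "CAS", "GEN", "TMP", "VOC", "DEG", "PER", "RFL", "POS"]

# Letter codes per slot. s, i, u, g, c, n are attested in Table 7.3;
# p, d and f are ASSUMED here and may differ from the real tag set.
CODES = {
    "NUM": {"s": "singular", "p": "plural"},
    "DEF": {"i": "indefinite", "d": "definite"},
    "CAS": {"u": "unmarked", "g": "genitive", "f": "fossilized"},
    "GEN": {"c": "common", "n": "neuter"},
}


def decode_epos(tag: str) -> Dict[str, str]:
    """Decode a tag such as 'NC:siuc/--/----' into a feature dictionary."""
    cat, _, markers = tag.partition(":")
    values = markers.replace("/", "")  # slashes only group the slots visually
    if len(values) != len(SLOTS):
        raise ValueError(f"expected {len(SLOTS)} marker slots, got {len(values)}")
    features = {"CAT": cat}
    for slot, value in zip(SLOTS, values):
        if value == "-":
            continue  # slot is not marked for this category
        # Unknown letters are kept as-is rather than guessed.
        features[slot] = CODES.get(slot, {}).get(value, value)
    return features


if __name__ == "__main__":
    print(decode_epos("NC:siuc/--/----"))
```

Slots marked with "-" are simply omitted from the result, which mirrors the "−" columns of the pattern for NC.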
The ePOS marker pattern for proper nouns is
CAT  NUM  DEF  CAS  GEN  TMP  VOC  DEG  PER  RFL  POS
NP   +    +    +    +    −    −    −    −    −    −
Proper nouns⁶ show marker alternation only for case (genitive/unmarked).
However, they are also implicitly marked for number (always singular), definiteness
(always indefinite), and gender (either common or neuter).
As the token concept underlying ePOS in its initial form is the
one described in Chapter 4, multiword proper nouns cannot be identified
as one entity but are tagged token by token, e.g. Det [PD:nsu-/----/----]
Kongelige [AC:xsud/----/p---] Bibliotek [NC:nsui/----/----].
In the converted version of the corpus, where multiword tokens have<br />
been split up into individual tokens, case may be dative if the noun is governed<br />
by certain prepositions. This is indicated by the case marker F (≈ ‘fossilized’),
which has been added to the PAROLE inventory. The same applies
to adjectives, cf. til fulde, with the extra PAROLE tag ANPCSF=IU.
Table 7.3: Common nouns
ePOS             ONC Flexion   PAROLE    Examples
NC:siuc/--/----  sg-ubest      NCCSU==I  radio, katalog
NC:siun/--/----  sg-ubest      NCNSU==I  organ, katalog
NC:sigc/--/----  sg-ubest-gen  NCCSG==I  radios, katalogs
Table continues on next page...
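In the three rows of Table 7.3 shown here, the PAROLE tag looks like a reordering of the four ePOS noun markers: N, C, then gender, number and case in upper case, "==", and finally definiteness. The sketch below encodes only that observed correspondence. The function name `epos_noun_to_parole` is hypothetical, and whether the pattern holds for the remaining rows of the table is an assumption.

```python
def epos_noun_to_parole(tag: str) -> str:
    """Map a common-noun ePOS tag to a PAROLE tag.

    The rule is inferred from the three visible rows of Table 7.3 only;
    it is an illustration, not a documented conversion.
    """
    cat, _, markers = tag.partition(":")
    if cat != "NC":
        raise ValueError("this sketch only covers common nouns (NC)")
    if len(markers) < 4:
        raise ValueError("expected at least four noun markers")
    # ePOS order is NUM DEF CAS GEN (see the marker pattern for NC).
    num, definiteness, case, gender = markers[:4]
    # Observed PAROLE order: N C <gender> <number> <case> == <definiteness>
    return "NC" + gender.upper() + num.upper() + case.upper() + "==" + definiteness.upper()


if __name__ == "__main__":
    for epos in ("NC:siuc/--/----", "NC:siun/--/----", "NC:sigc/--/----"):
        print(epos, epos_noun_to_parole(epos))
```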
⁶ An adequate lexicon of proper nouns and/or a recognition algorithm needs to be built.