18.07.2013 Views

The Corpus Thread - Det Danske Sprog- og Litteraturselskab


7.2. The ePOS tag set for Danish

7.2.2.1 Nouns

Nouns are sub-classified into common nouns (C) and proper nouns (P). As in PAROLE, gender is treated as an inflectional category, whereas in ONC it is treated as inherent information. The ePOS marker pattern for common nouns is

CAT  NUM  DEF  CAS  GEN  TMP  VOC  DEG  PER  RFL  POS
NC   +    +    +    +    −    −    −    −    −    −

Number [NUM]: singular, plural.

Definiteness [DEF]: indefinite, definite.

Case [CAS]: unmarked, genitive, fossilized. Fossilized case is found in examples like til huse, i aftes.

Gender [GEN]: common, neuter.

The ePOS marker pattern for proper nouns is

CAT  NUM  DEF  CAS  GEN  TMP  VOC  DEG  PER  RFL  POS
NP   +    +    +    +    −    −    −    −    −    −

Proper nouns [6] show marker alternation only for case (genitive/unmarked). However, they are also implicitly marked for number (always singular), definiteness (always indefinite), and gender (either common or neuter).
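The two marker patterns above can be read as a simple lookup from word class to the categories it is marked for. The following is a minimal sketch under that reading; the names `CATEGORIES`, `PATTERNS` and `marked_categories` are hypothetical and not part of ePOS, and the minus sign of the printed pattern is written as an ASCII hyphen.

```python
# Column order copied from the marker pattern tables in the text.
CATEGORIES = ["NUM", "DEF", "CAS", "GEN", "TMP", "VOC", "DEG", "PER", "RFL", "POS"]

# Pattern rows as printed: "+" = category is marked, "-" = category is not marked.
# (ASCII hyphen used here in place of the typographic minus sign.)
PATTERNS = {
    "NC": "+ + + + - - - - - -",  # common nouns
    "NP": "+ + + + - - - - - -",  # proper nouns
}


def marked_categories(cat):
    """Return the categories flagged '+' for a word class (hypothetical helper)."""
    flags = PATTERNS[cat].split()
    if len(flags) != len(CATEGORIES):
        raise ValueError(f"pattern for {cat} does not have {len(CATEGORIES)} columns")
    return [name for name, flag in zip(CATEGORIES, flags) if flag == "+"]


print(marked_categories("NC"))
print(marked_categories("NP"))
```

Both word classes come out with the same four categories; the difference described in the text (only case actually alternates for proper nouns) is not visible in the pattern itself and would have to be recorded separately.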

Because the token concept underlying ePOS in its initial form is the one described in Chapter 4, multiword proper nouns cannot be identified as a single entity but are tagged token by token, e.g. Det [PD:nsu-/----/----] Kongelige [AC:xsud/----/p---] Bibliotek [NC:nsui/----/----].
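To illustrate what token-by-token tagging yields, the sketch below splits the example string into (token, category, marker groups) triples. It assumes only the surface notation visible in the example, i.e. `token [CAT:group/group/group]`; the regular expression and the function name `parse_tagged` are illustrative and not part of any ePOS tooling.

```python
import re

# One tagged token: a word, whitespace, then "[CAT:markers]" with a two-letter category.
TAGGED_TOKEN = re.compile(r"(\S+)\s+\[([A-Z]{2}):([^\]]+)\]")


def parse_tagged(text):
    """Split 'word [CAT:a/b/c] ...' into (word, CAT, [a, b, c]) triples."""
    return [
        (m.group(1), m.group(2), m.group(3).split("/"))
        for m in TAGGED_TOKEN.finditer(text)
    ]


example = "Det [PD:nsu-/----/----] Kongelige [AC:xsud/----/p---] Bibliotek [NC:nsui/----/----]"
for token, cat, groups in parse_tagged(example):
    print(token, cat, groups)
```

Each of the three tokens of the name receives its own tag, so nothing in the output marks them as belonging to one proper noun.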

In the converted version of the corpus, where multiword tokens have been split up into individual tokens, case may be dative if the noun is governed by certain prepositions. This is indicated by the case marker F (≈ ‘fossilized’), which has been added to the PAROLE inventory. The same applies to adjectives, cf. til fulde, with the extra PAROLE tag ANPCSF=IU.

Table 7.3: Common nouns

ePOS             ONC Flexion   PAROLE    Examples
NC:siuc/--/----  sg-ubest      NCCSU==I  radio, katalog
NC:siun/--/----  sg-ubest      NCNSU==I  organ, katalog
NC:sigc/--/----  sg-ubest-gen  NCCSG==I  radios, katalogs

Table continues on next page...
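The rows of Table 7.3 suggest that the first marker group of a common-noun tag holds number, definiteness, case and gender in that order, and that the PAROLE tag rearranges the same four values. The sketch below decodes and converts tags on that assumption. It is inferred from the three rows shown here only: the letters `p`, `d` and `f` (plural, definite, fossilized) are guesses not attested in these rows, the function names are hypothetical, and the inline example earlier in the section writes its tags with a different slot layout, which this sketch does not handle.

```python
NUM = {"s": "singular", "p": "plural"}                       # "p" is an assumption
DEF = {"i": "indefinite", "d": "definite"}                   # "d" is an assumption
CAS = {"u": "unmarked", "g": "genitive", "f": "fossilized"}  # "f" is an assumption
GEN = {"c": "common", "n": "neuter"}


def _first_group(tag):
    """Return the four marker letters of an NC tag written as in Table 7.3."""
    cat, rest = tag.split(":", 1)
    if cat != "NC":
        raise ValueError(f"not a common-noun tag: {tag}")
    group = rest.split("/")[0]
    if len(group) != 4:
        raise ValueError(f"expected four markers, got {group!r}")
    return group


def decode_common_noun(tag):
    """Spell out NUM, DEF, CAS and GEN for a tag such as 'NC:siuc/--/----'."""
    num, dfn, cas, gen = _first_group(tag)
    return {"NUM": NUM[num], "DEF": DEF[dfn], "CAS": CAS[cas], "GEN": GEN[gen]}


def to_parole(tag):
    """Rebuild the PAROLE tag; ordering inferred from the three table rows only."""
    num, dfn, cas, gen = _first_group(tag)
    return "NC" + gen.upper() + num.upper() + cas.upper() + "==" + dfn.upper()


for epos in ("NC:siuc/--/----", "NC:siun/--/----", "NC:sigc/--/----"):
    print(epos, decode_common_noun(epos), to_parole(epos))
```

For the three rows shown, `to_parole` reproduces the PAROLE column exactly; whether the same mapping holds for the rows on the following page is not verified here.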

[6] An adequate lexicon of proper nouns and/or a recognition algorithm needs to be built.
