18.07.2013

The Corpus Thread - Det Danske Sprog- og Litteraturselskab


3.3. Filling in the header

Legal values

Value            Description
nil              Info has not been determined yet. Default
empty            Info is irrelevant, non-existent, or undeterminable
pretokenizer     Splits a text into word-like segments. A pretokenizer is
                 applied only once; all other applications are based on the
                 pretokenized version of the text
tokenizer        Splits a text into word-like segments
s-splitter       Sentence splitter. Splits the text into sentences, i.e.
                 segments between two full stops or similar punctuation
                 marks. Inserts start and end tags around sentence-like
                 text segments
p-splitter       Paragraph splitter. Splits the text into paragraphs.
                 Inserts start and end tags around paragraph-like text
                 segments
regularizer      Tags a token with a regularised version of its surface
                 representation, i.e. its orthography
lemmatizer       Tags a token with its lemma form
pos-tagger       Tags a token with part-of-speech info
morph-tagger     Tags a token with morphological/inflectional info
term-tagger      Tags a token with some indication of whether it is a term
                 (in texts to be included in LSP corpora)
multi-processor  Multifunctional tool that performs various tasks such as
                 tokenizing, lemmatizing, and tagging as one complex process
other            Tool performing tasks not yet listed
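
As an illustration only, a fixed list of legal values like the one above can be checked with a small enumeration. The following Python sketch is not part of the specification: the names `ToolFunction` and `parse_value` are hypothetical, and it assumes that this value set is treated as closed and that a missing value falls back to the default `nil`.

```python
from enum import Enum


class ToolFunction(Enum):
    """Legal values from the table above (the class name is hypothetical)."""
    NIL = "nil"
    EMPTY = "empty"
    PRETOKENIZER = "pretokenizer"
    TOKENIZER = "tokenizer"
    S_SPLITTER = "s-splitter"
    P_SPLITTER = "p-splitter"
    REGULARIZER = "regularizer"
    LEMMATIZER = "lemmatizer"
    POS_TAGGER = "pos-tagger"
    MORPH_TAGGER = "morph-tagger"
    TERM_TAGGER = "term-tagger"
    MULTI_PROCESSOR = "multi-processor"
    OTHER = "other"


def parse_value(raw=None):
    """Map a raw header string to a legal value.

    Assumption: a missing value means the default, nil. Any string
    outside the table is rejected because the set is treated as closed.
    """
    if raw is None:
        return ToolFunction.NIL
    try:
        return ToolFunction(raw.strip())
    except ValueError:
        raise ValueError(f"{raw!r} is not a legal value") from None
```

A call such as `parse_value("pos-tagger")` returns `ToolFunction.POS_TAGGER`, while an unlisted string such as `"parser"` raises `ValueError` instead of being silently accepted.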

⊲ appType<br />

Specifies whether an application or procedure that operated on the text was automatic or manual (or a combination of both), as well as the type of task the application/procedure performed in terms of segmentation or annotation.

Properties

Value set type   enumerated, closed
XML name         vs_appType.xml
