The Corpus Thread - Det Danske Sprog- og Litteraturselskab
3.3. Filling in the header
Legal values
Value            Description
nil              Info has not been determined yet (default value)
empty            Info is irrelevant, non-existent, or undeterminable
pretokenizer     Splits a text into word-like segments. A pretokenizer is applied only once; all other applications are based on the pretokenized version of the text
tokenizer        Splits a text into word-like segments
s-splitter       Sentence splitter. Splits the text into sentences, i.e. segments delimited by full stops or similar punctuation, and inserts start and end tags around sentence-like text segments
p-splitter       Paragraph splitter. Splits the text into paragraphs and inserts start and end tags around paragraph-like text segments
regularizer      Tags a token with a regularised version of its surface representation, i.e. its orthography
lemmatizer       Tags a token with its lemma form
pos-tagger       Tags a token with part-of-speech info
morph-tagger     Tags a token with morphological/inflectional info
term-tagger      Tags a token with an indication of whether it is a term (in texts to be included in LSP corpora)
multi-processor  Multifunctional tool that performs several tasks, such as tokenizing, lemmatizing and tagging, as one complex process
other            Tool performing tasks not yet listed
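The pretokenizer and s-splitter rows can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions: the regex tokenization rule, the set of sentence-final punctuation, the `<s>`/`</s>` tag names and the function names `pretokenize`, `s_split` and `tag_sentences` are all hypothetical and not taken from The Corpus Thread. Caching the pretokenizer's output mirrors the rule that a pretokenizer is applied only once and that later tools work on the pretokenized version of the text.

```python
import re
from functools import lru_cache

# Hypothetical sentence-final punctuation ("full stops or similar").
SENTENCE_END = {".", "!", "?"}


@lru_cache(maxsize=None)
def pretokenize(text: str) -> tuple:
    """Split text into word-like segments and punctuation marks.

    lru_cache stores the result per text, so later tools reuse the first
    pretokenized version instead of re-running the pretokenizer.
    """
    return tuple(re.findall(r"\w+|[^\w\s]", text))


def s_split(tokens):
    """Group pretokenized tokens into sentences, ending one at each sentence-final mark."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(current)
    return sentences


def tag_sentences(sentences, tag="s"):
    """Wrap each sentence in start and end tags (the tag name is an assumption)."""
    return ["<{0}>{1}</{0}>".format(tag, " ".join(s)) for s in sentences]


text = "Det er godt. Er det?"
sentences = s_split(pretokenize(text))
print(tag_sentences(sentences))  # ['<s>Det er godt .</s>', '<s>Er det ?</s>']
```

A real s-splitter would also have to handle abbreviations and ordinal numbers written with a full stop, which this sketch ignores.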
⊲ appType
Specifies whether an application or procedure that operated on the text was automatic or manual (or a combination of both), as well as the type of task the application or procedure performed in terms of segmentation or annotation.
Properties
Value set type   enumerated, closed
XML name         vs_appType.xml
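Because appType is an enumerated, closed value set, a tool that reads or writes headers can check values against the list of legal values. The sketch below mirrors that list as a Python `Enum`. The class name `AppType`, the function `parse_app_type` and the choice to map a missing value to `nil` (the documented default) are assumptions for illustration; nothing here reads the actual `vs_appType.xml` file, whose format is not described in this section.

```python
from enum import Enum
from typing import Optional


class AppType(Enum):
    """Hypothetical mirror of the closed appType value set."""
    NIL = "nil"                  # not determined yet (default)
    EMPTY = "empty"              # irrelevant, non-existent or undeterminable
    PRETOKENIZER = "pretokenizer"
    TOKENIZER = "tokenizer"
    S_SPLITTER = "s-splitter"
    P_SPLITTER = "p-splitter"
    REGULARIZER = "regularizer"
    LEMMATIZER = "lemmatizer"
    POS_TAGGER = "pos-tagger"
    MORPH_TAGGER = "morph-tagger"
    TERM_TAGGER = "term-tagger"
    MULTI_PROCESSOR = "multi-processor"
    OTHER = "other"              # tasks not covered by any other value


def parse_app_type(value: Optional[str]) -> AppType:
    """Return the AppType for a header value, rejecting anything outside the closed set."""
    if value is None:
        return AppType.NIL  # assumption: an unset field takes the default, nil
    try:
        return AppType(value)  # Enum lookup by value, e.g. "pos-tagger"
    except ValueError:
        raise ValueError(f"{value!r} is not a legal appType value") from None


print(parse_app_type("pos-tagger"))  # AppType.POS_TAGGER
print(parse_app_type(None))          # AppType.NIL
```

Because the set is closed, a tool whose task fits none of the listed values would be recorded as `other` rather than under a new value; in this sketch, `parse_app_type("parser")` raises `ValueError`.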