Apache UIMA Ruta Guide and Reference - Apache UIMA - The ...
Basic annotations and tokens
hierarchy (blue) are created by the lexer. Each leaf is its own type, but also inherits the types of the
abstract annotation types further up in the hierarchy. The leaf types are described in more detail
in Table 2.2, “Annotations created by lexer”. Each text unit within an input document
belongs to exactly one of these annotation types.
Figure 2.1. Basic token hierarchy

Table 2.1. Abstract annotations

Annotation    Parent  Description
ALL           -       parent type of all tokens
ANY           ALL     all tokens except for markup
W             ANY     all kinds of words
PM            ANY     all kinds of punctuation marks
WS            ANY     all kinds of white spaces
SENTENCEEND   PM      all kinds of punctuation marks that indicate the end of a sentence
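
The parent relationships in Table 2.1 can be read as a small tree. The following is a minimal Python sketch of that tree, not part of UIMA Ruta or its API: the names PARENT, inherited_types and is_a are hypothetical and exist only for illustration. The entries are copied from Table 2.1, plus the two leaf types listed in Table 2.2.

```python
# Illustrative model only: this is not UIMA Ruta code and uses no UIMA API.
# Parent of each annotation type, copied from Tables 2.1 and 2.2.
# ALL is the root of the hierarchy, so it has no parent.
PARENT = {
    "ALL": None,
    "ANY": "ALL",
    "W": "ANY",
    "PM": "ANY",
    "WS": "ANY",
    "SENTENCEEND": "PM",
    "MARKUP": "ALL",  # leaf type from Table 2.2
    "NBSP": "ANY",    # leaf type from Table 2.2
}


def inherited_types(type_name):
    """Return type_name followed by all of its ancestors, nearest first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_a(type_name, ancestor):
    """True if type_name is ancestor itself or inherits from it."""
    return ancestor in inherited_types(type_name)


if __name__ == "__main__":
    print(inherited_types("SENTENCEEND"))  # ['SENTENCEEND', 'PM', 'ANY', 'ALL']
    print(is_a("NBSP", "ANY"))             # True
    print(is_a("MARKUP", "ANY"))           # False
```

In this model MARKUP inherits from ALL but not from ANY, which matches the description of ANY as "all tokens except for markup".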
Table 2.2. Annotations created by lexer

Annotation  Parent  Description             Example
MARKUP      ALL     HTML and XML elements
NBSP        ANY     non breaking space      " "
UIMA Ruta Version 2.0.1 Apache UIMA Ruta Language