20.08.2013 Views

Apache UIMA Ruta Guide and Reference - Apache UIMA - The ...


Basic annotations and tokens

hierarchy (blue) are created by the lexer. Each leaf is its own type, but also inherits the types of the abstract annotation types further up in the hierarchy. The leaf types are described in more detail in Table 2.2, “Annotations created by lexer” [21]. Each text unit within an input document belongs to exactly one of these annotation types.

Figure 2.1. Basic token hierarchy

Table 2.1. Abstract annotations

Annotation    Parent  Description
ALL           -       parent type of all tokens
ANY           ALL     all tokens except for markup
W             ANY     all kinds of words
PM            ANY     all kinds of punctuation marks
WS            ANY     all kinds of white spaces
SENTENCEEND   PM      all kinds of punctuation marks that indicate the end of a sentence
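
The Parent column of Table 2.1 can be read as a lookup from each annotation type to its supertype. The following is a minimal Python sketch of that reading, meant only to illustrate how a type also counts as every type further up the hierarchy. It is not how UIMA or UIMA Ruta implement their type system, and the names `PARENT`, `inherited_types` and `is_a` are hypothetical.

```python
# Parent column of Table 2.1, copied as a plain dictionary.
# None marks the root of the hierarchy (shown as "-" in the table).
PARENT = {
    "ALL": None,
    "ANY": "ALL",
    "W": "ANY",
    "PM": "ANY",
    "WS": "ANY",
    "SENTENCEEND": "PM",
}


def inherited_types(annotation):
    """Return the type itself followed by its ancestors, nearest first."""
    chain = []
    current = annotation
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_a(annotation, candidate):
    """True if `annotation` is `candidate` or inherits from it."""
    return candidate in inherited_types(annotation)


if __name__ == "__main__":
    print(inherited_types("SENTENCEEND"))
    print(is_a("W", "ANY"), is_a("WS", "PM"))
```

Under this reading, anything typed SENTENCEEND is also a PM, an ANY and an ALL, whereas WS and PM are siblings and neither is a subtype of the other.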

Table 2.2. Annotations created by lexer

Annotation  Parent  Description             Example
MARKUP      ALL     HTML and XML elements
NBSP        ANY     non breaking space      " "

UIMA Ruta Version 2.0.1 Apache UIMA Ruta Language 21
