20.08.2013 Views

Apache UIMA Ruta Guide and Reference - Apache UIMA - The ...

Learning by Example

This rule annotates everything between two “PERIOD” annotations with the type “Sentence”. Please note that the resulting annotations are probably invisible if they start or end with a filtered type.

Rule elements can contain more than one condition. The rule in the next example tries to identify headlines, which are bold, underlined, and end with a colon.

DECLARE Headline;
Paragraph{CONTAINS(Bold, 90, 100, true),
    CONTAINS(Underlined, 90, 100, true), ENDSWITH(COLON)
    -> MARK(Headline)};

The matching condition of this rule element is given with the type “Paragraph”, thus the rule takes a look at all Paragraph annotations. The rule matches only if the three conditions, separated by commas, are fulfilled. The first condition “CONTAINS(Bold, 90, 100, true)” states that 90%-100% of the matched Paragraph annotation should also be annotated with annotations of the type “Bold”. The boolean parameter “true” indicates that the amount of Bold annotations should be calculated relative to the matched annotation. The two numbers “90, 100” are therefore interpreted as percentages. The exact calculation of the coverage depends on the tokenization of the document and is neglected for now. The second condition “CONTAINS(Underlined, 90, 100, true)” consequently states that at least 90% of the paragraph should also be covered by annotations of the type “Underlined”. The third condition “ENDSWITH(COLON)” finally forces the Paragraph annotation to end with a colon. It is only fulfilled if there is an annotation of the type “COLON” whose end offset is equal to the end offset of the matched Paragraph annotation.
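The three conditions can be modeled outside of UIMA Ruta to make their interplay concrete. The following Python sketch is not how UIMA Ruta evaluates rules; it assumes that coverage is measured as the share of tokens inside the matched annotation that lie within an annotation of the given type, and the names `Annotation`, `coverage_percent`, `contains`, `ends_with` and `is_headline` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int
    end: int


def coverage_percent(matched, annotations, type_name, tokens):
    """Share (0-100) of the tokens inside `matched` that lie within an
    annotation of `type_name`. Token-based counting is an assumption."""
    inside = [t for t in tokens if matched.begin <= t.begin and t.end <= matched.end]
    if not inside:
        return 0.0
    covering = [a for a in annotations if a.type == type_name]
    covered = sum(
        1 for t in inside
        if any(a.begin <= t.begin and t.end <= a.end for a in covering)
    )
    return 100.0 * covered / len(inside)


def contains(matched, annotations, type_name, low, high, tokens):
    """Model of CONTAINS(type, low, high, true): coverage within [low, high] percent."""
    return low <= coverage_percent(matched, annotations, type_name, tokens) <= high


def ends_with(matched, annotations, type_name):
    """Model of ENDSWITH(type): some annotation of the type shares the end offset."""
    return any(a.type == type_name and a.end == matched.end for a in annotations)


def is_headline(paragraph, annotations, tokens):
    """All three conditions must hold, as in the comma-separated condition list."""
    return (
        contains(paragraph, annotations, "Bold", 90, 100, tokens)
        and contains(paragraph, annotations, "Underlined", 90, 100, tokens)
        and ends_with(paragraph, annotations, "COLON")
    )


# Example document "Summary:" with one word token and one colon token.
tokens = [Annotation("W", 0, 7), Annotation("COLON", 7, 8)]
paragraph = Annotation("Paragraph", 0, 8)
annotations = tokens + [
    paragraph,
    Annotation("Bold", 0, 8),
    Annotation("Underlined", 0, 8),
]
print(is_headline(paragraph, annotations, tokens))
```

In this toy document the paragraph is fully bold and underlined and ends at the same offset as the colon, so all three checks succeed. Removing the Bold annotation, or moving the colon away from the paragraph end, makes the combined check fail.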

The readability and maintainability of rules do not improve as more conditions are added. One of the strengths of the UIMA Ruta language is that it provides different approaches to solve an annotation task. The next two examples introduce actions for transformation-based rules.

Headline{-CONTAINS(W) -> UNMARK(Headline)};

This rule consists of one condition and one action. The condition “-CONTAINS(W)” is negated (indicated by the character “-”) and is therefore only fulfilled if there are no annotations of the type “W” within the bounds of the matched Headline annotation. The action “UNMARK(Headline)” removes the matched Headline annotation. Put simply, headlines that contain no words at all are not headlines.
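The effect of this transformation-based rule can be sketched as a filter over a list of annotations. The Python below is only an illustration under simplifying assumptions: annotations are plain `(type, begin, end)` tuples, and `unmark_empty_headlines` is a hypothetical helper, not part of UIMA Ruta.

```python
def unmark_empty_headlines(annotations):
    """Drop every Headline annotation that has no W annotation within its bounds.

    Mirrors the idea of `Headline{-CONTAINS(W) -> UNMARK(Headline)};` on
    (type, begin, end) tuples; all other annotations are kept unchanged.
    """
    words = [a for a in annotations if a[0] == "W"]
    kept = []
    for type_name, begin, end in annotations:
        has_word = any(begin <= wb and we <= end for _, wb, we in words)
        if type_name == "Headline" and not has_word:
            continue  # negated CONTAINS(W) is fulfilled, so the headline is removed
        kept.append((type_name, begin, end))
    return kept


example = [
    ("Headline", 0, 8),    # covers a word and a colon
    ("W", 0, 7),
    ("COLON", 7, 8),
    ("Headline", 10, 11),  # covers only a colon
    ("COLON", 10, 11),
]
result = unmark_empty_headlines(example)
print(result)
```

Only the second Headline, which spans nothing but a colon, is removed; the first one keeps its annotation because a W annotation lies inside it.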

The next rule does not remove an annotation, but changes its offsets depending on the context.

Headline{-> SHIFT(Headline, 1, 2)} COLON;

Here, the action “SHIFT(Headline, 1, 2)” expands the matched Headline annotation to the next colon, if that Headline annotation is followed by a COLON annotation. The numbers “1” and “2” refer to the first and second rule element, so the shifted annotation spans both.
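The offset change performed by this rule can be sketched in a few lines of Python. This is a simplified model, not UIMA Ruta's implementation: it assumes that "followed by" means the COLON begins exactly at the end offset of the Headline (UIMA Ruta itself also skips filtered types such as whitespace), and `shift_headlines_to_colon` is a hypothetical helper working on `(type, begin, end)` tuples.

```python
def shift_headlines_to_colon(annotations):
    """Extend each Headline to the end of a directly following COLON.

    Headlines that are not followed by a COLON keep their offsets,
    just as the rule does not match for them.
    """
    # Map the begin offset of every COLON to its end offset.
    colon_end_by_begin = {
        begin: end for type_name, begin, end in annotations if type_name == "COLON"
    }
    shifted = []
    for type_name, begin, end in annotations:
        if type_name == "Headline" and end in colon_end_by_begin:
            end = colon_end_by_begin[end]  # new end offset: end of the colon
        shifted.append((type_name, begin, end))
    return shifted


example = [
    ("Headline", 0, 7),    # directly followed by a colon at 7..8
    ("COLON", 7, 8),
    ("Headline", 10, 15),  # not followed by a colon
]
result = shift_headlines_to_colon(example)
print(result)
```

The first Headline grows from offsets 0..7 to 0..8, while the second one is left untouched because no COLON starts at its end offset.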

UIMA Ruta rules can contain arbitrary conditions and actions, which is illustrated by the next example.

DECLARE Month, Year, Date;
ANY{INLIST(MonthsList) -> MARK(Month), MARK(Date,1,3)}
    PERIOD? NUM{REGEXP(".{2,4}") -> MARK(Year)};

This rule consists of three rule elements. The first one matches on every token whose covered text occurs in a word list named “MonthsList”. The second rule element is optional and does not need to be fulfilled, which is indicated by the quantifier “?”. The last rule element

UIMA Ruta Version 2.0.1, Apache UIMA Ruta Overview
