Apache UIMA Ruta Guide and Reference - Apache UIMA - The ...
Learning by Example
This rule annotates everything between two “PERIOD” annotations with the type “Sentence”. Note that the resulting annotations are probably invisible if they start or end with a filtered type.
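The behaviour described above can be approximated outside of Ruta. The following Python sketch is not the Ruta engine and does not reproduce its token-based matching: it assumes that a PERIOD annotation is just the offsets of a `.` character, marks the character range between two consecutive periods as a sentence, and treats whitespace as a hypothetical filtered type, chosen only to show how such spans can start or end with filtered content.

```python
import re

# Hypothetical stand-in: a PERIOD "annotation" is the (begin, end) offsets of a '.' character.
def period_spans(text):
    return [(m.start(), m.end()) for m in re.finditer(r"\.", text)]

def sentences_between_periods(text):
    """Return (begin, end) spans lying strictly between consecutive PERIODs."""
    periods = period_spans(text)
    spans = []
    for (_, left_end), (right_begin, _) in zip(periods, periods[1:]):
        if right_begin > left_end:  # adjacent periods such as ".." yield no span
            spans.append((left_end, right_begin))
    return spans

def touches_filtered(text, span):
    """True if the span starts or ends with whitespace, which plays the role
    of a hypothetical filtered type in this illustration."""
    begin, end = span
    return text[begin].isspace() or text[end - 1].isspace()

text = "Intro. First sentence. Second one."
for begin, end in sentences_between_periods(text):
    print(repr(text[begin:end]), touches_filtered(text, (begin, end)))
# prints:
# ' First sentence' True
# ' Second one' True
```

Both spans begin with a space, which is exactly the situation the note above warns about.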
Rule elements can contain more than one condition. The rule in the next example tries to identify headlines that are bold, underlined, and end with a colon.
DECLARE Headline;
Paragraph{CONTAINS(Bold, 90, 100, true),
    CONTAINS(Underlined, 90, 100, true), ENDSWITH(COLON)
    -> MARK(Headline)};
The matching condition of this rule element is the type “Paragraph”, so the rule considers all Paragraph annotations. The rule matches only if all three conditions, separated by commas, are fulfilled. The first condition “CONTAINS(Bold, 90, 100, true)” states that 90%-100% of the matched Paragraph annotation should also be covered by annotations of the type “Bold”. The boolean parameter “true” indicates that the amount of Bold annotations should be calculated relative to the matched annotation; the two numbers “90, 100” are therefore interpreted as percentages. The exact calculation of the coverage depends on the tokenization of the document and is not discussed here. The second condition “CONTAINS(Underlined, 90, 100, true)” accordingly states that at least 90% of the paragraph should also be covered by annotations of the type “Underlined”. Finally, the third condition “ENDSWITH(COLON)” requires the Paragraph annotation to end with a colon. It is fulfilled only if there is an annotation of the type “COLON” whose end offset equals the end offset of the matched Paragraph annotation.
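The three conditions can be sketched as plain offset arithmetic. This is a simplified model, not Ruta's implementation: annotations are hypothetical `(begin, end)` tuples, and the coverage for `CONTAINS(Type, 90, 100, true)` is measured in characters, whereas the guide notes that the exact calculation depends on tokenization. The helper names `covered_length`, `contains_percent`, `ends_with` and `is_headline` are illustrative only.

```python
def covered_length(outer, inners):
    """Number of characters of `outer` covered by at least one inner span
    (overlapping inner spans are counted once)."""
    begin, end = outer
    covered = set()
    for b, e in inners:
        covered.update(range(max(b, begin), min(e, end)))
    return len(covered)

def contains_percent(outer, inners, min_pct, max_pct):
    """Approximates CONTAINS(Type, min, max, true) as a character-coverage percentage."""
    length = outer[1] - outer[0]
    if length == 0:
        return False
    pct = 100.0 * covered_length(outer, inners) / length
    return min_pct <= pct <= max_pct

def ends_with(outer, candidates):
    """ENDSWITH(Type): some candidate annotation ends exactly where `outer` ends."""
    return any(e == outer[1] for _, e in candidates)

def is_headline(paragraph, bold, underlined, colons):
    """All three conditions of the Headline rule must hold at once."""
    return (contains_percent(paragraph, bold, 90, 100)
            and contains_percent(paragraph, underlined, 90, 100)
            and ends_with(paragraph, colons))

# A 20-character paragraph: fully bold, 19 of 20 characters underlined (95%),
# with a colon as its last character.
print(is_headline((0, 20), [(0, 20)], [(0, 19)], [(19, 20)]))  # True
```

Counting characters keeps the sketch small; a token-based count, as the Ruta engine may use, can yield different percentages for the same paragraph.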
Adding more conditions does not improve the readability and maintainability of rules. One of the strengths of the UIMA Ruta language is that it provides different approaches to solving an annotation task. The next two examples introduce actions for transformation-based rules.
Headline{-CONTAINS(W) -> UNMARK(Headline)};
This rule consists of one condition and one action. The condition “-CONTAINS(W)” is negated (indicated by the character “-”) and is therefore fulfilled only if there are no annotations of the type “W” within the bounds of the matched Headline annotation. The action “UNMARK(Headline)” removes the matched Headline annotation. Put simply, headlines that contain no words at all are not headlines.
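The effect of the negated condition and the UNMARK action can be mimicked by filtering a list of spans. In this sketch the spans are hypothetical `(begin, end)` tuples, the function names are illustrative, and `CONTAINS(W)` without numeric arguments is assumed to mean "at least one W annotation lies within the headline".

```python
def contains_any(outer, inners):
    """Assumed meaning of CONTAINS(W): at least one inner span lies within `outer`."""
    begin, end = outer
    return any(begin <= b and e <= end for b, e in inners)

def unmark_wordless(headlines, words):
    """Headline{-CONTAINS(W) -> UNMARK(Headline)}: drop every headline for which
    the negated condition holds, i.e. keep only headlines containing a W."""
    return [h for h in headlines if contains_any(h, words)]

headlines = [(0, 6), (10, 12)]  # the second headline covers no word
words = [(0, 6)]
print(unmark_wordless(headlines, words))  # [(0, 6)]
```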
The next rule does not remove an annotation but changes its offsets depending on the context.
Headline{-> SHIFT(Headline, 1, 2)} COLON;
Here, the action “SHIFT(Headline, 1, 2)” expands the matched Headline annotation so that it also covers the next colon, provided that the Headline annotation is followed by a COLON annotation.
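A rough model of `SHIFT(Headline, 1, 2)` on the pattern `Headline COLON` follows: the new span runs from the start of the first rule element (the headline) to the end of the second (the colon). It assumes that "followed by" means the colon starts after the headline with only whitespace in between, whitespace standing in for filtered types; the function name `shift_to_colon` is hypothetical.

```python
def shift_to_colon(text, headline, colons):
    """Stretch `headline` to end at the colon that directly follows it.

    A colon counts as following the headline if only whitespace separates them.
    If no colon qualifies, the rule does not match and the span is unchanged."""
    begin, end = headline
    for colon_begin, colon_end in colons:
        if colon_begin >= end and text[end:colon_begin].strip() == "":
            return (begin, colon_end)
    return headline

text = "Summary : more text"
new_span = shift_to_colon(text, (0, 7), [(8, 9)])
print(new_span, repr(text[new_span[0]:new_span[1]]))  # (0, 9) 'Summary :'
```

Because any non-whitespace character between the headline and a colon disqualifies that colon, only the immediately following colon can be selected.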
UIMA Ruta rules can contain arbitrary conditions and actions, as the next example illustrates.
DECLARE Month, Year, Date;
ANY{INLIST(MonthsList) -> MARK(Month), MARK(Date,1,3)}
    PERIOD? NUM{REGEXP(".{2,4}") -> MARK(Year)};
This rule consists of three rule elements. The first one matches every token whose covered text occurs in a word list named “MonthsList”. The second rule element is optional and does not need to be fulfilled, as indicated by the quantifier “?”. The last rule element
UIMA Ruta Version 2.0.1 Apache UIMA Ruta Overview 5