Apache UIMA Ruta Guide and Reference - Apache UIMA - The ...
development or quality maintenance. Sometimes, it is necessary to debug the rules because they do
not match as expected. In this case, the explanation perspective provides views that explain every
detail of the matching process. Finally, the UIMA Ruta language can also be used by the tooling,
for example, by the “Query” view. Here, UIMA Ruta rules can be used as query statements in order
to investigate annotated documents.
UIMA Ruta integrates smoothly with Apache UIMA. First of all, the UIMA Ruta rules are applied
using a generic Analysis Engine, and thus UIMA Ruta scripts can easily be added to Apache UIMA
pipelines. UIMA Ruta also provides the functionality to import and use other UIMA components
like Analysis Engines and Type Systems. UIMA Ruta rules can refer to every type defined in an
imported type system, and the UIMA Ruta Workbench generates a type system descriptor file
containing all types that were defined in a script file. Any Analysis Engine can be executed by rules
as long as its implementation is available on the classpath. Therefore, functionality outsourced to
an arbitrary Analysis Engine can be added and used within UIMA Ruta.
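As a rough sketch of how such imports might look in a script, the following combines the TYPESYSTEM and ENGINE statements with the EXEC action. The descriptor names my.project.MyTypeSystem and my.project.MyEngine and the types Person and PersonMention are hypothetical placeholders, not taken from this guide. Both descriptors are assumed to be resolvable from the script's descriptor paths.

```ruta
PACKAGE uima.ruta.example;

// Hypothetical descriptors; assumed to be locatable by the Workbench.
TYPESYSTEM my.project.MyTypeSystem;
ENGINE my.project.MyEngine;

DECLARE PersonMention;

// Execute the imported Analysis Engine on the document.
Document{-> EXEC(MyEngine)};

// Refer to a type that is assumed to be defined in the imported type system.
Person{-> MARK(PersonMention)};
```

The imported engine runs first, so any annotations it creates are available to the rules that follow it in the script.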
1.4. Learning by Example<br />
This section gives an introduction to the UIMA Ruta language by explaining the rule syntax and
inference with some simplified examples. It is recommended to use the UIMA Ruta Workbench
to write UIMA Ruta rules in order to gain advantages like syntax checking. A short description
of how to install the UIMA Ruta Workbench is given here. The following examples make use of the
annotations added by the default seeding of the UIMA Ruta Analysis Engine. Their meaning is
explained along with the examples.
Note: The examples in this section are not valid script files as they are missing at
least a package declaration. In order to obtain a valid script file, please ensure that all
used types are imported or declared and that a package declaration like “PACKAGE
uima.ruta.example;” is added in the first line of the script.
The first example consists of a declaration of a type followed by a simple rule. Type declarations
always start with the keyword “DECLARE” followed by the short name of the new type. The
namespace of the type is equal to the package declaration of the script file. It is also
possible to create more complex types with features or specific parent types, but this is
left aside for now. In the example, a simple annotation type with the short name “Animal” is
defined. After the declaration of the type, a rule with one rule element is given. UIMA Ruta rules
in general can consist of a sequence of rule elements. Simple rule elements themselves consist
of four parts: a matching condition, an optional quantifier, an optional list of conditions and an
optional list of actions. The rule element in the following example has the matching condition “W”,
an annotation type standing for normal words. Statements like declarations and rules always end
with a semicolon.
DECLARE Animal;<br />
W{REGEXP("dog") -> MARK(Animal)};<br />
The rule element also contains one condition and one action, both enclosed in curly brackets.
In order to distinguish conditions from actions, they are separated by “->”. The condition
“REGEXP("dog")” indicates that the matched word must match the regular expression “dog”. If the
matching condition and the additional regular expression are fulfilled, then the action is executed,
which creates a new annotation of the type “Animal” with the same offsets as the matched token.
The default seeder does not actually add annotations of the type “W”, but annotations of the types
“SW” and “CW” for lowercase words and capitalized words, which both have the parent type
“W”.
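Putting these pieces together, a minimal sketch of a complete script might look as follows. It adds the package declaration mentioned in the note above and two variations of the rule. The type name CapitalizedSequence and the two additional rules are illustrative assumptions, not examples taken from this guide.

```ruta
PACKAGE uima.ruta.example;

DECLARE Animal;
DECLARE CapitalizedSequence;

// Original rule: any word (SW or CW, via the parent type W) matching "dog".
W{REGEXP("dog") -> MARK(Animal)};

// Same idea, restricted to lowercase words as created by the default seeder.
SW{REGEXP("cat") -> MARK(Animal)};

// Quantifier without a condition: one or more consecutive capitalized
// words are expected to be covered by a single new annotation.
CW+{-> MARK(CapitalizedSequence)};
```

The last rule shows that conditions are optional: the part before “->” is simply left empty, while the quantifier “+” follows the matching condition directly.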
UIMA Ruta Version 2.0.1, Apache UIMA Ruta Overview