20.08.2013 Views

Apache UIMA Ruta Guide and Reference - Apache UIMA - The ...



development or quality maintenance. Sometimes, it is necessary to debug the rules because they do not match as expected. In this case, the explanation perspective provides views that explain every detail of the matching process. Finally, the UIMA Ruta language can also be used by the tooling, for example, by the “Query” view. Here, UIMA Ruta rules can be used as query statements in order to investigate annotated documents.

UIMA Ruta integrates smoothly with Apache UIMA. First of all, UIMA Ruta rules are applied using a generic Analysis Engine, and thus UIMA Ruta scripts can easily be added to Apache UIMA pipelines. UIMA Ruta also provides the functionality to import and use other UIMA components like Analysis Engines and Type Systems. UIMA Ruta rules can refer to every type defined in an imported type system, and the UIMA Ruta Workbench generates a type system descriptor file containing all types that were defined in a script file. Any Analysis Engine can be executed by rules as long as its implementation is available on the classpath. Therefore, functionality outsourced to an arbitrary Analysis Engine can be added and used within UIMA Ruta.
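The import mechanism described above might be used as in the following sketch. The descriptor names “my.project.MyTypeSystem” and “my.project.MyAnnotator” are hypothetical placeholders, and the combination of the TYPESYSTEM and ENGINE declarations with the EXEC action is an assumption about how such an import is written in a script, so the language reference should be consulted for the exact forms.

```
PACKAGE uima.ruta.example;

// Hypothetical descriptors; replace them with ones that exist in your project.
TYPESYSTEM my.project.MyTypeSystem;
ENGINE my.project.MyAnnotator;

// Execute the imported Analysis Engine on the complete document.
Document{-> EXEC(MyAnnotator)};
```

The imported engine is referred to by its short name in the action, and its implementation has to be available on the classpath, as stated above.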

1.4. Learning by Example

This section gives an introduction to the UIMA Ruta language by explaining the rule syntax and inference with some simplified examples. It is recommended to use the UIMA Ruta Workbench to write UIMA Ruta rules in order to gain advantages like syntax checking. A short description of how to install the UIMA Ruta Workbench is given here. The following examples make use of the annotations added by the default seeding of the UIMA Ruta Analysis Engine. Their meaning is explained along with the examples.

Note: The examples in this section are not valid script files as they are missing at least a package declaration. In order to obtain a valid script file, please ensure that all used types are imported or declared and that a package declaration like “PACKAGE uima.ruta.example;” is added in the first line of the script.
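As a minimal sketch of a script that satisfies these requirements, the declaration and rule explained in the remainder of this section can be combined with the package declaration from the note. The sketch assumes that the seeding type “W” is available without an explicit import; if it is not in your setup, the corresponding type system has to be imported first.

```
// The package declaration must be the first statement of the script.
PACKAGE uima.ruta.example;

// Every type used by a rule has to be declared or imported.
DECLARE Animal;

W{REGEXP("dog") -> MARK(Animal)};
```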

The first example consists of a type declaration followed by a simple rule. Type declarations always start with the keyword “DECLARE” followed by the short name of the new type. The namespace of the type is equal to the package declaration of the script file. It is also possible to create more complex types with features or specific parent types, but this is ignored for now. In the example, a simple annotation type with the short name “Animal” is defined. After the declaration of the type, a rule with one rule element is given. In general, UIMA Ruta rules consist of a sequence of rule elements. A simple rule element consists of four parts: a matching condition, an optional quantifier, an optional list of conditions and an optional list of actions. The rule element in the following example has the matching condition “W”, an annotation type standing for normal words. Statements like declarations and rules always end with a semicolon.

DECLARE Animal;
W{REGEXP("dog") -> MARK(Animal)};

The rule element also contains one condition and one action, both enclosed in curly braces. Conditions are separated from actions by “->”. The condition “REGEXP("dog")” indicates that the matched word must match the regular expression “dog”. If the matching condition and the additional regular expression are fulfilled, the action is executed, which creates a new annotation of the type “Animal” with the same offsets as the matched token. The default seeder does not actually add annotations of the type “W”, but annotations of the types “SW” and “CW” for lowercase and capitalized words, which both have the parent type “W”.
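Two variations of the rule may help to make the parts of a rule element and the role of the parent type more concrete. They are illustrative sketches rather than examples from this guide: the type name “WordSequence” is hypothetical, and the package declaration from the note above is prepended so that the block forms a complete script.

```
PACKAGE uima.ruta.example;

DECLARE Animal;
DECLARE WordSequence;

// Matching condition "CW" instead of "W": only capitalized words are
// considered, for example "Dog" at the beginning of a sentence.
CW{REGEXP("Dog") -> MARK(Animal)};

// Matching condition "W", quantifier "+", no conditions, one action:
// a sequence of one or more words is covered by a single annotation.
W+{-> MARK(WordSequence)};
```

Because “SW” and “CW” both have the parent type “W”, the second rule matches lowercase and capitalized words alike.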

UIMA Ruta Version 2.0.1 Apache UIMA Ruta Overview 3
