12.07.2015 Views

An efficient mechanism for Matching multiple patterns on XML Streams


Predicates may also evaluate a boolean expression given by combinations of relational operators (=, !=, <, >, <=, >=), the assignment operator (=:), logical operators (|, &, !), brackets ((, )), variable references (using $) or the current value from the XML data stream (if an operand is left empty). Variable references may be complex, as they must also account for template references. For example, $foo.bar[3].ding identifies the value of the field ding of the third element in the sequence bar, where bar is a field of the scalar foo. The definition of TMPL has a set of rules that govern the exact evaluation of predicates. We will not go into more detail in this article; please refer to the documentation found at the website [6].

3.5 Allowing Mixed Content

The language allows for matching of mixed content; values are allowed to appear unrestricted among the XML tags and not only as a single child of an element. Unlike other processors, the TMPL engine treats leading and trailing white space as empty content and strips it from the data.

4 Changes in the Evaluation Runtime

The implementation of the stream analysis runtime has changed considerably, as shown in figure 1. The evaluation engine now interprets EFSM template matching automata, stored in a proprietary binary format or in GXL. This enables a more flexible handling of templates, e.g. it is now possible to add or remove templates during runtime and to combine automata.
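The idea of adding and removing templates while the stream is being processed can be pictured with a small sketch. It is only an illustration under stated assumptions: the class `TemplateRuntime`, its methods, and the representation of an interpreted automaton as a plain callable are hypothetical and are not taken from the TMPL engine.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an interpreted automaton: a callable that
# receives one stream event and returns True when it reports a match.
Matcher = Callable[[str], bool]


class TemplateRuntime:
    """Keeps a changeable set of matchers and feeds each event to all of them."""

    def __init__(self) -> None:
        self._matchers: Dict[str, Matcher] = {}

    def add(self, name: str, matcher: Matcher) -> None:
        # Register a template while the runtime is already in use.
        self._matchers[name] = matcher

    def remove(self, name: str) -> None:
        # Unregister a template; unknown names are ignored.
        self._matchers.pop(name, None)

    def feed(self, event: str) -> List[str]:
        # Pass the event to every registered matcher and collect the names
        # of those that report a match.
        return [name for name, m in list(self._matchers.items()) if m(event)]


runtime = TemplateRuntime()
runtime.add("albums", lambda ev: ev == "START album")
runtime.add("singles", lambda ev: ev == "START single")
hits_before = runtime.feed("START album")   # both templates see the event
runtime.remove("albums")                     # template removed at runtime
hits_after = runtime.feed("START album")
```

In this sketch every event is handed to all registered matchers in turn, which is one simple way to picture several templates being matched against a single stream.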
Performance of this approach using only a single template is inferior to the former compiled version, but superior when several templates are matched in parallel.

4.1 Automata Generation

The formalism for generated automata had to support context (which may be values from the data stream, TMPL declared variables and constants, or internal variables used by the evaluation engine) and transitions that may depend on any of them. These premises led to the adoption of an EFSM formalism. More specifically, we chose to extend the one proposed by Henniger and Neumann (see [7]) because of the above-mentioned features, the additional support for internal ɛ transitions and the ability for transitions to change the complete context of an automaton.

An EFSM is defined as a tuple (S, C, I, O, T, s₀, c₀), where S is a finite set of states, C the context (all possible values of a finite set of variables), I a non-empty finite set of input events including ɛ, and O a non-empty finite set of output events. T denotes the transition relation from current state, context and input event to the next state, modified context and output event (T ⊆ S × C × I × O × S × C).
The last two elements, s₀ and c₀, describe the initial state and the initial context values.

Every TMPL component (element, content, attribute, template reference, ...) is translated to one or more states that are combined into a graph using forward and reverse transitions. A forward transition leads towards the end state, whereas a reverse transition is used when a path cannot be completed, e.g. when a parent XML element closes. As an example, consider the following template for searching a (hypothetical) music database:

    template fooMusicList {}
    const integer c = 10;
    float ranking;
    template tSingle[] single;
    template tAlbum[] album;
    $c]>[=:ranking]single*album*

The template fooMusicList would be translated to an EFSM similar to the one shown in figure 2 (some details are omitted for brevity of the examples). A straightforward path through the generated automaton is marked with bold transition lines. Several details may be associated with a transition: context conditions, which are enclosed in square brackets; actions for changing the context, in curly brackets; an input event from the XML stream (e.g. START, END, ATTRIBUTE, ...); and an output event, or none in the case of an ɛ transition.

The first two states are used to initialise constants and output variables. The matching process starts in the START state and stops in the END state, where a matching pattern has been identified and is reported back to the analysing application. For every element the generator constructs two states that match the corresponding XML start and end elements (e.g. MATCH AUTHOR and MATCH AUTHOR END). For attributes the automaton has a state that first registers each attribute's name and then uses a single state to match all of them; content is handled similarly. Template references are translated as single states.
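To make the EFSM definition and the forward and reverse transitions more concrete, the following is a minimal sketch of an interpreter for such an automaton. It is an illustration under stated assumptions, not the engine's implementation: the names `Transition` and `EFSM`, the element name `ranking`, the state names (modelled on the MATCH ... / MATCH ... END naming above) and the comparison against the constant c are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, float]
EPSILON = None  # assumed marker for an epsilon transition (consumes no input)


@dataclass
class Transition:
    source: str
    kind: Optional[str]                     # "START", "END", "CONTENT" or EPSILON
    guard: Callable[[Context, str], bool]   # context condition
    action: Callable[[Context, str], None]  # context change
    target: str
    output: Optional[str] = None            # output event, None if there is none


class EFSM:
    def __init__(self, transitions: List[Transition], s0: str, c0: Context) -> None:
        self.transitions = transitions
        self.state = s0
        self.context = dict(c0)
        self.outputs: List[str] = []
        self._run_epsilon()

    def _fire(self, t: Transition, payload: str) -> None:
        t.action(self.context, payload)
        self.state = t.target
        if t.output is not None:
            self.outputs.append(t.output)

    def _run_epsilon(self) -> None:
        # Follow epsilon transitions until none is enabled in the current state.
        fired = True
        while fired:
            fired = False
            for t in self.transitions:
                if t.source == self.state and t.kind is EPSILON and t.guard(self.context, ""):
                    self._fire(t, "")
                    fired = True
                    break

    def feed(self, kind: str, payload: str) -> bool:
        # Take the first enabled transition for this input event, if any.
        for t in self.transitions:
            if t.source == self.state and t.kind == kind and t.guard(self.context, payload):
                self._fire(t, payload)
                self._run_epsilon()
                return True
        return False  # no transition enabled: the event is ignored


def always(ctx: Context, payload: str) -> bool:
    return True

def nothing(ctx: Context, payload: str) -> None:
    pass

def set_constants(ctx: Context, payload: str) -> None:
    ctx["c"] = 10

def is_ranking(ctx: Context, payload: str) -> bool:
    return payload == "ranking"

def above_c(ctx: Context, payload: str) -> bool:
    try:
        return float(payload) > ctx["c"]
    except ValueError:
        return False

def store_ranking(ctx: Context, payload: str) -> None:
    ctx["ranking"] = float(payload)


transitions = [
    Transition("INIT", EPSILON, always, set_constants, "START"),
    Transition("START", "START", is_ranking, nothing, "MATCH_RANKING"),
    # Forward transition: the content satisfies the context condition.
    Transition("MATCH_RANKING", "CONTENT", above_c, store_ranking, "MATCH_RANKING_END"),
    # Reverse transition: the element closes before the path is completed.
    Transition("MATCH_RANKING", "END", is_ranking, nothing, "START"),
    Transition("MATCH_RANKING_END", "END", is_ranking, nothing, "END", output="MATCH"),
]

machine = EFSM(transitions, "INIT", {})
for kind, payload in [("START", "ranking"), ("CONTENT", "7"), ("END", "ranking"),
                      ("START", "ranking"), ("CONTENT", "12.5"), ("END", "ranking")]:
    machine.feed(kind, payload)
```

In this run the first ranking element fails the context condition, so the closing tag triggers the reverse transition back to START; the second element follows the forward path to END and emits the output event.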
