
url - Universität zu Lübeck


7.2. INTERSECTION OF TWO PATH EXPRESSIONS

7.2.2 Automaton for Mod(p)

The general idea of the intersection algorithm is to build two finite automata $A$ and $A'$ for two absolute linear path expressions $p, p' \in P_{labs}$, such that $A$ accepts $Mod(p)$ and $A'$ accepts $Mod(p')$.

Given $A$ and $A'$, we build the product automaton $B$, which accepts exactly the words accepted by both $A$ and $A'$. Whether the intersection of $p$ and $p'$ is empty can therefore be read off $B$.
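A minimal Python sketch of the product construction and of the emptiness test on $B$ follows. It assumes deterministic automata with partial transition functions, encoded as a hypothetical tuple `(states, start, accepting, delta)`; this encoding and the names `product` and `is_empty` are ours, not taken from the text. Only state pairs reachable from the start pair are built, and the language of $B$ is empty exactly when no accepting pair is reachable.

```python
from collections import deque

# Hypothetical DFA encoding (our assumption): (states, start, accepting, delta),
# where delta maps (state, symbol) -> state; a missing entry means "reject".

def product(dfa1, dfa2):
    """Product automaton B: runs both DFAs in lockstep on the same input."""
    _, s1, f1, d1 = dfa1
    _, s2, f2, d2 = dfa2
    # Symbols both automata can read; any other symbol is rejected by B anyway.
    alphabet = {a for (_, a) in d1} & {a for (_, a) in d2}
    start = (s1, s2)
    states, delta = {start}, {}
    queue = deque([start])
    while queue:  # breadth-first search over reachable state pairs only
        q1, q2 = queue.popleft()
        for a in alphabet:
            if (q1, a) in d1 and (q2, a) in d2:
                nxt = (d1[(q1, a)], d2[(q2, a)])
                delta[((q1, q2), a)] = nxt
                if nxt not in states:
                    states.add(nxt)
                    queue.append(nxt)
    # A pair accepts iff both of its components accept.
    accepting = {(q1, q2) for (q1, q2) in states if q1 in f1 and q2 in f2}
    return states, start, accepting, delta

def is_empty(dfa):
    """True iff no accepting state is reachable from the start state."""
    _, start, accepting, delta = dfa
    succ = {}
    for (p, _), r in delta.items():
        succ.setdefault(p, set()).add(r)
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return False
        for r in succ.get(q, ()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True

# Toy automata over {a, b}, for illustration only.
A = ({0, 1}, 0, {1}, {(0, "a"): 1, (1, "a"): 1, (1, "b"): 1})  # words starting with a
A2 = ({0, 1}, 0, {1}, {(0, "a"): 0, (0, "b"): 1,
                       (1, "a"): 0, (1, "b"): 1})              # words ending with b
A3 = ({0, 1}, 0, {1}, {(0, "b"): 1})                          # exactly the word b

print(is_empty(product(A, A2)))  # False: "ab" is in both languages
print(is_empty(product(A, A3)))  # True: no word starts with a and equals b
```

Since `product` already builds only reachable pairs, `is_empty` here reduces to checking whether the accepting set is empty; the separate reachability test is kept so that it also works for automata constructed in other ways.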

Unfortunately, finite automata are defined over a finite alphabet. Path expressions, in contrast, operate on XML data whose alphabet is infinite, because the set of possible node labels is not bounded.

In Section 3.2.1 we showed that a linear path expression $p \in P_{labs}$ can be transformed into a regular expression $r \in REG_{\Sigma,\alpha}$, where $\Sigma = \Sigma(p)$ and $\alpha \notin \Sigma$ is an arbitrary new symbol.
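The role of $\alpha$ can be illustrated with a small Python sketch. We assume here (this is our reading, not spelled out in this section) that every node label outside $\Sigma(p)$ is mapped to $\alpha$, so that an automaton only ever sees symbols from the finite alphabet $\Sigma \cup \{\alpha\}$. The function names and the example labels are hypothetical.

```python
ALPHA = "α"  # stand-in for the fresh symbol α ∉ Σ (hypothetical representation)

def abstract_label(label: str, sigma: set) -> str:
    """Map an arbitrary node label onto the finite alphabet Σ ∪ {α}."""
    # Assumption: labels that do not occur in p are indistinguishable for p.
    return label if label in sigma else ALPHA

def abstract_word(labels: list, sigma: set) -> list:
    """Abstract a sequence of node labels symbol by symbol."""
    return [abstract_label(label, sigma) for label in labels]

# Hypothetical Σ(p) for a path expression over the labels book and title.
sigma = {"book", "title"}
print(abstract_word(["book", "chapter", "title"], sigma))  # ['book', 'α', 'title']
```

However many distinct labels the XML data contains, the abstracted words use at most $|\Sigma| + 1$ symbols.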

Lemma 3. For any regular expression $r \in REG_{\Sigma,\alpha}$, or equivalently its language $L_r$, one can construct a finite automaton $A$ that decides whether an input string is a word of $L_r$.


We omit the proof, since the result is standard in the theory of finite automata; it can be found, for instance, in [52].
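For illustration only, the following Python sketch realizes Lemma 3 with Thompson's construction (regular expression to $\varepsilon$-NFA) followed by an on-the-fly subset simulation; this is one standard construction and not necessarily the one given in [52]. Regular expressions are encoded as a hypothetical tuple AST with nodes `eps`, `sym`, `cat`, `alt` and `star`, and an input string is a list of symbols, so that multi-character labels count as single symbols.

```python
from itertools import count

def thompson(r, fresh, trans, eps):
    """Return (start, end) of an ε-NFA fragment for the regex AST r."""
    kind = r[0]
    if kind == "eps":
        s, e = next(fresh), next(fresh)
        eps.setdefault(s, set()).add(e)
        return s, e
    if kind == "sym":
        s, e = next(fresh), next(fresh)
        trans.setdefault((s, r[1]), set()).add(e)
        return s, e
    if kind == "cat":
        s1, e1 = thompson(r[1], fresh, trans, eps)
        s2, e2 = thompson(r[2], fresh, trans, eps)
        eps.setdefault(e1, set()).add(s2)  # glue the two fragments
        return s1, e2
    if kind == "alt":
        s, e = next(fresh), next(fresh)
        for sub in (r[1], r[2]):
            s1, e1 = thompson(sub, fresh, trans, eps)
            eps.setdefault(s, set()).add(s1)
            eps.setdefault(e1, set()).add(e)
        return s, e
    if kind == "star":
        s, e = next(fresh), next(fresh)
        s1, e1 = thompson(r[1], fresh, trans, eps)
        eps.setdefault(s, set()).update({s1, e})   # enter the loop or skip it
        eps.setdefault(e1, set()).update({s1, e})  # repeat or leave
        return s, e
    raise ValueError(f"unknown regex node {kind!r}")

def closure(states, eps):
    """All states reachable from `states` via ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(r, word):
    """Decide whether the symbol sequence `word` belongs to L_r."""
    trans, eps = {}, {}
    start, end = thompson(r, count(), trans, eps)
    current = closure({start}, eps)
    for sym in word:
        step = set()
        for q in current:
            step |= trans.get((q, sym), set())
        current = closure(step, eps)
    return end in current

# Hypothetical regex: book, then any number of α, then title.
r = ("cat", ("sym", "book"), ("cat", ("star", ("sym", "α")), ("sym", "title")))
print(accepts(r, ["book", "α", "title"]))  # True
print(accepts(r, ["book", "α"]))           # False
```

Each set `current` corresponds to one state of the equivalent deterministic automaton, which is why the simulation always terminates after one pass over the input.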

Reading XML data as input for a finite automaton raises the following problem: the automaton expects a string, i.e., a sequence of symbols in a fixed order. In tree-structured XML data, however, a node may have several children, so the next symbol (element label) is not uniquely determined.

Therefore, we define a function $path_{leaf}: T \to \mathcal{P}(string)$ that extracts all paths (sequences of nodes) from the root node to each leaf element node of the XML data. Text nodes are ignored because linear path expressions do not affect them. Each path is returned as a string built from the labels of the nodes it contains. The function is defined as follows:

Definition 28 (Function $path_{leaf}$)

$$path_{leaf}(t) = path(t.root)$$

$$path(n) = \begin{cases} n.label & \text{if } n.children = \emptyset \\ n.label + \text{";"} + \{path(c) \mid c \in n.children\} & \text{otherwise} \end{cases}$$

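with $t \in T$ and $n \in N$. Here $+$ denotes string concatenation, applied element-wise when one operand is a set of strings: $a + \text{";"} + \{b, c, d\} = \{a;b,\ a;c,\ a;d\}$. The semicolon is a delimiter that separates consecutive element labels, since a label may consist of several characters (e.g., $a;b \neq ab$).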
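Definition 28 translates almost directly into Python. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for the tree model $T$ (our assumption): `list(n)` yields only the element children of `n`, while text content is stored in `n.text` and `n.tail`, which matches the decision to ignore text nodes. To keep the return type uniform, the leaf case returns the singleton set `{n.tag}` rather than a bare string, and the set-valued `+` is expressed as a comprehension.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

def path(n: ET.Element) -> set[str]:
    """All ';'-delimited label paths from n down to each leaf element below n."""
    children = list(n)  # element children only; text lives in n.text / n.tail
    if not children:
        return {n.tag}  # leaf case: n.label, as a singleton set
    # n.label + ";" + {path(c) | c ∈ n.children}, with + applied element-wise
    return {n.tag + ";" + p for c in children for p in path(c)}

def path_leaf(t: ET.ElementTree) -> set[str]:
    """path_leaf(t) = path(t.root)"""
    return path(t.getroot())

doc = ET.ElementTree(ET.fromstring("<a><b>text</b><c><d/></c><b/></a>"))
print(sorted(path_leaf(doc)))  # ['a;b', 'a;c;d']
```

Because the result is a set, the two `b` leaves contribute the single path `a;b`, and the text `text` inside the first `b` does not appear in any path.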
