30.08.2014 Views

url - Universität zu Lübeck

4.3. HYBRID APPROACHES

paths when processing a path expression. Two Boolean functions may be evaluated for an extent e. The first, governs(e, v), returns true if the extent or one of its descendants contains an element with the requested value v. The second, contains(e, v), returns true only if e itself contains v. In the sample data, only the node with id 5 contains an element with the value Singapur, but all nodes with ids 0 to 5 govern the value.
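
The difference between containing and governing a value can be illustrated with a minimal sketch. The `Extent` class, the helper `build_chain`, and the chain-shaped layout of the nodes 0 to 5 are assumptions made for illustration only; they are not taken from the CADG proposal or from the sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    """Hypothetical extent: an id, the values stored directly in it, and child extents."""
    id: int
    values: set = field(default_factory=set)
    children: list = field(default_factory=list)


def contains(e: Extent, v: str) -> bool:
    """True only if the extent e itself holds the value v."""
    return v in e.values


def governs(e: Extent, v: str) -> bool:
    """True if e or any of its descendants holds the value v."""
    return contains(e, v) or any(governs(child, v) for child in e.children)


def build_chain(leaf_id: int, leaf_values) -> Extent:
    """Build the chain 0 -> 1 -> ... -> leaf_id (an assumed shape); only the leaf holds values."""
    node = Extent(leaf_id, set(leaf_values))
    for i in range(leaf_id - 1, -1, -1):
        node = Extent(i, children=[node])
    return node


root = build_chain(5, {"Singapur"})
```

Under these assumptions, `governs(root, "Singapur")` holds for the root and every node below it, while `contains` holds only for the node with id 5.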

The authors propose two methods to capture the content in the extents. The first approach assigns a unique id to every value in the XML data and puts it into the extents that contain or govern this value. Because a separate id for each value increases the size of the DataGuide dramatically, the second proposal uses binary signatures of a restricted length. A non-injective (many-to-one) function maps values to signatures. If an extent governs or contains more than one value, the corresponding signatures are combined by a bitwise OR into a single signature that represents all values. This process is lossy: it leads to false positives and therefore requires postprocessing when evaluating a path expression.
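
The signature approach can be sketched as follows. The signature length of 16 bits, the use of SHA-256 as the hash, and the choice of setting exactly one bit per value are assumptions for illustration; the concrete signature function of the CADG is not specified here. The names `signature`, `combine`, and `may_contain` are hypothetical.

```python
import hashlib

SIGNATURE_BITS = 16  # assumed signature length, not taken from the CADG proposal


def signature(value: str, bits: int = SIGNATURE_BITS) -> int:
    """Map a value to a signature with one bit set (many-to-one by construction)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    position = int.from_bytes(digest[:4], "big") % bits
    return 1 << position


def combine(values, bits: int = SIGNATURE_BITS) -> int:
    """Merge the signatures of all values of an extent with a bitwise OR."""
    merged = 0
    for v in values:
        merged |= signature(v, bits)
    return merged


def may_contain(extent_signature: int, value: str, bits: int = SIGNATURE_BITS) -> bool:
    """False means the value is definitely absent; True may be a false positive."""
    s = signature(value, bits)
    return (extent_signature & s) == s
```

A value that was merged into an extent signature always passes `may_contain`, so there are no false negatives. Two different values may hash to the same bit, however, which is why a positive answer must be verified against the actual data in a postprocessing step.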

A major issue of the CADG is its limited capability to deal with updates. When a node is added or deleted, or when the value of a node changes, the corresponding signatures or ids must be identified and recalculated or deleted, respectively. In general, an update implies that all extents of the Content-Aware DataGuide must be touched, so the update cost is linear in the size of the database.

4.3.3 ViST

With the Virtual Suffix Tree (ViST) [114], Wang et al. introduce an approach that encodes and represents XML data and path expressions as structure-encoded sequences. XML data is represented by the preorder sequence of its tree structure, produced by a depth-first traversal of the XML data. The values of elements and attributes and the labels of all elements are combined into one large sequence. Therefore, ViST is comparable to a numbering scheme. Since isomorphic trees may produce different preorder sequences, an order among sibling nodes is enforced using the lexicographic order of the labels. Multiple siblings with the same label (e.g. the payment element in the XMark sample data) are ordered arbitrarily.
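
The effect of the enforced sibling order can be shown with a short sketch. The `Node` class and the function `preorder` are hypothetical names; values are simply modelled as leaf nodes, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; element labels and values are both stored as labels."""
    label: str
    children: list = field(default_factory=list)


def preorder(node: Node):
    """Depth-first preorder traversal that visits siblings in lexicographic label order."""
    yield node.label
    # sorted() is stable, so siblings with equal labels keep their input order,
    # which amounts to an arbitrary order from the point of view of the encoding.
    for child in sorted(node.children, key=lambda c: c.label):
        yield from preorder(child)


a = Node("item", [Node("name"), Node("location")])
b = Node("item", [Node("location"), Node("name")])
same = list(preorder(a)) == list(preorder(b))
```

The trees `a` and `b` differ only in the order of their children, and sorting the siblings by label makes both yield the same preorder sequence.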

To motivate the ViST approach, we use the DOM-represented XML fragment shown in figure 4.10.

ViST transforms XML data into a sequence of (symbol, prefix) pairs, the so-called structure-encoded sequence D. The XML fragment of figure 4.10 leads to the following structure-encoded sequence D:

D =
( , ),
( , ),
( , ),
( , ),
( , ),
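
How such a sequence of (symbol, prefix) pairs may be derived is sketched below. The sketch assumes that the prefix of a node is the sequence of labels on the path from the root to its parent, and that values are modelled as leaf nodes. The `Node` class, the function `structure_encode`, and the example tree are hypothetical and do not reproduce the fragment of figure 4.10.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; element labels and values are both stored as labels."""
    label: str
    children: list = field(default_factory=list)


def structure_encode(node: Node, prefix: tuple = ()) -> list:
    """Return the (symbol, prefix) pairs of the subtree in normalized preorder."""
    pairs = [(node.label, prefix)]
    # Siblings are visited in lexicographic label order, as described in the text.
    for child in sorted(node.children, key=lambda c: c.label):
        pairs.extend(structure_encode(child, prefix + (node.label,)))
    return pairs


# Hypothetical fragment: an item with a name and a location.
fragment = Node("item", [
    Node("name", [Node("pen")]),
    Node("location", [Node("Singapur")]),
])
D = structure_encode(fragment)
# D == [("item", ()),
#       ("location", ("item",)),
#       ("Singapur", ("item", "location")),
#       ("name", ("item",)),
#       ("pen", ("item", "name"))]
```

Each pair records a symbol together with the path that leads to it, so the tree structure is kept in the otherwise flat sequence.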
