url - Universität zu Lübeck
url - Universität zu Lübeck
url - Universität zu Lübeck
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
4.3. HYBRID APPROACHES 69<br />
paths when processing a path expression. Two functions may be evaluated for<br />
an extent e: The first Boolean function governs(e, v) responds if the extent or one<br />
of its descendant contains an element with the requested value v. The second<br />
Boolean function governs(e, v) returns only true, if e itself contains v. In the sample<br />
data only the node with the id 5 contains an element with the value Singapur,<br />
but all nodes with ids 0 to 5 govern the value.<br />
The authors propose two methods to capture the content in the extents: The first<br />
approach assigns a unique id to every value in the XML data and puts it into<br />
the extents that contain/govern this value. Because an own id for each values<br />
increases the size of the DataGuide dramatically the second proposal uses binary<br />
signatures of a restricted length. A non-bijective function assigns values to signatures.<br />
If an extent governs or contains more than one value the corresponding<br />
signatures are unified bitwise to a single signature that represents all values. This<br />
process is not lossless, leads to false positives and therefore requires postprocessing<br />
when evaluating a path expression.<br />
A major issue of the CADG is its limited capability to deal with updates. When<br />
adding/deleting a node or when changing the value of a node the corresponding<br />
signatures/ids must be identified and recalculated respectively deleted. In general,<br />
an update implies that all extents of the Content-Aware DataGuide must be<br />
touched. This is a linear complexity in the size of the database.<br />
4.3.3 ViST<br />
With the Virtual Suffix Tree (ViST) [114] Wang et al. introduce an approach that<br />
encodes and represents XML data and path expressions as structure-encoded<br />
sequences. XML data is represented by the preorder sequence of its tree structure<br />
produced by a depth-first traversal of the XML data. The value of elements<br />
and attributes and the labels of all elements are combined to one large sequence.<br />
Therefore, ViST is comparable to a numbering schema. Since isomorphic trees<br />
may produce different preorder sequences an order among sibling nodes is enforced<br />
using the lexicographic order of the labels. Multiple siblings with the same<br />
label (e.g. the payment element in the XMark sample data) are ordered randomly.<br />
In order to motivate the ViST approach we use the following DOM-represented<br />
XML fragment in figure 4.10.<br />
ViST transforms an XML data into a sequence of (symbol, prefix) pairs - the socalled<br />
structure-encoded sequence D. The XML fragment of figure 4.10 leads to<br />
the following structure-encoded sequence D:<br />
1 D=<br />
2 ( , ) ,<br />
3 (,) ,<br />
4 (,) ,<br />
5 (,) ,<br />
6 (,) ,