url - Universität zu Lübeck

Chapter 5

The Key-Oriented XML Index KeyX

In this chapter we introduce, formally and by example, a new approach for indexing XML data. Our approach, called KeyX, is motivated by the selective index structures used in the relational world. A relational index is defined on a specific table and one or more of its columns, and only queries that operate on these columns can be accelerated by it. A relational index is therefore selective: it serves specific queries only.

Like relational indexes, KeyX is based on keys: the values of elements and attributes that are selected by a specific path expression. This path expression can be part of an XQuery query or an XUpdate operation.

For a set of frequent queries¹, the relevant keys are extracted from the original XML data and stored in a search structure optimized for efficient key retrieval. Suitable search structures include hash tables, tries, binary search trees, B+-trees for disk-resident indexes, or any other data structure capable of storing and retrieving keys.
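A minimal sketch of the extraction step, assuming Python's standard `xml.etree.ElementTree` and a plain `dict` (a hash table, one of the options listed above) as the search structure. The sample document, the helpers `build_key_index` and `extract_keys`, and the `'@name'` convention for attribute keys are hypothetical illustrations, not part of KeyX as defined in this chapter; note also that ElementTree supports only a limited XPath subset.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document (not taken from the chapter).
DOC = """<library>
  <book id="b1"><title>XML Basics</title><author>Smith</author></book>
  <book id="b2"><title>Indexing</title><author>Jones</author></book>
  <book id="b3"><title>Query Processing</title><author>Smith</author></book>
</library>"""


def extract_keys(node, key_path):
    """Return the key values reached from `node` via `key_path`.

    `key_path` is either '@name' (an attribute, hypothetical convention)
    or a relative ElementTree path such as 'author' (element text).
    """
    if key_path.startswith("@"):
        value = node.get(key_path[1:])
        return [] if value is None else [value]
    return [k.text.strip() for k in node.iterfind(key_path) if k.text]


def build_key_index(root, context_path, key_path):
    """Extract keys once and store them in a hash table: key -> context nodes."""
    index = defaultdict(list)
    for node in root.iterfind(context_path):
        for key in extract_keys(node, key_path):
            index[key].append(node)
    return dict(index)


root = ET.fromstring(DOC)
by_author = build_key_index(root, "book", "author")  # keys of book/author
by_id = build_key_index(root, "book", "@id")          # keys of book/@id

# An equality lookup is now a single hash-table probe instead of a document scan.
print([b.get("id") for b in by_author.get("Smith", [])])  # ['b1', 'b3']
```

The extraction pass visits the document once; afterwards each lookup touches only the matching nodes.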

An index is defined by the 'shape' of the path expression to be optimized. Once the index has been materialized, subsequent queries with a matching shape are processed by the index with logarithmic instead of linear complexity. For real databases several megabytes in size, a set of suitable indexes yields a speedup of many orders of magnitude.
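To make the 'shape' idea concrete, here is a minimal sketch assuming a shape is simply the pair (context path, key path) of an equality predicate such as `/library/book[author = $v]`. The names `Shape`, `SortedKeyIndex` and `evaluate`, and the sample data, are hypothetical. A sorted array searched with Python's `bisect` module stands in for a B+-tree: both give logarithmic lookups and also support range predicates, which a hash table does not.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Hypothetical index 'shape': context path plus key path of a value predicate."""
    context_path: str
    key_path: str


class SortedKeyIndex:
    """Materialized index: keys in a sorted array, searched by binary search."""

    def __init__(self, shape, pairs):
        self.shape = shape
        entries = sorted(pairs)                 # (key, node_id) pairs
        self.keys = [k for k, _ in entries]
        self.ids = [i for _, i in entries]

    def lookup(self, key):
        """Equality lookup in O(log n) comparisons."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.ids[lo:hi]

    def key_range(self, low, high):
        """All node ids whose key k satisfies low <= k <= high."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return self.ids[lo:hi]


def evaluate(shape, key, indexes, scan):
    """Answer an equality query, using an index only if its shape matches."""
    index = indexes.get(shape)
    if index is not None:
        return index.lookup(key)   # logarithmic: binary search on the index
    return scan(key)               # linear: visit every candidate node


# Hypothetical (node_id, author) data for /library/book/author.
RECORDS = [("b1", "Smith"), ("b2", "Jones"), ("b3", "Smith"), ("b4", "Adams")]


def scan_authors(key):
    return [node_id for node_id, author in RECORDS if author == key]


author_shape = Shape("/library/book", "author")
indexes = {author_shape: SortedKeyIndex(author_shape, [(a, n) for n, a in RECORDS])}

print(evaluate(author_shape, "Smith", indexes, scan_authors))  # ['b1', 'b3']
print(indexes[author_shape].key_range("A", "K"))                # ['b4', 'b2']
```

A query whose shape has no materialized index falls back to the linear scan, so indexes can be added or dropped without changing query results.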

KeyX can also be used to accelerate specific navigational queries. In contrast to structural summaries such as Strong DataGuides and APEX, our indexing approach is defined only for a set of frequent navigational queries¹. Such a selective structure index consumes less space and can be tuned with respect to updates.
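As a rough illustration of selectivity, the following sketch maps a few frequent label paths to their node sets, whereas a Strong DataGuide has an entry for every distinct label path in the document. It assumes Python's `xml.etree.ElementTree`; `label_paths`, `build_path_index` and the sample document are hypothetical, and this is a plain path-to-nodes map, not the Strong DataGuide or APEX construction.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the chapter).
DOC = """<library>
  <book><title>A</title><chapter><title>A1</title></chapter></book>
  <journal><title>J</title></journal>
</library>"""


def label_paths(root):
    """Yield (label path, element) for every element in document order."""
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        yield path, node
        # Children are pushed in reverse so they are popped in document order.
        for child in reversed(list(node)):
            stack.append((child, f"{path}/{child.tag}"))


def build_path_index(root, frequent_paths):
    """Materialize node sets only for the given frequent label paths."""
    index = {path: [] for path in frequent_paths}
    for path, node in label_paths(root):
        if path in index:
            index[path].append(node)
    return index


root = ET.fromstring(DOC)
all_paths = {path for path, _ in label_paths(root)}
idx = build_path_index(root, ["/library/book/title"])

print(len(all_paths), len(idx))                       # 7 1
print([n.text for n in idx["/library/book/title"]])   # ['A']
```

Here the selective index holds one entry while the document contains seven distinct label paths; only the chosen paths have to be maintained when the document is updated.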

In the following we introduce KeyX formally and by example, and we demonstrate the quality of the approach through performance measurements.

¹ Frequent queries can be defined by a database administrator or by tools that analyze the database workload.
