30.08.2014 Views

url - Universität zu Lübeck


CHAPTER 5. THE KEY-ORIENTED XML INDEX KEYX

The postprocessing incurs additional cost, but if the set of elements referenced by v j is small, query evaluation with the index will still outperform an exhaustive evaluation over the whole document.

If the path expression of a query contains a wildcard operator, the index may no longer be suitable because it does not cover all requested elements. For instance, the index defined by /site/regions/asia/item[name='x'] indexes all items located in asia, whereas the query /site/regions/*/item[name='Sinus MP3 Player'] has no regional restriction. We could calculate the path expression p△ = ../* that navigates from the item elements in asia to all children of their parent, but this would not succeed because the index does not cover the values of name elements that do not belong to asia. Whether an index can be used therefore depends on the subset relationship (containment) of the corresponding keys.
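
The coverage gap can be illustrated with a minimal Python sketch. The tiny document, the `id` attributes, and the helper `build_index` are assumptions made up for this illustration and are not part of KeyX; the index is simulated by a plain dictionary from name values to item ids.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document with one matching item per region.
DOC = """
<site>
  <regions>
    <asia>
      <item id="a1"><name>Sinus MP3 Player</name></item>
    </asia>
    <europe>
      <item id="e1"><name>Sinus MP3 Player</name></item>
    </europe>
  </regions>
</site>
"""

root = ET.fromstring(DOC)  # root is the <site> element


def build_index(root, path):
    """Map each name value to the ids of the items selected by `path`."""
    index = {}
    for item in root.findall(path):
        index.setdefault(item.findtext("name"), []).append(item.get("id"))
    return index


# Stand-in for the index defined by /site/regions/asia/item[name='x'].
asia_index = build_index(root, "./regions/asia/item")

# Exhaustive evaluation of /site/regions/*/item[name='Sinus MP3 Player'].
query_hits = [
    item.get("id")
    for item in root.findall("./regions/*/item[name='Sinus MP3 Player']")
]

# The index lookup only sees items in asia, so it misses the europe item.
index_hits = asia_index.get("Sinus MP3 Player", [])
print(index_hits, query_hits)
```

In this sketch the index lookup returns a strict subset of the query result, so no navigation that starts from the indexed items can recover the missing hit.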

5.4.3 Containment Problem

In general, a selective index covers every query whose result set is a subset of the result set of the query that defined the index. For instance, an index designed to accelerate queries of the form /dblp/*[author='x'] can also evaluate queries such as /dblp/book[author='x'] or /dblp/article[author='x'], because the selected keys are a subset of the keys of the index.
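
For the simple case in this example, the subset test can be sketched step by step. The function `covers` below is a hypothetical helper and assumes a strongly restricted path language: absolute paths made only of child steps with a name test or `*`, with no descendant axis and no qualifiers. It is not the general containment algorithm.

```python
def steps(path):
    """Split an absolute child-axis path such as '/dblp/*' into its steps."""
    return [s for s in path.split("/") if s]


def covers(index_path, query_path):
    """True if every node selected by query_path is also selected by index_path.

    Assumption: both paths use only child steps with a name test or '*'.
    """
    idx, qry = steps(index_path), steps(query_path)
    if len(idx) != len(qry):
        return False
    # Each index step must be a wildcard or repeat the query's name test.
    return all(i == "*" or i == q for i, q in zip(idx, qry))


print(covers("/dblp/*", "/dblp/book"))      # wildcard index, specific query
print(covers("/dblp/book", "/dblp/*"))      # specific index, wildcard query
```

Under these assumptions the wildcard index covers the book query, while the reverse direction fails, which matches the asia example of the previous section.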

When an index covers a superset of the elements selected by the query, an additional postprocessing step has to filter out false hits: a simple node test checks whether the selected nodes are of the requested element type (e.g. an element selected by * is checked for the label book). As in the previous case, the postprocessing takes time linear in the number of elements returned by the index.
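
The node test can be sketched as a single pass over the index result. The `Element` record, the dictionary-based index, and the function names are assumptions chosen for illustration; they only mimic an index for /dblp/*[author='x'].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    label: str   # element type, e.g. "book" or "article"
    author: str  # value of the indexed author child


def build_index(elements):
    """Hypothetical wildcard index: author value -> all indexed elements."""
    index = {}
    for e in elements:
        index.setdefault(e.author, []).append(e)
    return index


def evaluate(index, label, author):
    """Answer /dblp/<label>[author='<author>'] using the wildcard index."""
    candidates = index.get(author, [])  # index lookup by key
    # Node test: one pass over the candidates, linear in their number.
    return [e for e in candidates if e.label == label]


elements = [Element("book", "x"), Element("article", "x"), Element("book", "y")]
index = build_index(elements)
books_by_x = evaluate(index, "book", "x")
print(books_by_x)
```

The filter touches only the elements returned for the key, never the whole document, which is why the cost depends on the size of the index result.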

If an index is defined with a non-empty set of qualifiers (e.g. only books with an isbn child), it cannot be used to process a query that omits the qualifier, because the index does not cover all requested elements. In contrast, a query that poses more qualifiers than the index can be processed by the index with additional postprocessing, because the query's result is a subset of the indexed elements.
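
The qualifier rule amounts to a set comparison, sketched below. Treating qualifiers as plain strings that are compared syntactically is a simplifying assumption, and `plan` is a hypothetical helper name.

```python
def plan(index_qualifiers, query_qualifiers):
    """Decide whether an index can serve a query, based on qualifiers only.

    Returns None if the index is unusable, otherwise the set of qualifiers
    that still have to be checked during postprocessing.
    Assumption: qualifiers are compared as plain strings.
    """
    idx, qry = frozenset(index_qualifiers), frozenset(query_qualifiers)
    if not idx <= qry:
        # The query ignores a qualifier of the index, so the index
        # does not contain all requested elements.
        return None
    return qry - idx


print(plan({"isbn"}, set()))             # index is more restrictive
print(plan({"isbn"}, {"isbn", "year"}))  # residual qualifier to check
```

An empty result set means the index answers the query without any qualifier postprocessing.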

Whether the nodes selected by one XPath expression $p$ are a subset of the result set of a second expression $p'$ ($p \subseteq p'$) can be decided using the containment algorithm presented by Miklau and Suciu [82]. This algorithm constructs tree patterns for the path expressions $p$ and $p'$ and creates two (alternating) tree automata $A$ and $A'$ accepting the XML data that can be queried by $p$ and $p'$. Containment holds ($p \subseteq p'$) if $\mathrm{lang}(A) \subseteq \mathrm{lang}(A')$. A third automaton $A''$ accepting the complement $\overline{\mathrm{lang}(A')}$ of $\mathrm{lang}(A')$ is built on the basis of $A'$ by exchanging all accepting states with the non-accepting states. If the product automaton $B = A \times A''$ has no reachable accepting state, it holds that $\mathrm{lang}(A) \cap \overline{\mathrm{lang}(A')} = \emptyset$. This is equiv-
