Why Off-the-Shelf RDBMSs are Better at XPath ... - ACM Digital Library
Why Off-the-Shelf RDBMSs are Better at XPath ... - ACM Digital Library
Why Off-the-Shelf RDBMSs are Better at XPath ... - ACM Digital Library
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
ABSTRACT<br />
<strong>Why</strong> <strong>Off</strong>-<strong>the</strong>-<strong>Shelf</strong> <strong>RDBMSs</strong> <strong>are</strong> <strong>Better</strong> <strong>at</strong> XP<strong>at</strong>h<br />
Than You Might Expect<br />
Torsten Grust Jan Rittinger Jens Teubner<br />
To compens<strong>at</strong>e for <strong>the</strong> inherent impedance mism<strong>at</strong>ch between<br />
<strong>the</strong> rel<strong>at</strong>ional d<strong>at</strong>a model (tables of tuples) and XML<br />
(ordered, unranked trees), tree join algorithms have become<br />
<strong>the</strong> prevalent means to process XML d<strong>at</strong>a in rel<strong>at</strong>ional d<strong>at</strong>abases,<br />
most notably <strong>the</strong> TwigStack [6], structural join [1],<br />
and staircase join [13] algorithms. However, <strong>the</strong> addition of<br />
<strong>the</strong>se algorithms to existing systems depends on a significant<br />
invasion of <strong>the</strong> underlying d<strong>at</strong>abase kernel, an option<br />
intolerable for most d<strong>at</strong>abase vendors.<br />
Here, we demonstr<strong>at</strong>e th<strong>at</strong> we can achieve comparable<br />
XP<strong>at</strong>h performance without touching <strong>the</strong> heart of <strong>the</strong> system.<br />
We c<strong>are</strong>fully exploit existing d<strong>at</strong>abase functionality<br />
and acceler<strong>at</strong>e XP<strong>at</strong>h navig<strong>at</strong>ion by purely rel<strong>at</strong>ional means:<br />
partitioned B-trees bring access costs to secondary storage to<br />
a minimum, while aggreg<strong>at</strong>ion functions avoid an expensive<br />
comput<strong>at</strong>ion and removal of duplic<strong>at</strong>e result nodes to comply<br />
with <strong>the</strong> XP<strong>at</strong>h semantics. Experiments carried out on<br />
IBM DB2 confirm th<strong>at</strong> our approach can turn off-<strong>the</strong>-shelf<br />
d<strong>at</strong>abase systems into efficient XP<strong>at</strong>h processors.<br />
C<strong>at</strong>egories and Subject Descriptors<br />
H.2.4 [Inform<strong>at</strong>ion Systems]: D<strong>at</strong>abase Management—<br />
Systems; H.3.4 [Inform<strong>at</strong>ion Systems]: Inform<strong>at</strong>ion Storage<br />
and Retrieval—Systems and Softw<strong>are</strong><br />
General Terms<br />
Performance, Experiment<strong>at</strong>ion, Languages<br />
Keywords<br />
XP<strong>at</strong>h, SQL, Partitioned B-Tree, Rel<strong>at</strong>ional D<strong>at</strong>abases<br />
1. INTRODUCTION<br />
The use of suitable tree encodings can turn rel<strong>at</strong>ional d<strong>at</strong>abase<br />
back-ends into highly efficient XP<strong>at</strong>h processors, an<br />
XML storage approach th<strong>at</strong> has since become widely accepted<br />
in research and industry. With efficient RDBMS<br />
Permission to make digital or hard copies of all or part of this work for<br />
personal or classroom use is granted without fee provided th<strong>at</strong> copies <strong>are</strong><br />
not made or distributed for profit or commercial advantage and th<strong>at</strong> copies<br />
bear this notice and <strong>the</strong> full cit<strong>at</strong>ion on <strong>the</strong> first page. To copy o<strong>the</strong>rwise, to<br />
republish, to post on servers or to redistribute to lists, requires prior specific<br />
permission and/or a fee.<br />
SIGMOD’07, June 12–14, 2007, Beijing, China.<br />
Copyright 2007 <strong>ACM</strong> 978-1-59593-686-8/07/0006 ...$5.00.<br />
Technische Universität München<br />
Munich, Germany<br />
torsten.grust | jan.rittinger | jens.teubner@in.tum.de<br />
949<br />
implement<strong>at</strong>ions in <strong>the</strong> back, rel<strong>at</strong>ional XQuery implement<strong>at</strong>ions<br />
excel with a scalability to multi-gigabyte XML instances<br />
(e.g., [5, 17]).<br />
To compens<strong>at</strong>e for <strong>the</strong> lack of tree-aw<strong>are</strong>ness in rel<strong>at</strong>ional<br />
systems, various algorithms have been established to process<br />
encoded XML tree instances, most notably <strong>the</strong> TwigStack<br />
[6], structural join [1], multi-predic<strong>at</strong>e merge join [24], and<br />
staircase join [13] algorithms. But although c<strong>are</strong> has been<br />
taken to limit <strong>the</strong> required change impact to a single d<strong>at</strong>abase<br />
oper<strong>at</strong>or, all of <strong>the</strong>se proposals depend on modific<strong>at</strong>ions<br />
to <strong>the</strong> internals of <strong>the</strong> underlying RDBMS kernel, a requirement<br />
th<strong>at</strong> may be unacceptable in actual systems (e.g., if<br />
<strong>the</strong> RDBMS does not allow kernel modific<strong>at</strong>ions).<br />
In this paper, we demonstr<strong>at</strong>e th<strong>at</strong> existing and commonly<br />
available d<strong>at</strong>abase techniques can make up for a large<br />
part of this limit<strong>at</strong>ion. We deliber<strong>at</strong>ely use an off-<strong>the</strong>-shelf<br />
RDBMS implement<strong>at</strong>ion for <strong>the</strong> task of evalu<strong>at</strong>ing XP<strong>at</strong>h<br />
loc<strong>at</strong>ion steps (IBM DB2 for th<strong>at</strong> m<strong>at</strong>ter). A closer look<br />
<strong>at</strong> <strong>the</strong> indexing and execution techniques th<strong>at</strong> this system<br />
uses reveals interesting insights into why rel<strong>at</strong>ional systems<br />
perform so well <strong>at</strong> XP<strong>at</strong>h. Three common techniques from<br />
<strong>the</strong> rel<strong>at</strong>ional domain proved particularly effective for XML<br />
tree processing:<br />
(i) The use of partitioned B-trees, a technique recently described<br />
by Graefe [9] significantly speeds up <strong>the</strong> evalu<strong>at</strong>ion<br />
of non-recursive XP<strong>at</strong>h axes (child, p<strong>are</strong>nt) and<br />
XP<strong>at</strong>h node tests (name and/or kind tests). The same<br />
idea can implement TwigStack’s node streams [6] or<br />
simul<strong>at</strong>e schema-based encoding techniques [20, 22] in<br />
a seamless fashion.<br />
(ii) Aggreg<strong>at</strong>ion functions provide a purely rel<strong>at</strong>ional implement<strong>at</strong>ion<br />
of <strong>the</strong> pruning idea, a key aspect in <strong>the</strong><br />
aforementioned tree join algorithms.<br />
(iii) By purely rel<strong>at</strong>ional means, RDBMS query optimizers<br />
embrace rewriting techniques for queries over treestructured<br />
d<strong>at</strong>a th<strong>at</strong> required tailor-made XML support<br />
in earlier work.<br />
We will motiv<strong>at</strong>e all <strong>the</strong>se techniques with experiments<br />
carried out in IBM DB2 (using its SQL functionalities only)<br />
and closely look <strong>at</strong> <strong>the</strong> query plans employed by this system.<br />
Among <strong>the</strong> tree encodings proposed in <strong>the</strong> liter<strong>at</strong>ure,<br />
we chose <strong>the</strong> pre/size/level range encoding to perform<br />
<strong>the</strong>se experiments. The effects we describe, however, apply<br />
equally well to o<strong>the</strong>r node-based numbering schemes,<br />
including those based on pre/post [10] or Dewey numbers<br />
[23].
We will proceed as follows. Section 2 briefly reviews rel<strong>at</strong>ional<br />
XML tree encodings, <strong>the</strong> range encoding (pre/size/<br />
level) in particular. Based on this encoding, we demonstr<strong>at</strong>e<br />
<strong>the</strong> use of partitioned B-trees to acceler<strong>at</strong>e <strong>the</strong> evalu<strong>at</strong>ion of<br />
XP<strong>at</strong>h’s child and p<strong>are</strong>nt axes as well as XP<strong>at</strong>h node tests<br />
(Section 3). In Section 4, we realize staircase join’s pruning<br />
idea [13] on an SQL-only system, before we exploit fur<strong>the</strong>r<br />
RDBMS facilities for efficient XP<strong>at</strong>h processing in Section 5.<br />
We comp<strong>are</strong> our results with rel<strong>at</strong>ed work in Section 6 and<br />
wrap up in Section 7.<br />
2. RELATIONAL TREE ENCODINGS<br />
Any rel<strong>at</strong>ional XQuery implement<strong>at</strong>ion th<strong>at</strong> strives for<br />
compliance with <strong>the</strong> W3C specific<strong>at</strong>ions [4] is bound to represent<br />
XML document trees in a schema-oblivious fashion.<br />
Acceptable support for <strong>the</strong> recursion inherent to XML d<strong>at</strong>a<br />
is provided by Dewey-based encodings (e.g., [18, 23]) and encodings<br />
th<strong>at</strong> use pre- and postorder ranks to describe node<br />
rel<strong>at</strong>ionships in terms of region conditions (e.g., [5, 7, 10,<br />
15, 24]).<br />
2.1 Zooming in: Range Encodings<br />
We will focus on a represent<strong>at</strong>ive of <strong>the</strong> l<strong>at</strong>ter c<strong>at</strong>egory,<br />
where <strong>the</strong> idea is to record for each node v a range th<strong>at</strong><br />
hosts all nodes v ′ in<strong>the</strong>subtreebelowv. More specifically,<br />
we enumer<strong>at</strong>e all nodes according to <strong>the</strong> XML document<br />
order (v’s preorder rank pre(v)). Fur<strong>the</strong>r, for each node v,<br />
we maintain size(v) asv’s number of descendant nodes and<br />
level(v), v’s distance from <strong>the</strong> tree’s root, which completes<br />
<strong>the</strong> structural component of our encoding. Two properties<br />
kind(v) ∈ {elem, text, comment,...} and prop(v) (holding<br />
v’s tag name or textual content for text/comment nodes)<br />
account for v’s semantical content. Figure 1(b) illustr<strong>at</strong>es<br />
this encoding for <strong>the</strong> XML fragment (see Figure 1(a) for <strong>the</strong><br />
corresponding tree):<br />
ci .<br />
The use of this range encoding variant is a reasonable<br />
choice: it serves as <strong>the</strong> backbone of <strong>the</strong> open-source XQuery<br />
implement<strong>at</strong>ion MonetDB/XQuery 1 and, hence, has proven<br />
its applicability for large-scale XML processing. Note th<strong>at</strong><br />
<strong>the</strong> correl<strong>at</strong>ion<br />
pre(v) − post (v) =level(v) − size(v)<br />
for any tree node v (with post (v) denotingv’s postorder<br />
rank) makes <strong>the</strong> encoding equivalent to o<strong>the</strong>r region-based<br />
tree encodings described in <strong>the</strong> past.<br />
2.1.1 Query Regions for XP<strong>at</strong>h<br />
Range encoding allows <strong>the</strong> characteriz<strong>at</strong>ion of XP<strong>at</strong>h navig<strong>at</strong>ion<br />
axes in a way th<strong>at</strong> perfectly suits <strong>the</strong> rel<strong>at</strong>ional processing<br />
model. The recursive axis descendant, e.g., turns<br />
into a simple range condition over preorder ranks:<br />
v ∈ c/descendant ⇔<br />
(Desc)<br />
pre(c) < pre(v) ≤ pre(c)+size(c) .<br />
This range query for <strong>the</strong> nodes v in <strong>the</strong> descendant region<br />
of context node c is efficiently supported in commodity systems,<br />
e.g., in terms of a B-tree index on column pre. Similar<br />
characteriz<strong>at</strong>ions arise for o<strong>the</strong>r XP<strong>at</strong>h axes as well, which<br />
we illustr<strong>at</strong>ed in Figure 1(c), assuming a singleton context<br />
1 http://www.monetdb-xquery.org/<br />
950<br />
node sequence containing node d of <strong>the</strong> example document<br />
in Figure 1(a).<br />
3. PARTITIONED B-TREES FOR XPATH<br />
While <strong>the</strong> range encoding efficiently describes <strong>the</strong> semantics<br />
of recursive XP<strong>at</strong>h axes, ideally this should not neg<strong>at</strong>ively<br />
affect <strong>the</strong> efficient evalu<strong>at</strong>ion of steps along any of<br />
<strong>the</strong> non-recursive axes. The two axes child and p<strong>are</strong>nt<br />
seem particularly crucial here. On range-encoded d<strong>at</strong>a, we<br />
characterize axis child based on Condition Desc and an<br />
additional predic<strong>at</strong>e on column level:<br />
v ∈ c/child ⇔<br />
pre(c) < pre(v) ≤ pre(c)+size(c) (Child)<br />
∧ level(v) =level(c)+1 .<br />
Assuming a context node sequence ctx encoded in <strong>the</strong> d<strong>at</strong>abase<br />
as table ctx, <strong>the</strong> p<strong>at</strong>h expression ctx /child::a <strong>the</strong>n<br />
compiles into <strong>the</strong> SQL expression<br />
SELECT DISTINCT d.*<br />
FROM ctx c, doc d<br />
WHERE c.pre
0a9<br />
1b4<br />
6g<br />
3<br />
2"c"<br />
0 3d2<br />
7h2<br />
4e0 5f<br />
0 8"i"<br />
0 9j<br />
0<br />
(a) Sample tree, annot<strong>at</strong>ed with<br />
properties pre (left) and size (right).<br />
Nodes c and i denote text nodes.<br />
pre size level kind prop<br />
0<br />
1<br />
2<br />
3<br />
4<br />
5<br />
.<br />
9<br />
4<br />
0<br />
2<br />
0<br />
0<br />
.<br />
0<br />
1<br />
2<br />
2<br />
3<br />
3<br />
.<br />
elem<br />
elem<br />
text<br />
elem<br />
elem<br />
elem<br />
.<br />
a<br />
b<br />
c<br />
d<br />
e<br />
f<br />
.<br />
(b) Rel<strong>at</strong>ional tree encoding.<br />
level<br />
ancestor +<br />
preceding<br />
a<br />
•<br />
desc. following<br />
b<br />
•<br />
g<br />
•<br />
c<br />
•<br />
d<br />
◦<br />
h<br />
•<br />
e<br />
•<br />
f<br />
•<br />
i<br />
•<br />
j<br />
•<br />
pre(d) pre(d)<br />
+size(d)<br />
pre<br />
(c) Associ<strong>at</strong>ed pre/level plane. Lengths of <strong>the</strong> lines<br />
correspond to <strong>the</strong> size property. Properties pre and<br />
size determine <strong>the</strong> descendant and following regions<br />
of context node d.<br />
Figure 1: Sample tree and associ<strong>at</strong>ed rel<strong>at</strong>ional encoding. An illustr<strong>at</strong>ive represent<strong>at</strong>ion is <strong>the</strong> pre/level plane.<br />
execution time [ms]<br />
10 6<br />
10 5<br />
10 4<br />
10 3<br />
10 2<br />
10 1<br />
2<br />
2<br />
4<br />
3<br />
child via p<strong>are</strong>nt<br />
child via pre/size/level<br />
12<br />
11<br />
26<br />
24<br />
78<br />
81<br />
223<br />
234<br />
696<br />
767<br />
2,079<br />
2,337<br />
7,663<br />
8,325<br />
0.11 0.29 1.1 3.3 11 34 111 335 1,118<br />
XML document size [MB]<br />
Figure 2: XP<strong>at</strong>h performance for edge mapping and<br />
range encoding as observed for <strong>the</strong> p<strong>at</strong>h expression<br />
/descendant::open_auction/bidder/increase.<br />
both query variants returned <strong>the</strong>ir results in roughly 8 seconds<br />
on, e.g., <strong>the</strong> 1.1 GB XML instance.<br />
The reason for this unexpectedly efficient execution on<br />
<strong>the</strong> pre/size/level encoding becomes app<strong>are</strong>nt when we look<br />
into <strong>the</strong> query plan th<strong>at</strong> IBM DB2 employs to evalu<strong>at</strong>e our<br />
query. A (simplified) sketch of this plan is shown in Figure 3.<br />
IBM DB2 implements <strong>the</strong> two child steps with <strong>the</strong> help of<br />
a conc<strong>at</strong>en<strong>at</strong>ed 〈level, pre〉 B-tree, with primary ordering on<br />
column level. We will now see why this index is particularly<br />
efficient in acceler<strong>at</strong>ing queries along <strong>the</strong> child axis.<br />
3.2 Partitioned B-Trees<br />
B-trees of this kind have also been referred to as partitioned<br />
B-trees [9]. With typical XML tree heights height(t)<br />
of only 10–20, <strong>the</strong> selectivity of column level is very low,<br />
particularly for large document instances. Effectively, <strong>the</strong><br />
prepending of level to a conc<strong>at</strong>en<strong>at</strong>ed B-tree thus leads to a<br />
partitioning of <strong>the</strong> resulting index tree into height(t) partitions<br />
(see Figure 4).<br />
Fortun<strong>at</strong>ely, a low-selectivity prefix will only have a marginal<br />
impact on <strong>the</strong> B-tree’s storage consumption in commodity<br />
RDBMS implement<strong>at</strong>ions. Prefix compression [2] avoids<br />
<strong>the</strong> repe<strong>at</strong>ed storage of <strong>the</strong> level inform<strong>at</strong>ion and minimizes<br />
<strong>the</strong> CPU overhead for search key comparisons during B-tree<br />
lookups. As discussed in [9], this economical dealing with<br />
system resources may even allow <strong>the</strong> use of a 〈level, pre〉 as<br />
<strong>are</strong>placementforapre-only key.<br />
951<br />
IXSCAN<br />
pre=0<br />
doc<br />
NLJOIN<br />
desc::open_auction<br />
NLJOIN<br />
child::bidder<br />
IXSCAN<br />
pre<br />
doc<br />
UNIQ<br />
SORT<br />
pre<br />
NLJOIN<br />
child::increase<br />
IXSCAN<br />
〈level,pre〉<br />
doc<br />
IXSCAN<br />
〈level,pre〉<br />
doc<br />
Figure 3: DB2 execution plan corresponding to <strong>the</strong><br />
p<strong>at</strong>h /descendant::open_auction/bidder/increase.<br />
···<br />
level =1 level =2 ··· level = height(t)<br />
Figure 4: B-tree partitioning.<br />
3.2.1 Partitioned B-Trees for <strong>the</strong> child Axis<br />
Most importantly, however, <strong>the</strong> partitioning leads to an<br />
evalu<strong>at</strong>ion str<strong>at</strong>egy th<strong>at</strong> will never encounter false hits with<br />
respect to <strong>the</strong> level property (see Figure 5 for an illustr<strong>at</strong>ion).<br />
For each context node c, <strong>the</strong> system<br />
1. initi<strong>at</strong>es a scan of <strong>the</strong> partitioned B-tree using level(v) =<br />
level(c)+1andpre(v) > pre(c) and<br />
2. scans <strong>the</strong> index as long as pre(v) ≤ pre(c)+size(c).<br />
All nodes encountered during this index scan will s<strong>at</strong>isfy<br />
Condition Child and, hence, qualify as children of c.<br />
3.2.2 Reduced Complexity<br />
Contrasted with a scan of an index with primary ordering<br />
on <strong>the</strong> key column pre, <strong>the</strong> work required to evalu<strong>at</strong>e child<br />
on a partitioned 〈level, pre〉 B-tree no longer depends on <strong>the</strong><br />
size of <strong>the</strong> subtree below <strong>the</strong> context node c. Todemonstr<strong>at</strong>e<br />
this effect, we chose /site/regions/africa as a p<strong>at</strong>h th<strong>at</strong><br />
returns exactly one node for all document sizes, while <strong>the</strong><br />
subtree below <strong>the</strong> single context node grows proportionally
level<br />
level(d)<br />
level(d)+1<br />
a<br />
•<br />
b<br />
•<br />
g<br />
•<br />
c<br />
•<br />
d<br />
◦<br />
scan<br />
h<br />
•<br />
e• f•<br />
pre(d) pre(d)<br />
+size(d)<br />
i j<br />
• •<br />
pre<br />
Figure 5: The sequential scan of a conc<strong>at</strong>en<strong>at</strong>ed<br />
〈level, pre〉 index will answer an XP<strong>at</strong>h child navig<strong>at</strong>ion<br />
and not encounter any false hits.<br />
execution time [ms]<br />
10 6<br />
10 5<br />
10 4<br />
10 3<br />
10 2<br />
10 1<br />
9<br />
1<br />
〈pre, level〉 index<br />
partitioned 〈level, pre〉 index<br />
33<br />
1<br />
111<br />
1<br />
322<br />
1<br />
1,083<br />
1<br />
3,197<br />
1<br />
5,505<br />
1<br />
31,455<br />
1<br />
104,955<br />
1<br />
0.11 0.29 1.1 3.3 11 34 111 335 1,118<br />
XML document size [MB]<br />
Figure 6: XP<strong>at</strong>h child navig<strong>at</strong>ion using a partitioned<br />
B-tree. (p<strong>at</strong>h: /site/regions/africa).<br />
to <strong>the</strong> XML instance size. We <strong>the</strong>n removed all indexes from<br />
<strong>the</strong> pre/size/level table and left <strong>the</strong> system with only<br />
(a) a 〈pre, level〉 B-tree or<br />
(b) a partitioned 〈level, pre〉 B-tree<br />
to evalu<strong>at</strong>e <strong>the</strong> three child steps. 3<br />
Figure 6 documents how <strong>the</strong> partitioned B-tree decouples<br />
<strong>the</strong> query complexity from <strong>the</strong> total document size. While<br />
<strong>the</strong> use of <strong>the</strong> 〈pre, level〉 B-tree requires an execution time<br />
linear in <strong>the</strong> size of <strong>the</strong> XML document, <strong>the</strong> partitioned Btree<br />
leads to sub-millisecond response times independent of<br />
<strong>the</strong> XML instance size.<br />
3.2.3 Partitioned B-Trees and ORDPATH<br />
ORDPATH labels [18] encode <strong>the</strong> XML tree structure in<br />
terms of a sequence of ordinals. To access a node’s subtree,<br />
<strong>the</strong> encoding depends on <strong>the</strong> lexicographic order among labels,<br />
which coincides with <strong>the</strong> preorder rank pre(v). Thus,<br />
a navig<strong>at</strong>ion along <strong>the</strong> descendant axis amounts to a range<br />
scan equivalent to Condition Desc. Likewise, <strong>the</strong> characteriz<strong>at</strong>ion<br />
of <strong>the</strong> XP<strong>at</strong>h child axis on ORDPATH labels is<br />
equivalent to Condition Child, where a regular expression<br />
m<strong>at</strong>ch assumes <strong>the</strong> place of <strong>the</strong> test on property level.<br />
We <strong>the</strong>refore expect similar performance advantages from<br />
a partitioned 〈level, ordp<strong>at</strong>h〉 B-tree in ORDPATH-based systems.<br />
Microsoft SQL Server does not currently maintain<br />
level inform<strong>at</strong>ion in its XML storage. Functional indexes,<br />
3 Both combin<strong>at</strong>ions allow <strong>the</strong> evalu<strong>at</strong>ion of Condition<br />
Child based on <strong>the</strong> index d<strong>at</strong>a only.<br />
952<br />
however, provided by most of <strong>the</strong> commercial systems (including<br />
SQL Server), allow tuple indexing based on computed<br />
values and could implement a 〈level, ordp<strong>at</strong>h〉 index<br />
without <strong>the</strong> need for an explicit storage of level.<br />
3.3 More Partitioned B-Trees<br />
Sofarwehaveonlyexploited<strong>the</strong>presenceoflow-selectivity<br />
columns for <strong>the</strong> structural component of <strong>the</strong> XML encoding<br />
(column level). The same idea, however, applies to <strong>the</strong> d<strong>at</strong>a<br />
component as well. The kind column is an enumer<strong>at</strong>ion of<br />
only six XML node kinds, and even large XML instances <strong>are</strong><br />
build from only few tag names. 4<br />
Both columns <strong>are</strong> suitable candid<strong>at</strong>es to prefix a partitioned<br />
B-tree. A leading kind column (such as 〈kind, pre〉<br />
or 〈kind, level, pre〉) effectively pushes XP<strong>at</strong>h kind tests (*,<br />
text(), comment(), ...) into <strong>the</strong> index scan, while a prefix<br />
〈kind, prop〉 (e.g., 〈kind, prop, pre〉 or 〈kind, prop, level, pre〉)<br />
will similarly speed up <strong>the</strong> evalu<strong>at</strong>ion of name tests.<br />
We have reported on <strong>the</strong> effectiveness of predic<strong>at</strong>e pushdowns<br />
in XP<strong>at</strong>h loc<strong>at</strong>ion steps in earlier work [13]. Partitioned<br />
B-trees provide an efficient implement<strong>at</strong>ion on commodity<br />
systems. In fact, we found <strong>the</strong> DB2 index advisor<br />
db2advis to seize this chance and suggest appropri<strong>at</strong>e<br />
indexes for workloads th<strong>at</strong> made use of XP<strong>at</strong>h name<br />
and/or kind tests. Note also how an index scan along a<br />
〈kind, prop, pre〉 index readily provides an implement<strong>at</strong>ion<br />
of <strong>the</strong> element streams required by <strong>the</strong> TwigStack algorithm<br />
in [6] using established indexing techniques only.<br />
3.4 Benefit from Early-Out: p<strong>are</strong>nt Axis<br />
Unfortun<strong>at</strong>ely, <strong>the</strong> idea of scanning a conc<strong>at</strong>en<strong>at</strong>ed 〈level,<br />
pre〉 index to evalu<strong>at</strong>e <strong>the</strong> child axis on range-encoded d<strong>at</strong>a<br />
cannot directly be transferred to an evalu<strong>at</strong>ion of XP<strong>at</strong>h’s<br />
p<strong>are</strong>nt axis. Thisaxisisdescribedbyaconstraintontwo<br />
independent columns in <strong>the</strong> pre/size encoding:<br />
v ∈ c/p<strong>are</strong>nt ⇔<br />
pre(v) < pre(c) ≤ pre(v)+size(v)<br />
∧ level(v) =level(c) − 1 .<br />
(P<strong>are</strong>nt)<br />
B-tree indexes, however, can only be used to answer queries<br />
for ranges in a single dimension and, hence, cannot be used<br />
to find tuples qualifying for Condition P<strong>are</strong>nt.<br />
Yet, if we consider tree-specific properties inherent to <strong>the</strong><br />
pre/size encoding, we can still remedy this restriction and<br />
evalu<strong>at</strong>e a p<strong>are</strong>nt step in terms of a single index lookup.<br />
The idea is illustr<strong>at</strong>ed in Figure 7 (assuming node d as <strong>the</strong><br />
context). Given <strong>the</strong> properties level(d) andpre(d) of<strong>the</strong><br />
context node d, we can trigger a reverse index scan 5 on a<br />
conc<strong>at</strong>en<strong>at</strong>ed 〈level, pre〉 index, starting <strong>at</strong> <strong>the</strong> index position<br />
〈level(d)−1, pre(d)〉. As shown in Figure 7, such a scan<br />
will always encounter d’s p<strong>are</strong>nt node as its first hit (if d has<br />
a p<strong>are</strong>nt <strong>at</strong> all).<br />
This approach to <strong>the</strong> evalu<strong>at</strong>ion of <strong>the</strong> p<strong>are</strong>nt axis blends<br />
perfectly with <strong>the</strong> execution model of existing d<strong>at</strong>abase systems.<br />
IBM DB2, e.g., allows index scans to be executed<br />
as single record scans, a fe<strong>at</strong>ure th<strong>at</strong> precisely m<strong>at</strong>ches <strong>the</strong><br />
evalu<strong>at</strong>ion str<strong>at</strong>egy we <strong>are</strong> after here. If evalu<strong>at</strong>ed as a single<br />
4 The XMark [21] DTD, e.g., lists 77 tag names.<br />
5 The functionality to scan indexes in a reverse fashion may<br />
presuppose an explicit declar<strong>at</strong>ion during index cre<strong>at</strong>ion,<br />
e.g., in terms of DB2’s CREATE INDEX ... ALLOW REVERSE<br />
SCANS instruction.
level<br />
level(d) − 1<br />
a<br />
•<br />
b<br />
•<br />
scan<br />
g<br />
•<br />
level(d)<br />
c<br />
•<br />
d<br />
◦<br />
h<br />
•<br />
e<br />
•<br />
f<br />
•<br />
i<br />
•<br />
j<br />
•<br />
pre(d)<br />
pre<br />
Figure 7: XP<strong>at</strong>h p<strong>are</strong>nt navig<strong>at</strong>ion by means of a<br />
reverse 〈level, pre〉 index scan (starting from context<br />
node d).<br />
IXSCAN<br />
kind=elem<br />
prop=description<br />
doc<br />
UNIQ<br />
SORT<br />
pre<br />
NLJOIN<br />
p<strong>are</strong>nt::node()<br />
IXSCAN<br />
〈level ,pre〉<br />
single record<br />
doc<br />
Figure 8: (Desired) DB2 execution plan to evalu<strong>at</strong>e<br />
QP<strong>are</strong>nt (/descendant::description/p<strong>are</strong>nt::node()).<br />
record scan, index scans will only return <strong>the</strong>ir first m<strong>at</strong>ching<br />
tuple for each index re-scan, which exactly m<strong>at</strong>ches our<br />
needs. In Figure 8, we illustr<strong>at</strong>ed <strong>the</strong> corresponding execution<br />
plan for <strong>the</strong> p<strong>at</strong>h<br />
/descendant::description/p<strong>are</strong>nt::node() . (QP<strong>are</strong>nt)<br />
Unfortun<strong>at</strong>ely, <strong>the</strong> decision to use a single record execution<br />
str<strong>at</strong>egy requires an explicit knowledge about <strong>the</strong> d<strong>at</strong>a’s<br />
tree origin, knowledge th<strong>at</strong> is hardly expressible on <strong>the</strong> SQL<br />
level. Therefore, we cannot evoke our intended execution<br />
str<strong>at</strong>egy using an SQL-only interface to <strong>the</strong> d<strong>at</strong>abase system.<br />
An XQuery front-end with direct access to <strong>the</strong> planner<br />
component of <strong>the</strong> underlying system, however, could easily<br />
gener<strong>at</strong>e <strong>the</strong> desired execution plan as an implement<strong>at</strong>ion<br />
of <strong>the</strong> p<strong>are</strong>nt step. Note th<strong>at</strong> an XQuery extension of this<br />
kind would not require any modific<strong>at</strong>ions to <strong>the</strong> system’s<br />
execution engine but merely recycle existing functionality.<br />
To evalu<strong>at</strong>e <strong>the</strong> potential of a 〈level, pre〉-based p<strong>are</strong>nt<br />
evalu<strong>at</strong>ion, we simul<strong>at</strong>ed <strong>the</strong> respective query plan on our<br />
DB2 instance in terms of a modified version of Query QP<strong>are</strong>nt:<br />
/descendant::description [ p<strong>are</strong>nt::node() ] .<br />
(Q ′ P<strong>are</strong>nt)<br />
This p<strong>at</strong>h essentially computes <strong>the</strong> result for Query QP<strong>are</strong>nt,<br />
but returns description nodes instead of <strong>the</strong>ir p<strong>are</strong>nts. ExpressedinSQL,QueryQ<br />
′ P<strong>are</strong>nt reads:<br />
SELECT DISTINCT d.*<br />
FROM doc d<br />
WHERE d.kind = elem AND d.prop = ’description’<br />
AND EXISTS ( SELECT *<br />
FROM doc p<br />
WHERE p.level = d.level − 1<br />
AND p.pre
level<br />
a<br />
•<br />
b<br />
•<br />
g<br />
•<br />
c<br />
◦<br />
d<br />
◦<br />
h<br />
•<br />
e<br />
•<br />
f<br />
•<br />
i<br />
•<br />
j<br />
•<br />
pre(c)<br />
+size(c)<br />
pre(d)<br />
+size(d)<br />
pre<br />
Figure 10: Nodes g, h, i, and j <strong>are</strong> loc<strong>at</strong>ed in <strong>the</strong><br />
following regions of both context nodes c and d.<br />
base kernel. We can achieve <strong>the</strong> same effect, though, with<br />
purely rel<strong>at</strong>ional plan rewrites in an off-<strong>the</strong>-shelf system.<br />
To illustr<strong>at</strong>e, we use <strong>the</strong> SQL represent<strong>at</strong>ion of <strong>the</strong> step<br />
ctx /following:<br />
SELECT DISTINCT d.*<br />
FROM doc d, ctx c<br />
WHERE d.pre >c.pre + c.size<br />
ORDER BY d.pre ,<br />
essentially a semi-join between <strong>the</strong> persistent document container<br />
doc and a set of context nodes ctx. Due to <strong>the</strong><br />
DISTINCT clause, this semi-join may be rewritten into<br />
SELECT d.*<br />
FROM doc d<br />
WHERE d.pre > ANY ( SELECT c.pre + c.size (2)<br />
FROM ctx c )<br />
ORDER BY d.pre ,<br />
which is most efficiently computed by aggreg<strong>at</strong>ing over <strong>the</strong><br />
context rel<strong>at</strong>ion ctx first:<br />
SELECT d.*<br />
FROM doc d<br />
WHERE d.pre > ( SELECT MIN (c.pre + c.size) (3)<br />
FROM ctx c )<br />
ORDER BY d.pre .<br />
The rewritten query now captures <strong>the</strong> idea of pruning by<br />
purely rel<strong>at</strong>ional means. In contrast to <strong>the</strong> technique suggested<br />
in [13], however, <strong>the</strong> rewrites do not depend on an<br />
explicit tree-aw<strong>are</strong>ness in <strong>the</strong> underlying kernel. Moreover,<br />
both rewrites <strong>are</strong> universally applicable to rel<strong>at</strong>ional plans<br />
and could speed up query execution even on d<strong>at</strong>a th<strong>at</strong> does<br />
not origin<strong>at</strong>e from an XML document encoding.<br />
4.2 Context Pruning on IBM DB2<br />
App<strong>are</strong>ntly, both rewrites <strong>are</strong> not among <strong>the</strong> rule set of<br />
<strong>the</strong> DB2 query optimizer. To assess <strong>the</strong>ir potential, we used<br />
<strong>the</strong> two SQL equivalents (1) and (3) to transl<strong>at</strong>e <strong>the</strong> XP<strong>at</strong>h<br />
expression /descendant::city/following::zipcode.<br />
Figure 11 illustr<strong>at</strong>es <strong>the</strong> resulting query execution times,<br />
where our pruning-enabled SQL code clearly outperforms<br />
<strong>the</strong> original SQL query. The same experiment also demonstr<strong>at</strong>es<br />
how <strong>the</strong> demand for duplic<strong>at</strong>e removal in <strong>the</strong> XP<strong>at</strong>h<br />
semantics puts a serious strain on <strong>the</strong> rel<strong>at</strong>ional system. On<br />
XMark-gener<strong>at</strong>ed documents, <strong>the</strong> number of result nodes<br />
retrieved from <strong>the</strong> base tables scales quadr<strong>at</strong>ically with <strong>the</strong><br />
XML document size. On large instances, we thus have to<br />
pay <strong>the</strong> toll for a growing sort overhead (8.1 × 10 9 nodes to<br />
sort for <strong>the</strong> 1.1 GB instance) and DB2 significantly falls back<br />
(1)<br />
954<br />
execution time [ms]<br />
10 7<br />
10 6<br />
10 5<br />
10 4<br />
10 3<br />
10 2<br />
10 1<br />
1<br />
1<br />
2<br />
1<br />
no pruning (orig. transl<strong>at</strong>ion (1))<br />
pruning (rewritten SQL code (3))<br />
8<br />
1<br />
77<br />
5<br />
344<br />
11<br />
2,913<br />
20<br />
32,306<br />
61<br />
320,981<br />
177<br />
19,356,783<br />
589<br />
0.11 0.29 1.1 3.3 11 34 111 335 1,118<br />
XML document size [MB]<br />
Figure 11: Context set pruning speeds up <strong>the</strong> evalu<strong>at</strong>ion<br />
of /descendant::city/following::zipcode by orders<br />
of magnitude.<br />
behind quadr<strong>at</strong>ic scaling. The rewritten query, in contrast,<br />
reaches sub-linear scaling across <strong>the</strong> entire size range.<br />
5. XPATH OPTIMIZATION ON RDBMSS<br />
Their effective measures to rewrite query execution plans<br />
for most efficient evalu<strong>at</strong>ion certainly <strong>are</strong> key aspects th<strong>at</strong><br />
made rel<strong>at</strong>ional d<strong>at</strong>abase systems so successful. It is interesting<br />
to see, since, how <strong>the</strong> same techniques will n<strong>at</strong>urally<br />
lead to efficient XP<strong>at</strong>h evalu<strong>at</strong>ion plans on off-<strong>the</strong>-shelf<br />
RDBMS implement<strong>at</strong>ions.<br />
5.1 Existential Quantific<strong>at</strong>ion and Early-Out<br />
The XP<strong>at</strong>h language specific<strong>at</strong>ion [3] makes extensive use<br />
of existential quantific<strong>at</strong>ion, e.g., to define <strong>the</strong> semantics of<br />
XP<strong>at</strong>h predic<strong>at</strong>e expressions. It has been demonstr<strong>at</strong>ed in<br />
[8] th<strong>at</strong> <strong>the</strong> lack of an early-out evalu<strong>at</strong>ion str<strong>at</strong>egy to implement<br />
predic<strong>at</strong>es can seriously hamper <strong>the</strong> scalability of<br />
any XP<strong>at</strong>h implement<strong>at</strong>ion.<br />
Such a str<strong>at</strong>egy perfectly blends with <strong>the</strong> aforementioned<br />
single record execution str<strong>at</strong>egy. A single record scan in <strong>the</strong><br />
rel<strong>at</strong>ional plan tree will immedi<strong>at</strong>ely abort <strong>the</strong> processing of<br />
its subplan as soon as <strong>the</strong> first m<strong>at</strong>ching tuple is found. In<br />
Section 3.4 we took advantage of this fe<strong>at</strong>ure to demonstr<strong>at</strong>e<br />
an enhanced evalu<strong>at</strong>ion str<strong>at</strong>egy for <strong>the</strong> XP<strong>at</strong>h p<strong>are</strong>nt axis.<br />
The P<strong>at</strong>hfinder XQuery compiler 6 implements a compil<strong>at</strong>ion<br />
procedure th<strong>at</strong> transl<strong>at</strong>es arbitrary XQuery expressions<br />
into SQL:1999 queries [12, 11]. In <strong>the</strong> execution plans resulting<br />
from this compil<strong>at</strong>ion, we found DB2 to make frequent<br />
use of its single record functionality: <strong>the</strong> execution plan th<strong>at</strong><br />
evalu<strong>at</strong>es <strong>the</strong> SQL counterpart of XMark Query Q1, e.g.,<br />
contains seven instances of a single record scan. On our test<br />
pl<strong>at</strong>form, this allows <strong>the</strong> evalu<strong>at</strong>ion of <strong>the</strong> query in less than<br />
one second over a 1.1 GB XMark instance.<br />
5.2 Bottom-Up Loc<strong>at</strong>ion Step Processing<br />
The same query plan also demonstr<strong>at</strong>es how <strong>the</strong> applic<strong>at</strong>ion<br />
of join reordering mechanisms—a well-studied problem<br />
in <strong>the</strong> rel<strong>at</strong>ional domain—enables <strong>the</strong> system to perform effective<br />
rewrites th<strong>at</strong> proved to be quite a challenge to n<strong>at</strong>ive<br />
XP<strong>at</strong>h processors in <strong>the</strong> past. As discussed by [16], a stepby-step<br />
evalu<strong>at</strong>ion of XP<strong>at</strong>h loc<strong>at</strong>ion p<strong>at</strong>hs usually leads to<br />
6 http://www.p<strong>at</strong>hfinder-xquery.org/
Str<strong>at</strong>egy exec. time [s]<br />
(a) arbitrary join order 129.1<br />
(b) step-by-step evalu<strong>at</strong>ion 1.037<br />
(c) simul<strong>at</strong>ion of “binary associ<strong>at</strong>ions” 0.001<br />
Table 1: DB2 execution times for XMark Q15 using<br />
(a) a system-determined join order, (b) step-by-step<br />
join order, and (c) a 〈p<strong>at</strong>h, pre〉 B-tree th<strong>at</strong> simul<strong>at</strong>es<br />
a binary associ<strong>at</strong>ions encoding (111 MB document).<br />
a top-down navig<strong>at</strong>ion in <strong>the</strong> XML document tree. Depending<br />
on <strong>the</strong> selectivity of <strong>the</strong> individual steps, however, it may<br />
be advantageous to process a p<strong>at</strong>h in a backward fashion and<br />
navig<strong>at</strong>e <strong>the</strong> tree bottom-up or use a hybrid combin<strong>at</strong>ion of<br />
both.<br />
On <strong>the</strong> rel<strong>at</strong>ional back-end, <strong>the</strong> same altern<strong>at</strong>ives surface<br />
as different join orders in <strong>the</strong> query execution plan, a situ<strong>at</strong>ion<br />
th<strong>at</strong> modern <strong>RDBMSs</strong> know well how to deal with.<br />
Commodity systems typically rely on st<strong>at</strong>istical inform<strong>at</strong>ion<br />
about <strong>the</strong> underlying tables to decide for a specific join order.<br />
These st<strong>at</strong>istics turn out to be a suitable measure also<br />
to find appropri<strong>at</strong>e evalu<strong>at</strong>ion str<strong>at</strong>egies for XP<strong>at</strong>h, even if<br />
<strong>the</strong> system is completely unaw<strong>are</strong> of <strong>the</strong> tree structure th<strong>at</strong><br />
defines <strong>the</strong> rel<strong>at</strong>ional table content.<br />
XMark Query Q1 is essentially a measure for <strong>the</strong> system’s<br />
XP<strong>at</strong>h performance for <strong>the</strong> p<strong>at</strong>h<br />
/site/people/person [ @id = "person0" ] .<br />
The query plan th<strong>at</strong> corresponds to <strong>the</strong> full XMark query<br />
amounts to 45 oper<strong>at</strong>ors (as obtained with DB2’s plan analyzer<br />
db2expln). Despite this complexity, DB2 detected <strong>the</strong><br />
chance to evalu<strong>at</strong>e this p<strong>at</strong>h in a backward fashion. Starting<br />
from <strong>the</strong> highly selective predic<strong>at</strong>e on <strong>the</strong> @id <strong>at</strong>tribute<br />
(accessed via an index on <strong>at</strong>tribute values), <strong>the</strong> system processes<br />
<strong>the</strong> above p<strong>at</strong>h from <strong>the</strong> leaf to <strong>the</strong> root. For o<strong>the</strong>r<br />
XMark queries, we found DB2’s optimizer to opt for a topdown<br />
str<strong>at</strong>egy or even hybrid combin<strong>at</strong>ions of both. For<br />
an equivalent decision, earlier work [16] had depended on<br />
specialized tree st<strong>at</strong>istics (e.g., d<strong>at</strong>a guides).<br />
Taking a good decision on one of <strong>the</strong> possible join orders<br />
can be a serious challenge to <strong>the</strong> underlying RDBMS optimizer.<br />
P<strong>at</strong>hs th<strong>at</strong> involve a large number of loc<strong>at</strong>ion steps<br />
can quickly exceed <strong>the</strong> limits of eager join re-ordering algorithms.<br />
XMark Q15 essentially consists of a sequence of 13<br />
child steps—app<strong>are</strong>ntly too much for <strong>the</strong> query optimizer of<br />
DB2. Assuming an XMark instance of 111 MB, this lead to<br />
an execution time of 129 seconds on our test pl<strong>at</strong>form. If we<br />
trick <strong>the</strong> system into using a step-by-step p<strong>at</strong>h evalu<strong>at</strong>ion by<br />
rewriting <strong>the</strong> SQL code, we can reduce <strong>the</strong> execution time<br />
to now only one second (see Table 1(b)).<br />
5.3 Schema-Aw<strong>are</strong>ness with Partitioned Trees<br />
The evalu<strong>at</strong>ion techniques we described assume a schemaoblivious<br />
storage, where each tree node maps to one tuple<br />
of a single d<strong>at</strong>abase table in absence of any XML Schema<br />
inform<strong>at</strong>ion. O<strong>the</strong>rs have suggested to incorpor<strong>at</strong>e such inform<strong>at</strong>ion<br />
and collect nodes with a common root-to-node<br />
p<strong>at</strong>h into a unique d<strong>at</strong>abase table [20, 22]. The resulting<br />
semantical grouping may—depending on <strong>the</strong> specific query<br />
workload—improve d<strong>at</strong>a access p<strong>at</strong>terns to <strong>the</strong> underlying<br />
tables <strong>at</strong> <strong>the</strong> cost of an increased assembly overhead to combine<br />
(intermedi<strong>at</strong>e) results from multiple base tables [22].<br />
955<br />
10 0 10 1 10 2 10 3 10 4 10 5<br />
696<br />
767<br />
13,534<br />
/descendant::open_auction/bidder/increase (Fig. 2)<br />
1<br />
6<br />
/site/regions/africa (Fig. 6)<br />
5,505<br />
427<br />
453<br />
14,595<br />
/descendant::description[p<strong>are</strong>nt::node()] (Fig. 9)<br />
100 101 102 103 104 105 execution time [ms]<br />
Figure 12: Rel<strong>at</strong>ional XP<strong>at</strong>h performance (light and<br />
medium gray) vs. DB2’s built-in XQuery support<br />
(dark) as observed for a 111 MB XML instance.<br />
The use of partitioned B-trees covers both situ<strong>at</strong>ions in<br />
a surprisingly simple manner. To this end, we enrich our<br />
encoding with a column p<strong>at</strong>h th<strong>at</strong> records <strong>the</strong> root-to-node<br />
p<strong>at</strong>h for each XML tree node. The XML storage of Microsoft’s<br />
SQL Server, in fact, comes with an implement<strong>at</strong>ion<br />
of a p<strong>at</strong>h column (dubbed PATH ID) [19]. Used as <strong>the</strong> prefix<br />
of a partitioned B-tree, this new column will partition<br />
<strong>the</strong> single document table doc into separ<strong>at</strong>e ranges for each<br />
p<strong>at</strong>h in <strong>the</strong> tree. For example, a conc<strong>at</strong>en<strong>at</strong>ed 〈p<strong>at</strong>h, pre〉 Btree<br />
readily realizes <strong>the</strong> binary associ<strong>at</strong>ion encoding of [20]<br />
over range-encoded d<strong>at</strong>a.<br />
We implemented this idea on DB2. For queries th<strong>at</strong> contain<br />
long sequences of child steps, <strong>the</strong> impact can be significant:<br />
<strong>the</strong> execution time for Query Q15 from <strong>the</strong> XMark<br />
benchmark, e.g., drops to one milli-second if a 〈p<strong>at</strong>h, pre〉<br />
B-tree is used to simul<strong>at</strong>e <strong>the</strong> binary associ<strong>at</strong>ion encoding<br />
(see Table 1(c)).<br />
6. RELATED TECHNIQUES<br />
Much like o<strong>the</strong>r commercial d<strong>at</strong>abase systems, <strong>the</strong> l<strong>at</strong>est<br />
version of IBM DB2 ships with <strong>the</strong> integr<strong>at</strong>ed “pureXML”<br />
XQuery execution engine. Obviously, this functionality provides<br />
an interesting baseline for <strong>the</strong> rel<strong>at</strong>ional XP<strong>at</strong>h evalu<strong>at</strong>ion<br />
str<strong>at</strong>egies th<strong>at</strong> we investig<strong>at</strong>e here.<br />
We have run all aforementioned queries on <strong>the</strong> same system<br />
over an 111 MB XMark instance using <strong>the</strong> pureXML<br />
query processor. 7 Since <strong>the</strong> figures of our rel<strong>at</strong>ional XP<strong>at</strong>h<br />
evalu<strong>at</strong>ion do not include serializ<strong>at</strong>ion time, we wrapped <strong>the</strong><br />
respective p<strong>at</strong>h expressions into a call of fn:count () to also<br />
elimin<strong>at</strong>e this cost factor for DB2’s n<strong>at</strong>ive XP<strong>at</strong>h processor.<br />
Figure 12 contains <strong>the</strong> execution times shown in previous<br />
figures (Figures 2, 6, and 9) for <strong>the</strong> rel<strong>at</strong>ional p<strong>at</strong>h evalu<strong>at</strong>ion,<br />
along with execution times achieved with <strong>the</strong> pureXML<br />
processor for <strong>the</strong> same queries.<br />
The workload we <strong>are</strong> using here is quite different to <strong>the</strong><br />
typical pureXML workload. The DB2 column type XML is<br />
designed to hold small but many XML instances, such as<br />
<strong>the</strong>y arise, e.g., in message processing systems. Such d<strong>at</strong>a<br />
7 DB2’s built-in XQuery processor does not support <strong>the</strong><br />
XP<strong>at</strong>h following axis, which is why we omitted a test for<br />
<strong>the</strong> query of Figure 11.
can optionally be indexed with XML p<strong>at</strong>tern indexes th<strong>at</strong> index<br />
all values th<strong>at</strong> <strong>are</strong> reachable via a given p<strong>at</strong>h expression.<br />
Since <strong>the</strong> focus of our experiments is raw XP<strong>at</strong>h navig<strong>at</strong>ion<br />
performance, DB2 cannot benefit from any of its value-based<br />
indexes, which puts a serious strain on <strong>the</strong> built-in XP<strong>at</strong>h<br />
processor. The resulting performance, shown in Figure 12,<br />
indic<strong>at</strong>es th<strong>at</strong>—<strong>at</strong> least for certain query workloads—a rel<strong>at</strong>ional<br />
approach to XP<strong>at</strong>h evalu<strong>at</strong>ion can significantly outperform<br />
an industrial-strength n<strong>at</strong>ive XQuery processor.<br />
6.1 More Rel<strong>at</strong>ed Work<br />
To add efficient tree processing capabilities to rel<strong>at</strong>ional<br />
systems, earlier work proposed to modify <strong>the</strong> existing d<strong>at</strong>abase<br />
engine and add highly specialized XP<strong>at</strong>h processing<br />
algorithms to <strong>the</strong> DBMS kernel. The multi-predic<strong>at</strong>e merge<br />
join (MPMGJN) has been suggested in [24] to address containment<br />
queries by rel<strong>at</strong>ional means, a generaliz<strong>at</strong>ion of<br />
XP<strong>at</strong>h descendant/child navig<strong>at</strong>ion steps. A n<strong>at</strong>ural extension<br />
of MPMGJN <strong>are</strong> <strong>the</strong> structural join [1] and staircase<br />
join [13] algorithms. Considering d<strong>at</strong>a dependencies<br />
th<strong>at</strong> origin<strong>at</strong>e from <strong>the</strong> encoded tree structure, <strong>the</strong>y target<br />
<strong>the</strong> full set of axes in <strong>the</strong> XP<strong>at</strong>h language. Similarly, <strong>the</strong><br />
P<strong>at</strong>hStack and TwigStack algorithms [6] <strong>are</strong> tailor-made to<br />
process tree navig<strong>at</strong>ion primitives in rel<strong>at</strong>ional systems.<br />
While <strong>the</strong>se proposals make profitably use of rel<strong>at</strong>ional<br />
processing paradigms, we feel th<strong>at</strong> <strong>the</strong> inherent invasion of<br />
<strong>the</strong> d<strong>at</strong>abase kernel to add <strong>the</strong> new functionality is an unacceptable<br />
option for most real-world systems. The techniques<br />
we described here solely depend on existing DBMS technology,<br />
available in off-<strong>the</strong>-shelf d<strong>at</strong>abase implement<strong>at</strong>ions.<br />
Crucial to <strong>the</strong>se techniques is <strong>the</strong> use of conc<strong>at</strong>en<strong>at</strong>ed Btree<br />
indexes. This distinguishes our work from earlier approaches<br />
th<strong>at</strong> depended on specialized index variants, such<br />
as XB- or XR-Trees [6, 14].<br />
7. SUMMARY<br />
In <strong>the</strong> interest of efficient XML processing on rel<strong>at</strong>ional<br />
systems, earlier work has proposed a number of novel d<strong>at</strong>abase<br />
oper<strong>at</strong>ors th<strong>at</strong> provide specialized tree processing support<br />
in <strong>the</strong> d<strong>at</strong>abase kernel. As an invasion of <strong>the</strong> system’s<br />
kernel seems hardly an option for any DBMS vendor, we<br />
evalu<strong>at</strong>ed how much existing and widely available d<strong>at</strong>abase<br />
functionality is suited to efficiently process XML d<strong>at</strong>a.<br />
Among this functionality, <strong>the</strong> use of partitioned B-trees<br />
proved particularly effective. Low-selectivity prefixes in conc<strong>at</strong>en<strong>at</strong>ed<br />
B-trees implement a semantical grouping which<br />
allows for efficient XP<strong>at</strong>h evalu<strong>at</strong>ion str<strong>at</strong>egies. Fur<strong>the</strong>r,<br />
well-known rewrite techniques from <strong>the</strong> rel<strong>at</strong>ional domain<br />
turned out to readily implement tree-processing functionality<br />
th<strong>at</strong> proved challenging in <strong>the</strong> past. Most notably,<br />
we realized staircase join’s pruning idea and bottom-up p<strong>at</strong>h<br />
processing by purely rel<strong>at</strong>ional means.<br />
The techniques we saw <strong>are</strong> of universal n<strong>at</strong>ure and equally<br />
applicable to any of <strong>the</strong> established tree encodings, including<br />
pre/post - [10] and Dewey-based [23] numbering schemes. As<br />
such, <strong>the</strong>y may prove fruitful for RDBMS vendors, as well<br />
as XQuery implementors.<br />
Acknowledgements<br />
We thank Rudolf Bayer who has contributed to this work<br />
with insightful feedback. Our research is supported by <strong>the</strong><br />
German Research Found<strong>at</strong>ion DFG (grant GR 2036/2-1).<br />
956<br />
8. REFERENCES<br />
[1] Shurug Al-Khalifa, H. V. Jagadish, Jignesh M. P<strong>at</strong>el,<br />
Yuqing Wu, Nick Koudas, and Divesh Srivastava.<br />
Structural Joins: A Primitive for Efficient XML Query<br />
P<strong>at</strong>tern M<strong>at</strong>ching. In Proc. of <strong>the</strong> 18th Int’l<br />
Conference on D<strong>at</strong>a Engineering (ICDE), SanJose,<br />
CA, USA, February 2002.<br />
[2] Rudolf Bayer and Karl Unterauer. Prefix B-Trees.<br />
<strong>ACM</strong> Transactions on D<strong>at</strong>abase Systems (TODS),<br />
2(1), March 1977.<br />
[3] Anders Berglund, Scott Boag, Don Chamberlin,<br />
Mary F. Fernández, Michael Kay, Jon<strong>at</strong>han Robie,<br />
and Jérôme Siméon. XML P<strong>at</strong>h Language (XP<strong>at</strong>h)<br />
2.0. World Wide Web Consortium Candid<strong>at</strong>e<br />
Recommend<strong>at</strong>ion, September 2005.<br />
[4] Scott Boag, Don Chamberlin, Mary F. Fernández,<br />
Daniela Florescu, Jon<strong>at</strong>han Robie, and Jérôme<br />
Siméon. XQuery 1.0: An XML Query Language.<br />
World Wide Web Consortium Candid<strong>at</strong>e<br />
Recommend<strong>at</strong>ion, September 2005.<br />
[5] Peter Boncz, Torsten Grust, Maurice van Keulen,<br />
Stefan Manegold, Jan Rittinger, and Jens Teubner.<br />
MonetDB/XQuery: A Fast XQuery Processor<br />
Powered by a Rel<strong>at</strong>ional Engine. In Proc. of <strong>the</strong> 2006<br />
<strong>ACM</strong> SIGMOD Int’l Conference on Management of<br />
D<strong>at</strong>a, Chicago, IL, USA, June 2006.<br />
[6] Nicolas Bruno, Nick Koudas, and Divesh Srivastava.<br />
Holistic Twig Joins: Optimal XML P<strong>at</strong>tern M<strong>at</strong>ching.<br />
In Proc. of <strong>the</strong> 2002 <strong>ACM</strong> SIGMOD Int’l Conference<br />
on Management of D<strong>at</strong>a, Madison, WI, USA, 2002.<br />
[7] David DeHaan, David Toman, Mariano P. Consens,<br />
and M. Tamer Özsu. A Comprehensive XQuery to<br />
SQL Transl<strong>at</strong>ion using Dynamic Interval Encoding. In<br />
Proc. of <strong>the</strong> 2003 <strong>ACM</strong> SIGMOD Int’l Conference on<br />
Management of D<strong>at</strong>a, San Diego, CA, USA, June<br />
2003.<br />
[8] Georg Gottlob, Christoph Koch, and Reinhard<br />
Pichler. Efficient Algorithms for Processing XP<strong>at</strong>h<br />
Queries. <strong>ACM</strong> Transactions on D<strong>at</strong>abase Systems<br />
(TODS), 30(2), June 2005.<br />
[9] Goetz Graefe. Sorting and Indexing with Partitioned<br />
B-Trees. In Proc. of <strong>the</strong> 1st Int’l Conference on<br />
Innov<strong>at</strong>ive D<strong>at</strong>a Systems Research (CIDR), Asilomar,<br />
CA, USA, January 2003.<br />
[10] Torsten Grust. Acceler<strong>at</strong>ing XP<strong>at</strong>h Loc<strong>at</strong>ion Steps. In<br />
Proc. of <strong>the</strong> 2002 <strong>ACM</strong> SIGMOD Int’l Conference on<br />
Management of D<strong>at</strong>a, Madison, WI, USA, June 2002.<br />
[11] Torsten Grust, Jan Rittinger, Sherif Sakr, and Jens<br />
Teubner. A SQL:1999 Code Gener<strong>at</strong>or for <strong>the</strong><br />
P<strong>at</strong>hfinder XQuery Compiler. In Proc. of <strong>the</strong> 2007<br />
<strong>ACM</strong> SIGMOD Int’l Conference on Management of<br />
D<strong>at</strong>a, Beijing, China, June 2007.<br />
[12] Torsten Grust, Sherif Sakr, and Jens Teubner. XQuery<br />
on SQL Hosts. In Proc. of <strong>the</strong> 30th Int’l Conference<br />
on Very Large D<strong>at</strong>abases (VLDB), Toronto, Canada,<br />
September 2004.<br />
[13] Torsten Grust, Maurice van Keulen, and Jens<br />
Teubner. Staircase Join: Teach a Rel<strong>at</strong>ional DBMS to<br />
W<strong>at</strong>ch its (Axis) Steps. In Proc. of <strong>the</strong> 29th Int’l<br />
Conference on Very Large D<strong>at</strong>abases (VLDB), Berlin,<br />
Germany, September 2003.
[14] Haifeng Jiang, Hongjun Lu, Wei Wang, and<br />
Beng Chin Ooi. XR-Tree: Indexing XML D<strong>at</strong>a for<br />
Efficient Structural Joins. In Proc. of <strong>the</strong> 19th Int’l<br />
Conference on D<strong>at</strong>a Engineering (ICDE), Bangalore,<br />
India, March 2003.<br />
[15] Quanzhong Li and Bongki Moon. Indexing and<br />
Querying XML D<strong>at</strong>a for Regular P<strong>at</strong>h Expressions. In<br />
Proc. of <strong>the</strong> 27th Int’l Conference on Very Large<br />
D<strong>at</strong>abases (VLDB), Rome, Italy, September 2001.<br />
[16] Jason McHugh and Jennifer Widom. Query<br />
Optimiz<strong>at</strong>ion for XML. In Proc. of <strong>the</strong> 25th Int’l<br />
Conference on Very Large D<strong>at</strong>abases (VLDB),<br />
Edinburgh, Scotland, UK, September 1999.<br />
[17] Microsoft Corpor<strong>at</strong>ion. SQL Server 2005, 2005.<br />
[18] P<strong>at</strong>rick E. O’Neil, Elizabeth J. O’Neil, Shankar Pal,<br />
Istvan Cseri, Gideon Schaller, and Nigel Westbury.<br />
ORDPATHs: Insert-Friendly XML Node Labels. In<br />
Proc. of <strong>the</strong> 2004 <strong>ACM</strong> SIGMOD Int’l Conference on<br />
Management of D<strong>at</strong>a, Paris, France, June 2004.<br />
[19] Shankar Pal, Istvan Cseri, Oliver Seeliger, Gideon<br />
Schaller, Leo Giakoumakis, and Vasili Zolotov.<br />
Indexing XML D<strong>at</strong>a Stored in a Rel<strong>at</strong>ional D<strong>at</strong>abase.<br />
In Proc. of <strong>the</strong> 30th Int’l Conference on Very Large<br />
D<strong>at</strong>abases (VLDB), Toronto, Canada, September<br />
2004.<br />
[20] Albrecht Schmidt, Martin L. Kersten, Menzo<br />
Windhouwer, and Florian Waas. Efficient Rel<strong>at</strong>ional<br />
957<br />
Storage and Retrieval of XML Documents. In The<br />
World Wide Web and D<strong>at</strong>abases (WebDB), 3rd Int’l<br />
Workshop, Dallas, TX, USA, May 2000.<br />
[21] Albrecht Schmidt, Florian Waas, Martin L. Kersten,<br />
Michael J. C<strong>are</strong>y, Ioana Manolescu, and Ralph Busse.<br />
XMark: A Benchmark for XML D<strong>at</strong>a Management. In<br />
Proc. of <strong>the</strong> 28th Int’l Conference on Very Large<br />
D<strong>at</strong>abases (VLDB), Hong Kong, China, August 2002.<br />
[22] Jayavel Shanmugasundaram, Kristin Tufte, Chun<br />
Zhang, Gang He, David J. DeWitt, and Jeffrey F.<br />
Naughton. Rel<strong>at</strong>ional D<strong>at</strong>abases for Querying XML<br />
Documents: Limit<strong>at</strong>ions and Opportunities. In Proc.<br />
of <strong>the</strong> 25th Int’l Conference on Very Large D<strong>at</strong>abases<br />
(VLDB), Edinburgh, Scotland, UK, September 1999.<br />
Morgan Kaufmann.<br />
[23] Igor T<strong>at</strong>arinov, Str<strong>at</strong>is D. Viglas, Kevin Beyer,<br />
Jayavel Shanmugasundaram, Eugene Shekita, and<br />
Chun Zhang. Storing and Querying Ordered XML<br />
Using a Rel<strong>at</strong>ional D<strong>at</strong>abase System. In Proc. of <strong>the</strong><br />
2002 <strong>ACM</strong> SIGMOD Int’l Conference on Management<br />
of D<strong>at</strong>a, Madison, WI, USA, 2002.<br />
[24] Chun Zhang, Jeffrey Naughton, David DeWitt, Qiong<br />
Luo, and Guy Lohman. On Supporting Containment<br />
Queries in Rel<strong>at</strong>ional D<strong>at</strong>abase Management Systems.<br />
In Proc. of <strong>the</strong> 2001 <strong>ACM</strong> SIGMOD Int’l Conference<br />
on Management of D<strong>at</strong>a, Santa Barbara, CA, USA,<br />
2001.