
6.4. AUTONOMOUS XML INDEXING 121

Test Scenario 3

The previous two scenarios were constructed to evaluate isolated characteristics of the KeyX auto index system and operated on artificial data. To determine the overall performance of KeyX, we set up a more realistic test using real XML data from the DBLP project [70], the well-known computer science bibliography. The full DBLP data set consists of approximately 500,000 publications, mainly articles, inproceedings, and books.

Our concrete test data is an extract of the full DBLP of roughly 26 megabytes and consists of 586,546 element, attribute, and text nodes; more precisely, 534 articles, 57,000 inproceedings, and 1,024 proceedings.

For the test we set up 27 different XPath-based queries. Each operation o has one index candidate of class ican1(o) that supports the query best. We created an initial workload by randomly selecting 25 of these database operations. In general, some operations are selected multiple times while others are not part of the workload at all. Additionally, the operations in the workload are randomly designated as querying or modifying according to a predefined ratio.
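The workload construction described above can be sketched as follows. This is an illustrative reconstruction, not the KeyX implementation; the names (`make_workload`, `query_ratio`, the numeric operation identifiers) are our own assumptions:

```python
import random

# Hypothetical stand-in for the 27 XPath-based operations,
# identified here simply by index.
OPERATIONS = list(range(27))

def make_workload(ops, size=25, query_ratio=0.8, rng=random):
    """Draw `size` operations at random with replacement (so some
    operations may repeat and others be absent), then mark each entry
    as querying or modifying according to the predefined ratio."""
    workload = []
    for _ in range(size):
        op = rng.choice(ops)
        kind = "query" if rng.random() < query_ratio else "modify"
        workload.append((op, kind))
    return workload

workload = make_workload(OPERATIONS)
```

Drawing with replacement is what produces the effect noted above: some operations appear several times in the workload while others do not appear at all.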

Further workloads are created by a delta algorithm that exchanges one operation from the workload for a new one selected randomly from the set of 27 operations. The total size of the workload stays unchanged.
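A minimal sketch of one such delta step, assuming the workload is held as a list of operations (the function name is ours, not taken from the thesis):

```python
import random

def delta_step(workload, all_ops, rng=random):
    """Exchange one randomly chosen workload entry for an operation
    drawn at random from the full operation set. The workload size
    is left unchanged."""
    new_workload = list(workload)
    pos = rng.randrange(len(new_workload))
    new_workload[pos] = rng.choice(all_ops)
    return new_workload
```

Because the replacement is drawn from the full set, the new operation may occasionally equal the one it replaces, so each step changes the workload by at most one entry.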

The delta algorithm guarantees small, random changes in the workload, both in the contained path expressions and in the ratio of querying to modifying operations. This simulates a real database application that changes over time. Because the workload changes slowly, the ISP Tool is able to adapt the KeyX index system: the index selection tool is called periodically (every 30 runs) and finds a new index configuration that better suits the changed workload. Of course, each run of this non-deterministic algorithm generates different results.
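The overall test loop implied by this description might look like the following sketch; `execute` and `select_indexes` are placeholders for the KeyX cost measurement and the ISP Tool, respectively, and all parameter names are our own:

```python
import random

def run_experiment(operations, runs=120, period=30, workload_size=25,
                   rng=random, execute=lambda w: None,
                   select_indexes=lambda w: None):
    """Execute the workload each run, mutate it by one operation per
    run (the delta step), and invoke index selection every `period`
    runs so the index configuration can follow the drifting workload."""
    workload = [rng.choice(operations) for _ in range(workload_size)]
    for run in range(1, runs + 1):
        execute(workload)                    # measure workload cost
        if run % period == 0:
            select_indexes(workload)         # periodic ISP Tool call
        workload = list(workload)            # delta step: swap one entry
        workload[rng.randrange(len(workload))] = rng.choice(operations)
    return workload
```

With `runs=120` and `period=30` this structure yields four index selection calls, matching the periodic re-adaptation described above.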

The costs of dropping and creating indexes are not taken into account because in realistic scenarios with more slowly changing workloads the index selection tool would be called less frequently, and index updates can be performed at times when the CPU is less utilized. We present the measurements of a representative test run in figure 6.7.

The first four workloads are executed without any index. Then the index selection tool is called and creates indexes that accelerate the ongoing workloads. The delta algorithm changes the workload more and more, so the established indexes perform increasingly poorly. At every 30th run, an edge in the curve indicates that the index selection tool has updated the index configuration. This sawtooth pattern is typical for this test.
