25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...



6.5 Experimental Evaluation

demand re-optimization reacts directly to the workload change and switches plans. Hence, this fast adaptation yields execution time reductions over time. Achieving the same with periodical re-optimization would require a very small ∆t, which in turn would increase the total re-optimization time. Finally, note that the small differences in absolute execution times compared to Chapter 3 arise because the messages for this experiment were randomly generated according to the given selectivities.
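The trade-off between the two strategies can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the evaluated system: the function `simulate`, the two alternating workload states, and the per-instance costs of 1.0 (matching plan) and 2.0 (mismatched plan) are all hypothetical. On-demand re-optimization is modeled as switching plans right after the first mismatched instance, and periodical re-optimization as running the optimizer after every `delta_t` instances.

```python
def simulate(n_instances, change_every, delta_t=None):
    """Return (total_execution_cost, number_of_reoptimizations).

    delta_t=None models on-demand re-optimization (assumption: a workload
    change is detected right after the first mismatched instance);
    otherwise the optimizer runs after every delta_t instances.
    """
    GOOD, BAD = 1.0, 2.0  # assumed per-instance costs, not measured values
    current_plan = 0      # plan k is assumed optimal for workload state k
    total_cost, reopts = 0.0, 0
    for i in range(n_instances):
        workload = (i // change_every) % 2  # workload state alternates
        total_cost += GOOD if current_plan == workload else BAD
        if delta_t is None:
            # on-demand: re-optimize only when the plan no longer fits
            if current_plan != workload:
                current_plan = workload
                reopts += 1
        elif (i + 1) % delta_t == 0:
            # periodical: re-optimize on a fixed schedule, needed or not
            current_plan = workload
            reopts += 1
    return total_cost, reopts


on_demand = simulate(1000, 100)
periodic_small = simulate(1000, 100, delta_t=10)
periodic_large = simulate(1000, 100, delta_t=250)
```

In this toy setting, a small `delta_t` brings the periodical execution cost close to the on-demand case, but only at the price of many more optimizer invocations, while a large `delta_t` saves optimizer runs and leaves mismatched plans in place for longer.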

Complex-Plan End-to-End Comparison

In addition to the simple-plan scenario, we executed a second series of experiments using the more complex example plan P₇′. Plan P₇′ differs from the already introduced plan P₇ in that we explicitly changed the join query type from chain to clique in order to allow arbitrary join reordering. This plan receives a data set, loads data from four different systems, and executes schema transformations with translation operators (XSLT scripts). Finally, the five data sets are joined using four join operators and the result is sent to a fifth system. We executed 20,000 plan instances using the input cardinalities shown in Figure 6.15(b) and the same parameter configurations as for the simple-plan scenario. For the four loaded data sets, we used input cardinalities of d ∈ {2, 4, 8, 16} (in 100 kB). After every 2,000 instances, we changed the input cardinalities of the four external systems round-robin as shown in Figure 6.15(a).
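The workload schedule described above can be sketched as a small generator. The instance count, the change interval, and the cardinality set are taken from the text; the function name `cardinality_schedule` is hypothetical, and the interpretation of "round-robin" as rotating the assignment by one position per change is an assumption, since the exact pattern of Figure 6.15(a) is not reproduced here.

```python
from collections import deque

CARDINALITIES = [2, 4, 8, 16]  # d, in units of 100 kB (from the text)
N_INSTANCES = 20_000           # executed plan instances (from the text)
CHANGE_EVERY = 2_000           # instances between workload changes (from the text)


def cardinality_schedule(n_instances=N_INSTANCES, change_every=CHANGE_EVERY):
    """Yield (instance_index, per-system cardinalities) pairs.

    Assumption: "round-robin" means the assignment of cardinalities to the
    four external systems rotates by one position at every change.
    """
    assignment = deque(CARDINALITIES)
    for i in range(n_instances):
        if i > 0 and i % change_every == 0:
            assignment.rotate(1)  # shift every cardinality to the next system
        yield i, tuple(assignment)


schedule = dict(cardinality_schedule())
```

Under this assumption, the 20,000 instances fall into ten phases of 2,000 instances each, and the initial assignment recurs after every four changes.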

Regarding the results shown in Figures 6.15(c)-6.15(f), we observe characteristics similar to those of the simple-plan scenario. It is important to note that (1) the more optimization opportunities a plan offers and (2) the higher the number of workload changes, the higher the execution time improvements achieved by on-demand re-optimization, because immediate adaptation becomes more important (see Figures 6.15(d) and 6.15(f)). However, we restricted ourselves to moderate differences in input data sizes in order to show only this main characteristic rather than arbitrarily high improvements. Further, we also observe, on average, higher re-optimization time improvements than in the simple-plan comparison scenario due to the higher influence of immediate re-optimization and directed re-optimization (see Figures 6.15(c) and 6.15(e)). The outliers for periodical re-optimization were caused by Java garbage collection because our implementation of the full join enumeration (DPSize) requires many temporary data objects, which are lazily deleted when space is required. This effect is only visible for periodical re-optimization because there we used full join enumeration, where more objects are created and the join enumeration is invoked more often than with our on-demand re-optimization. In conclusion, the benefit of on-demand re-optimization increases with increasing plan complexity and increasing frequency of workload changes.
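To make the object-creation argument concrete, the following is a minimal Python sketch of size-driven dynamic programming join enumeration in the style of DPSize. It is not the Java implementation referred to above. The function `dpsize`, its parameters `cards` and `sel`, and the cost model (C_out, the sum of intermediate result cardinalities) are assumptions chosen for illustration. Because a clique query connects every pair of inputs, the sketch omits the connectivity check that would otherwise prevent cross products.

```python
def dpsize(cards, sel):
    """Size-driven dynamic programming join enumeration (DPSize style).

    cards: {relation_name: cardinality}
    sel:   {frozenset({a, b}): join selectivity}; missing pairs default to 1.0
    Returns (cost, plan_tree, result_cardinality) for the full join, using
    C_out (sum of intermediate result sizes) as a stand-in cost model.
    """
    # best[S] = (cost, plan, cardinality) of the cheapest plan for set S
    best = {frozenset([r]): (0.0, r, float(c)) for r, c in cards.items()}
    by_size = {1: list(best)}
    for size in range(2, len(cards) + 1):
        by_size[size] = []
        # combine smaller plans whose sizes add up to the current size
        for left_size in range(1, size // 2 + 1):
            for left in by_size[left_size]:
                for right in by_size[size - left_size]:
                    if left & right:
                        continue  # overlapping inputs cannot be joined
                    union = left | right
                    lcost, lplan, lcard = best[left]
                    rcost, rplan, rcard = best[right]
                    card = lcard * rcard
                    for a in left:
                        for b in right:
                            card *= sel.get(frozenset((a, b)), 1.0)
                    cost = lcost + rcost + card
                    if union not in best:
                        by_size[size].append(union)
                    if union not in best or cost < best[union][0]:
                        # each candidate allocates new temporary objects
                        best[union] = (cost, (lplan, rplan), card)
    return best[frozenset(cards)]


# Hypothetical three-relation example; the values are illustrative only.
cost, plan, card = dpsize(
    {"A": 10, "B": 100, "C": 1000},
    {frozenset(("A", "B")): 0.1, frozenset(("B", "C")): 0.01},
)
```

Every candidate pair builds new sets and tuples, and the table `best` holds an entry for each non-empty subset of the inputs, so a full enumeration that is invoked often produces a steady stream of short-lived objects.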

Scalability

In order to investigate influencing aspects such as the number of workload shifts wc, the input data size d, and the optimization interval ∆t in more detail, we conducted an additional series of scalability experiments in which we varied these parameters. We compare the unoptimized case with periodical and on-demand re-optimization using the cumulated plan execution time, the cumulated optimization time, and the number of re-optimizations. As the experimental setup, we executed 5,000 plan instances of P₅ for each configuration, where each workload shift switches between the two selectivity configurations (sel(o₂) = 0.8, sel(o₃) = 0.6, sel(o₄) = 0.1) and (sel(o₂) = 0.1, sel(o₃) = 0.6, sel(o₄) = 0.8). As
