25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6.5 Experimental Evaluation

abstractions of statistics maintenance, cost estimation, and re-optimization to enable the use of both alternative optimization models. Most importantly, we extended the optimizer interface to exchange partial PlanOptTrees, which includes modifying the optimization techniques to enable directed re-optimization. Furthermore, we ran the experiments on the same platform as in the rest of this thesis. To simulate arbitrarily changing workload characteristics, we synthetically generated XML data sets with varying selectivities and cardinalities as input for our integration flows.
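The generator used for these data sets is not described here. As a rough sketch only, one simple way to give a selection a controllable selectivity is to fix its predicate and decide per row whether the row should pass. The fixed predicate `value < 50`, the XML schema (`orders`, `order`, `value`), and both helper functions are hypothetical illustrations, not the thesis's tooling.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical fixed predicate of the selection operator: value < THRESHOLD.
THRESHOLD = 50


def generate_message(n_rows: int, selectivity: float, rng: random.Random) -> str:
    """Build one XML message in which each row passes the predicate with probability `selectivity`.

    Hypothetical schema: <orders><order id="i"><value>v</value></order>...</orders>
    """
    root = ET.Element("orders")
    for i in range(n_rows):
        row = ET.SubElement(root, "order", id=str(i))
        passes = rng.random() < selectivity
        # Passing rows get a value below the threshold, filtered rows one at or above it.
        value = rng.randrange(0, THRESHOLD) if passes else rng.randrange(THRESHOLD, 100)
        ET.SubElement(row, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def observed_selectivity(message: str) -> float:
    """Fraction of rows in `message` that satisfy value < THRESHOLD."""
    rows = ET.fromstring(message).findall("order")
    passed = sum(int(row.findtext("value")) < THRESHOLD for row in rows)
    return passed / len(rows)


if __name__ == "__main__":
    rng = random.Random(7)
    # A changing workload: cardinality and selectivity differ from message to message.
    for n_rows, sel in [(1_000, 0.1), (5_000, 0.5), (2_000, 0.9)]:
        msg = generate_message(n_rows, sel, rng)
        print(n_rows, sel, round(observed_selectivity(msg), 3))
```

Varying `selectivity` and `n_rows` from message to message then mimics workload shifts; drawing the pass decision independently per row also keeps the data free of correlations.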

Simple-Plan End-to-End Comparison<br />

In a first series of experiments, we compared periodical re-optimization with on-demand re-optimization (both asynchronous, with inter-instance plan change). To ensure a fair evaluation, we used our fairly simple example plan P5, because the benefit of on-demand re-optimization increases with plan complexity. Essentially, we reused the end-to-end comparison experiment from Chapter 3: we executed 100,000 instances for the non-optimized plan as well as for both optimization approaches and then measured the re-optimization and plan execution times. The execution time already includes the synchronous statistics maintenance and the evaluation of optimality conditions in the case of on-demand re-optimization. During execution, we varied the selectivities of the three selection operators (see Figure 6.14(a)) and the input cardinality (see Figure 6.14(b)). The input data was generated without correlations. With regard to re-optimization, there are four points (∗1, ∗2, ∗3, and ∗4) where a workload change (intersection points between selectivities) causes the optimal plan to change. For periodical re-optimization, we used a period of ∆t = 300 s, while for on-demand re-optimization, we used a minimum existence time of ∆t = 1 s and a lazy condition evaluation count of ten.
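The two policies differ only in when they start an optimization step. The following is a minimal sketch under stated assumptions: conditions are checked once per simulated second, `condition_violated` stands in for the outcome of evaluating the plan's optimality conditions, and the lazy count is interpreted as a number of consecutive violations. The class, method, and helper names are hypothetical; the defaults mirror the parameters above (300 s period, 1 s minimum existence time, lazy count of ten).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PeriodicTrigger:
    """Periodical re-optimization: fire every `period_s` seconds, blind to the workload."""
    period_s: float = 300.0
    last_s: float = 0.0

    def check(self, now_s: float, condition_violated: bool) -> bool:
        # `condition_violated` is deliberately ignored.
        if now_s - self.last_s >= self.period_s:
            self.last_s = now_s
            return True
        return False


@dataclass
class OnDemandTrigger:
    """On-demand re-optimization (sketch): fire after `lazy_count` consecutive
    optimality-condition violations, but only if the current plan has existed
    for at least `min_existence_s` seconds."""
    min_existence_s: float = 1.0
    lazy_count: int = 10
    plan_since_s: float = 0.0
    violations: int = 0

    def check(self, now_s: float, condition_violated: bool) -> bool:
        if not condition_violated:
            self.violations = 0  # assumption: the lazy count refers to consecutive violations
            return False
        self.violations += 1
        if self.violations >= self.lazy_count and now_s - self.plan_since_s >= self.min_existence_s:
            self.violations = 0
            self.plan_since_s = now_s  # assume the new plan is deployed right away
            return True
        return False


def count_triggers(trigger, horizon_s: int, violated_at: Callable[[int], bool]) -> int:
    """Evaluate `trigger` once per simulated second and count re-optimization steps."""
    return sum(trigger.check(float(t), violated_at(t)) for t in range(1, horizon_s + 1))


if __name__ == "__main__":
    shift = lambda t: 1000 <= t < 1010  # one workload shift, violations for ten seconds
    print("periodical:", count_triggers(PeriodicTrigger(), 3600, shift))
    print("on-demand:", count_triggers(OnDemandTrigger(), 3600, shift))
```

For one simulated hour with a single shift, this reports 12 periodical steps but only 1 on-demand step. Because the counter is reset whenever the condition holds, fluctuations shorter than the lazy count never trigger re-optimization.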

Figure 6.14(c) shows the re-optimization times, while Figure 6.14(e) illustrates the cumulative optimization time. In both, we used the elapsed scenario time as the x-axis to illustrate the characteristics of periodical re-optimization. Periodical re-optimization requires many unnecessary optimization steps (36 steps), whereas on-demand re-optimization is triggered only if a new plan will be found (6 steps). For workload shifts ∗2 and ∗3, two on-demand re-optimizations were triggered each. This is due to the two intersection points of selectivities and the exponential moving average used for statistics, which caused a small statistics adaptation delay, so that the lazy count was exceeded before the estimates converged to their final values. With a different parameterization, only one re-optimization was triggered for each workload shift. Despite directed re-optimization, a single optimization step is, on average, slightly slower than a full re-optimization step, because the applied optimization techniques (selection reordering and switch path reordering) have only a small search space. Directed re-optimization incurs some constant additional effort and pays off only if several operators can be ignored during optimization. Further, the re-optimization time of this plan is dominated by the physical plan compilation and the waiting time for the next possible plan exchange (due to the asynchronous optimization). As shown in Chapter 3, if we used a larger ∆t for periodical re-optimization, we would use suboptimal plans for a longer time and hence miss optimization opportunities. As a result, on-demand re-optimization reduces the cumulative optimization time over time because it requires fewer re-optimization steps.
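Why one workload shift can cause two re-optimizations becomes clearer with a small simulation. The sketch below is hypothetical and not the thesis's cost model: it uses the textbook rank ordering for independent selections (descending (1 - s)/c, which with equal per-operator costs reduces to ascending selectivity), estimates selectivities with an exponential moving average, and records every step at which the estimated optimal order changes. Operator names, costs, the smoothing factor `alpha`, and all selectivity values are made up for illustration.

```python
from itertools import permutations


def expected_cost(order, sel, cost) -> float:
    """Per-input-tuple cost of a chain of independent selections: each operator
    processes only the fraction of tuples that passed all operators before it."""
    total, passed = 0.0, 1.0
    for op in order:
        total += passed * cost[op]
        passed *= sel[op]
    return total


def rank_order(sel, cost) -> tuple:
    """Textbook rank ordering for independent filters: descending (1 - s) / c.
    With equal costs this is simply ascending selectivity."""
    return tuple(sorted(sel, key=lambda op: (1 - sel[op]) / cost[op], reverse=True))


def best_order_brute_force(sel, cost) -> tuple:
    """Exhaustive search over all orders, used to cross-check rank_order."""
    return min(permutations(sel), key=lambda order: expected_cost(order, sel, cost))


def ema(prev: float, observed: float, alpha: float) -> float:
    """Exponential moving average update; alpha weights the newest observation."""
    return alpha * observed + (1 - alpha) * prev


def plan_change_steps(true_sel, start_est, cost, alpha: float, steps: int) -> list:
    """Feed the (constant, post-shift) true selectivities into EMA estimates and
    return the steps at which the estimated optimal order changes."""
    est = dict(start_est)
    plan = rank_order(est, cost)
    changes = []
    for step in range(1, steps + 1):
        for op, s in true_sel.items():
            est[op] = ema(est[op], s, alpha)
        new_plan = rank_order(est, cost)
        if new_plan != plan:
            changes.append(step)
            plan = new_plan
    return changes


if __name__ == "__main__":
    cost = {"A": 1.0, "B": 1.0, "C": 1.0}
    before = {"A": 0.1, "B": 0.4, "C": 0.6}  # estimates before the shift
    after = {"A": 0.9, "B": 0.4, "C": 0.6}   # A's selectivity jumps at the shift
    print(plan_change_steps(after, before, cost, alpha=0.1, steps=30))
```

With `alpha = 0.1` this prints `[5, 10]`: the lagging estimate for A first crosses B and only five steps later crosses C, so a single shift yields two intersection points and two plan changes. With `alpha = 1.0` (no smoothing), both crossings fall into the first step and only one change is reported.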

Figures 6.14(d) and 6.14(f) show the measured execution times. The different execution times are caused by the changing workload characteristics in the sense of different input

