Cost-Based Optimization of Integration Flows - Datenbanken ...
6.5 Experimental Evaluation
abstractions of statistics maintenance, cost estimation, and re-optimization to enable the
use of both alternative optimization models. Most importantly, we extended the optimizer
interface to exchange partial PlanOptTrees, which includes modifying the optimization
techniques to enable directed re-optimization. Furthermore, we ran the experiments on the
same platform as used in the rest of this thesis. To simulate arbitrarily changing workload
characteristics, we synthetically generated XML data sets with varying selectivities and
cardinalities as input for our integration flows.
Simple-Plan End-to-End Comparison
In a first series of experiments, we compared periodical re-optimization with on-demand
re-optimization (both asynchronous with inter-instance plan change). To conduct a fair
evaluation, we used our fairly simple example plan P5 because the benefit of on-demand
re-optimization increases with increasing plan complexity. Essentially, we reused the
end-to-end comparison experiment from Chapter 3, where we executed 100,000 instances for
the non-optimized plan as well as for both optimization approaches and then measured
re-optimization and plan execution times. In the case of on-demand re-optimization, the
execution time already includes the synchronous statistics maintenance and the evaluation
of optimality conditions. During execution, we varied the selectivities of the three
selection operators (see Figure 6.14(a)) and the input cardinality (see Figure 6.14(b)).
The input data was generated without correlations. With regard to re-optimization, there
are four points (∗1, ∗2, ∗3, and ∗4) where a workload change (intersection points between
selectivities) causes a change of the optimal plan. For periodical re-optimization, we
used a period of ∆t = 300 s, while for on-demand re-optimization, we used a minimum
existence time of ∆t = 1 s and a lazy condition evaluation count of ten.
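The two triggering policies compared here can be illustrated with a minimal sketch. It is not the thesis implementation: the class and function names are hypothetical, and it assumes that the lazy condition evaluation count is the number of consecutive violated optimality-condition evaluations required before re-optimization is triggered, and that the optimality condition for selection ordering is a non-decreasing sequence of selectivities.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OnDemandTrigger:
    """Hypothetical on-demand trigger; names and semantics are assumptions."""
    min_existence_time: float = 1.0  # seconds a plan must exist before replacement
    lazy_count: int = 10             # consecutive violations needed to trigger
    plan_created_at: float = 0.0
    violations: int = 0

    def observe(self, now: float, selectivities: Sequence[float]) -> bool:
        """Return True if re-optimization should be triggered at time `now`."""
        # Assumed optimality condition for the current selection order:
        # selectivities are non-decreasing along the operator sequence.
        violated = any(a > b for a, b in zip(selectivities, selectivities[1:]))
        self.violations = self.violations + 1 if violated else 0
        old_enough = now - self.plan_created_at >= self.min_existence_time
        if self.violations >= self.lazy_count and old_enough:
            self.violations = 0
            self.plan_created_at = now  # a new plan exists from now on
            return True
        return False


def periodic_trigger_times(duration: float, period: float = 300.0) -> List[float]:
    """Times at which periodical re-optimization fires, independent of workload."""
    steps = int(duration // period)
    return [period * k for k in range(1, steps + 1)]
```

Under these assumptions, the periodical policy fires at every multiple of the period whether or not the workload changed, whereas the on-demand policy stays silent as long as the monitored selectivities satisfy the condition.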
Figure 6.14(c) shows the re-optimization times, while Figure 6.14(e) illustrates the
cumulative optimization time. There, we used the elapsed scenario time as the x-axis in
order to illustrate the characteristics of periodical re-optimization. We see that
periodical re-optimization requires many unnecessary optimization steps (36 steps), while
on-demand re-optimization is triggered only if a new plan will be found (6 steps). For
workload shifts ∗2 and ∗3, two on-demand re-optimizations were triggered due to the two
intersection points of selectivities and the exponential moving average used, which
caused a small statistics adaptation delay that led to exceeding the lazy count before
converging to the final statistic measure. With a different parameterization, only one
re-optimization was triggered for each workload shift. Despite directed re-optimization,
a single directed optimization step is, on average, slightly slower than a full
re-optimization step due to the small optimization search space of the applied
optimization techniques (selection reordering and switch path reordering). The reason is
that directed re-optimization incurs some constant additional overhead and pays off only
if several operators can be ignored during optimization. Further, the re-optimization
time of this plan is dominated by the physical plan compilation and the waiting time for
the next possible exchange of plans (due to the asynchronous optimization). As shown in
Chapter 3, if we used a larger ∆t for periodical re-optimization, we would use suboptimal
plans for a longer time and hence miss optimization opportunities. As a result, over
time, on-demand re-optimization yields optimization time improvements because it requires
fewer re-optimization steps.
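The statistics adaptation delay mentioned above follows from how an exponential moving average reacts to a sudden change. The sketch below is illustrative only: the function names and the smoothing factor `alpha` are assumptions, not values taken from the evaluation.

```python
from typing import List, Sequence


def ema_update(previous: float, observed: float, alpha: float = 0.2) -> float:
    """One exponential-moving-average step; alpha is an assumed smoothing factor."""
    return alpha * observed + (1.0 - alpha) * previous


def smooth(series: Sequence[float], alpha: float = 0.2) -> List[float]:
    """Return the EMA estimate after each observation, seeded with the first value."""
    estimates = []
    current = series[0]
    for value in series:
        current = ema_update(current, value, alpha)
        estimates.append(current)
    return estimates


# A selectivity that jumps from 0.2 to 0.8: the estimate follows only gradually.
observed = [0.2] * 3 + [0.8] * 3
estimates = smooth(observed, alpha=0.5)
```

Because the estimate approaches the new selectivity only step by step, two smoothed selectivities can cross at different moments than the raw values do, which is consistent with more than one optimality-condition violation being observed around a single workload shift.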
Figures 6.14(d) and 6.14(f) show the measured execution times. The different execution
times are caused by the changing workload characteristics in the sense of different input