Cost-Based Optimization of Integration Flows
6.5 Experimental Evaluation
Figure 6.17 illustrates the results using a log-scaled y-axis. The optimization time of full join enumeration increases exponentially, while for both heuristic and directed re-optimization, the optimization time increases almost linearly (slightly super-linearly). In addition, directed re-optimization is even faster than the heuristic enumeration because we only reorder quantifiers of violated optimality conditions. Due to the randomly generated statistics, on average, we take fewer operators into consideration, while still ensuring that the globally optimal solution is found over multiple re-optimization steps. As a result, with increasing plan complexity, the relative benefit of directed re-optimization increases.
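The contrast between the two search strategies can be sketched as follows. This is an illustrative toy model, not the thesis's actual cost functions: quantifiers carry a single hypothetical selectivity weight, the cost of an order is the sum of intermediate-result sizes, and the "optimality condition" checked by directed re-optimization is simply ascending selectivity between adjacent quantifiers.

```python
import itertools

def plan_cost(order, weights):
    # Toy cost model (assumption): cost = sum of intermediate result sizes,
    # where each quantifier scales the running size by its selectivity.
    cost, size = 0.0, 1.0
    for q in order:
        size *= weights[q]
        cost += size
    return cost

def full_enumeration(quantifiers, weights):
    # Exhaustive search over all n! join orders: exponential optimization time.
    return min(itertools.permutations(quantifiers),
               key=lambda order: plan_cost(order, weights))

def directed_reoptimization(order, weights):
    # Reorder only adjacent quantifier pairs whose optimality condition
    # (ascending selectivity) is violated; with few violations, each
    # re-optimization step touches only a small part of the plan.
    order = list(order)
    changed = True
    while changed:
        changed = False
        for i in range(len(order) - 1):
            if weights[order[i]] > weights[order[i + 1]]:
                order[i], order[i + 1] = order[i + 1], order[i]
                changed = True
    return tuple(order)
```

Under this toy cost model, both strategies converge to the same order, but the directed variant inspects only violated conditions instead of enumerating all permutations.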
On-Demand Re-Optimization Overhead
We additionally evaluated the overheads of statistics maintenance and of the PlanOptTree algorithms. Here, all experiments were repeated 100 times. It is important to note that both overheads were already included in the end-to-end comparison, where they were amortized by execution time improvements of several orders of magnitude.
Figure 6.18: Overhead of PlanOptTrees. (a) Statistic Maintenance, (b) Algorithm Overhead.
Figure 6.18(a) illustrates the statistic maintenance overhead, comparing our Estimator component (used for periodical re-optimization) with our PlanOptTree (used for on-demand re-optimization); both use the statistics of all 100,000 plan instances of the simple-plan comparison scenario at the granularity of single operators (2,100,000 atomic statistic tuples). For both models, we used the exponential moving average as aggregation method. The costs include the transient maintenance of aggregates for all operators as well as the aggregation itself. Full Monitoring refers to periodical re-optimization, where all statistics of all operators are gathered by the Estimator. Min Monitoring refers to a hypothetical scenario, where we know the required statistics in advance and maintain only these statistics with the Estimator. The relative improvement of Min Monitoring over Full Monitoring is the benefit we achieve by maintaining only relevant statistics, where the absolute benefit depends on the used workload aggregation method. In contrast, POT Monitoring
refers to the use of our PlanOptTree. Although the PlanOptTree declines unnecessary statistics, it is slower than full monitoring because, for each statistic tuple, we compute the hierarchy of complex statistics and evaluate the optimality conditions. Therefore, we distinguish three variants of POT monitoring: POT refers to monitoring without condition evaluation, while POT2 (without the MEMO structure) and POT3 (with the MEMO structure) show the overhead of continuously evaluating the optimality conditions. It is important to note that the use of the MEMO structure is counterproductive in this scenario due to the rather simple optimality conditions and the additional overhead for updating and probing the MEMO structure.
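The per-tuple work of the POT2 variant, namely maintaining an exponential moving average per operator and evaluating an optimality condition on every update, can be sketched as follows. The class name, the smoothing factor, and the condition callback are illustrative assumptions, not the thesis's actual implementation.

```python
class PlanOptTreeNode:
    """Illustrative sketch (POT2-style, no MEMO structure): maintain an
    exponential moving average (EMA) of an operator statistic and evaluate
    an optimality condition on every statistic update."""

    def __init__(self, alpha=0.3, condition=None):
        self.alpha = alpha          # EMA smoothing factor (assumed value)
        self.value = None           # aggregated statistic estimate
        self.condition = condition  # callable value -> bool; True = still optimal
        self.violations = 0         # violated conditions would trigger re-optimization

    def update(self, sample):
        # EMA aggregation: new = alpha * sample + (1 - alpha) * old.
        if self.value is None:
            self.value = float(sample)
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        # Continuous condition evaluation is the extra cost of POT2/POT3
        # compared to plain monitoring (the POT variant).
        if self.condition is not None and not self.condition(self.value):
            self.violations += 1
        return self.value
```

For example, with `alpha=0.5` and samples 10.0 and 20.0, the aggregate becomes 15.0; a condition such as `lambda v: v < 12.0` would then register one violation, which on-demand re-optimization would react to.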