
6.5 Experimental Evaluation

Figure 6.17 illustrates the results using a log-scaled y-axis. The optimization time of full join enumeration increases exponentially, while for both heuristic and directed re-optimization, the optimization time increases almost linearly (slightly super-linearly). In addition, directed re-optimization is even faster than heuristic enumeration because we reorder only the quantifiers of violated optimality conditions. Due to the randomly generated statistics, we take, on average, fewer operators into consideration, while still ensuring that the globally optimal solution is found over multiple re-optimization steps. As a result, the relative benefit of directed re-optimization increases with increasing plan complexity.
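
To make the difference concrete, the following is a minimal sketch of directed re-optimization; the class and method names, as well as the selectivity-based re-insertion policy, are illustrative assumptions rather than the actual implementation. Instead of enumerating all join orders, we reposition only the quantifiers whose monitored statistics violate their optimality conditions.

```java
// Minimal sketch (hypothetical names): directed re-optimization that reorders only
// quantifiers whose optimality conditions are violated, instead of full enumeration.
import java.util.ArrayList;
import java.util.List;

class Quantifier {
    final String name;
    double selectivity;      // monitored statistic
    double optimalUpTo;      // optimality condition: selectivity must stay below this bound
    Quantifier(String name, double sel, double bound) {
        this.name = name; this.selectivity = sel; this.optimalUpTo = bound;
    }
    boolean conditionViolated() { return selectivity > optimalUpTo; }
}

public class DirectedReoptSketch {
    // Full enumeration would consider all n! join orders; the directed variant only
    // repositions the (usually few) quantifiers with violated conditions.
    static void directedReoptimize(List<Quantifier> joinOrder) {
        List<Quantifier> violated = new ArrayList<>();
        for (Quantifier q : joinOrder)
            if (q.conditionViolated()) violated.add(q);
        joinOrder.removeAll(violated);
        for (Quantifier q : violated) {
            // re-insert according to its current selectivity (most selective first)
            int pos = 0;
            while (pos < joinOrder.size() && joinOrder.get(pos).selectivity <= q.selectivity) pos++;
            joinOrder.add(pos, q);
        }
    }

    public static void main(String[] args) {
        List<Quantifier> order = new ArrayList<>(List.of(
            new Quantifier("R", 0.10, 0.30),
            new Quantifier("S", 0.55, 0.30),   // violated: selectivity drifted above its bound
            new Quantifier("T", 0.40, 0.60)));
        directedReoptimize(order);
        order.forEach(q -> System.out.println(q.name + " " + q.selectivity));
        // resulting order: R, T, S
    }
}
```

Because typically only few conditions are violated at a time, the work per re-optimization step remains small, which is consistent with the near-linear behavior observed in Figure 6.17.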

On-Demand Re-Optimization Overhead

We additionally evaluated the overheads of statistics maintenance and of the PlanOptTree algorithms. Here, all experiments were repeated 100 times. It is important to note that both overheads were already included in the end-to-end comparison, where they were amortized by execution time improvements of several orders of magnitude.

Figure 6.18: Overhead of PlanOptTrees ((a) Statistic Maintenance, (b) Algorithm Overhead)

Figure 6.18(a) illustrates the statistic maintenance overhead, comparing our Estimator component (used for periodical re-optimization) with our PlanOptTree (used for on-demand re-optimization). Both use the statistics of all 100,000 plan instances of the simple-plan comparison scenario at the granularity of single operators (2,100,000 atomic statistic tuples). For both models, we used the exponential moving average as the aggregation method. The costs include the transient maintenance of aggregates for all operators as well as the aggregation itself.
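
As a concrete illustration of this aggregation method, the following minimal sketch maintains a per-operator exponential moving average of execution times; the smoothing factor and class names are assumptions for the example, not values from the evaluation.

```java
// Minimal sketch of exponential-moving-average statistic maintenance per operator.
// The smoothing factor ALPHA and all names are illustrative assumptions.
import java.util.HashMap;
import java.util.Map;

public class EmaStatistics {
    private static final double ALPHA = 0.3;          // weight of the newest observation
    private final Map<String, Double> emaPerOperator = new HashMap<>();

    // Called once per executed operator instance with its measured execution time.
    public void addObservation(String operatorId, double executionTimeMs) {
        emaPerOperator.merge(operatorId, executionTimeMs,
            (old, latest) -> ALPHA * latest + (1.0 - ALPHA) * old);
    }

    public double estimate(String operatorId) {
        return emaPerOperator.getOrDefault(operatorId, 0.0);
    }

    public static void main(String[] args) {
        EmaStatistics stats = new EmaStatistics();
        double[] observed = {10.0, 12.0, 30.0, 31.0};  // execution times of one operator
        for (double t : observed) stats.addObservation("Join1", t);
        System.out.printf("EMA(Join1) = %.2f ms%n", stats.estimate("Join1"));  // about 20.79 ms
    }
}
```

Using an exponential moving average keeps the maintained state per operator constant (a single aggregate) while giving more weight to recent workload behavior.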

Full Monitoring refers to periodical re-optimization, where all statistics of all operators are gathered by the Estimator. Min Monitoring refers to a hypothetical scenario where we know the required statistics in advance and maintain only these statistics with the Estimator. The relative improvement of Min Monitoring over Full Monitoring is the benefit we achieve by maintaining only the relevant statistics, where the absolute benefit depends on the workload aggregation method used. In contrast, POT Monitoring refers to the use of our PlanOptTree. Although the PlanOptTree rejects unnecessary statistics, it is slower than Full Monitoring because, for each statistic tuple, we compute the hierarchy of complex statistics and evaluate the optimality conditions.
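
The following sketch contrasts the per-tuple work of the two approaches; the API is a hypothetical simplification of the described behavior, not the actual Estimator or PlanOptTree interface. Full Monitoring only appends each tuple, whereas POT-style monitoring filters out irrelevant statistics but evaluates the registered optimality conditions on every relevant tuple.

```java
// Minimal sketch (hypothetical API) of the per-tuple work of Full vs. POT Monitoring.
import java.util.*;
import java.util.function.DoublePredicate;

public class MonitoringSketch {
    // Full Monitoring: simply record every atomic statistic tuple.
    static final Map<String, List<Double>> fullLog = new HashMap<>();
    static void fullMonitor(String operatorId, double value) {
        fullLog.computeIfAbsent(operatorId, k -> new ArrayList<>()).add(value);
    }

    // POT-style Monitoring: keep only statistics referenced by some optimality
    // condition, but evaluate the registered condition on every incoming tuple.
    static final Map<String, DoublePredicate> conditions = new HashMap<>();
    static final Set<String> violated = new HashSet<>();
    static void potMonitor(String operatorId, double value) {
        DoublePredicate stillOptimal = conditions.get(operatorId);
        if (stillOptimal == null) return;                          // irrelevant statistic: rejected
        if (!stillOptimal.test(value)) violated.add(operatorId);   // triggers directed re-optimization
    }

    public static void main(String[] args) {
        conditions.put("Join1", sel -> sel <= 0.3);   // optimality condition on Join1's selectivity
        fullMonitor("Join1", 0.45);
        fullMonitor("Projection2", 0.99);             // stored although never needed
        potMonitor("Join1", 0.45);                    // condition evaluated -> violation detected
        potMonitor("Projection2", 0.99);              // rejected, no condition registered
        System.out.println("violated conditions: " + violated);
    }
}
```

This contrast illustrates why POT monitoring can be slower per tuple than Full Monitoring despite storing fewer statistics: the savings in maintained state are paid for by the condition evaluation on each incoming tuple.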

Therefore, we distinguish three variants of POT monitoring: POT refers to monitoring without condition evaluation, while POT2 (without the MEMO structure) and POT3 (with the MEMO structure) show the overhead of continuously evaluating the optimality conditions. It is important to note that the use of the MEMO structure is counterproductive in this scenario due to the rather simple optimality conditions and the additional overhead for updating and

