Cost-Based Optimization of Integration Flows - Datenbanken ...
3.5 Experimental Evaluation
Figure 3.25: Optimization Overhead of Join Enumeration
rithm [Moe09] and (2) our join reordering heuristic with quadratic time complexity that
we have described in Subsection 3.3.2. Before optimization, we randomly generated statistics
for input cardinalities and join selectivities. The experiment was repeated ten times.
Figure 3.25 illustrates the results of this experiment using a log-scaled y-axis. The optimization
time of full join enumeration increases exponentially, while the optimization time of heuristic
re-optimization increases only slightly super-linearly. Nevertheless, the absolute optimization
time of exhaustive join enumeration remains acceptable up to eight Join operators, even for
the clique query, which is the worst case for the DPSize algorithm. This justifies our
algorithm selection rule: use the full optimization algorithm for up to eight joins and the
heuristic for larger numbers of joins. As a result, we can guarantee that (1) the time
required by our optimization algorithm will not increase exponentially with plan complexity
and (2) the algorithm will find the optimal plan when the number of Join operators is small.
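The selection rule can be illustrated with a small sketch. It assumes a clique query (a join predicate between every pair of inputs), the C_out cost model (sum of intermediate result cardinalities), and binary joins, so that n inputs require n - 1 Join operators. The DPSize enumeration follows the textbook size-driven scheme; the greedy left-deep ordering is only a hypothetical stand-in for the heuristic of Subsection 3.3.2, which is not reproduced here. All function names are illustrative.

```python
import random
from itertools import combinations

# Threshold from the text: exhaustive enumeration up to eight Join operators.
FULL_ENUMERATION_LIMIT = 8


def result_card(rels, card, sel):
    """Estimated cardinality of joining all inputs in `rels` (independence assumption)."""
    c = 1.0
    for r in rels:
        c *= card[r]
    for i, j in combinations(sorted(rels), 2):
        c *= sel.get((i, j), 1.0)  # missing predicate = cross product
    return c


def connected(left, right, sel):
    """True if some join predicate links an input in `left` to one in `right`."""
    return any((min(a, b), max(a, b)) in sel for a in left for b in right)


def dpsize(n, card, sel):
    """Size-driven dynamic programming over bushy plans; returns (C_out cost, plan)."""
    best = {frozenset([i]): (0.0, i) for i in range(n)}
    by_size = {1: [frozenset([i]) for i in range(n)]}
    for s in range(2, n + 1):
        by_size[s] = []
        for s1 in range(1, s):
            for left in by_size[s1]:
                for right in by_size[s - s1]:
                    if left & right or not connected(left, right, sel):
                        continue
                    joined = left | right
                    cost = best[left][0] + best[right][0] + result_card(joined, card, sel)
                    if joined not in best:
                        by_size[s].append(joined)
                    if joined not in best or cost < best[joined][0]:
                        best[joined] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(range(n))]


def greedy(n, card, sel):
    """Hypothetical stand-in heuristic: left-deep plan that always adds the input
    yielding the smallest intermediate result (O(n^2) candidate evaluations).
    It ignores connectivity, which is safe for clique queries."""
    remaining = set(range(n))
    start = min(remaining, key=lambda r: card[r])
    remaining.remove(start)
    current, plan, cost = frozenset([start]), start, 0.0
    while remaining:
        nxt = min(remaining, key=lambda r: result_card(current | {r}, card, sel))
        remaining.remove(nxt)
        current = current | {nxt}
        plan = (plan, nxt)
        cost += result_card(current, card, sel)
    return cost, plan


def optimize_joins(n, card, sel):
    """Algorithm selection rule: full enumeration for small plans, heuristic otherwise."""
    if n - 1 <= FULL_ENUMERATION_LIMIT:
        return dpsize(n, card, sel)
    return greedy(n, card, sel)


def random_clique(n, seed=None):
    """Randomly generated input cardinalities and pairwise join selectivities."""
    rng = random.Random(seed)
    card = [rng.randint(1, 10_000) for _ in range(n)]
    sel = {(i, j): rng.uniform(0.0001, 1.0) for i, j in combinations(range(n), 2)}
    return card, sel
```

For example, `optimize_joins(6, *random_clique(6, seed=42))` takes the exhaustive path, whereas eleven inputs (ten joins) fall back to the greedy ordering. Because every left-deep plan is also enumerated by DPSize, the exhaustive result is never worse than the greedy one under the same cost model.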
Second, we analyzed the overhead of statistics monitoring and statistics aggregation.
If statistics monitoring is enabled, each operator propagates at least three statistics
($|ds_{in}|$, $|ds_{out}|$, and $W(o_i)$) to the Estimator. Some operators propagate additional
statistics, such as Switch path frequencies, numbers of iterations, and the cardinalities
of multiple input and output data sets. Thus, efficient statistics monitoring and workload
aggregation are essential for keeping re-optimization overheads moderate.
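To make the per-operator cost of monitoring concrete, the following sketch shows operators pushing statistic tuples to an Estimator. The statistic names (`card_in`, `card_out`, `exec_time`), the `Statistic` record, and the use of a running mean as the aggregate are hypothetical choices for illustration only; the strategies actually compared in Figure 3.26 are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statistic:
    operator_id: str  # hypothetical identifier of operator o_i
    name: str         # e.g. "card_in" (|ds_in|), "card_out" (|ds_out|), "exec_time" (W(o_i))
    value: float


class Estimator:
    """Receives statistic tuples and keeps a running mean per (operator, statistic)."""

    def __init__(self):
        self.tuples_received = 0
        self._count = defaultdict(int)
        self._mean = defaultdict(float)

    def propagate(self, stat):
        key = (stat.operator_id, stat.name)
        self.tuples_received += 1
        self._count[key] += 1
        # Incremental mean update: constant work per received tuple.
        self._mean[key] += (stat.value - self._mean[key]) / self._count[key]

    def estimate(self, operator_id, name):
        """Current aggregate, or None if the statistic was never monitored."""
        return self._mean.get((operator_id, name))


MANDATORY = ("card_in", "card_out", "exec_time")


def monitor_operator(estimator, operator_id, card_in, card_out, exec_time, extra=None):
    """Propagate the three mandatory statistics plus optional operator-specific ones
    (e.g. Switch path frequencies or iteration counts)."""
    for name, value in zip(MANDATORY, (card_in, card_out, exec_time)):
        estimator.propagate(Statistic(operator_id, name, float(value)))
    for name, value in (extra or {}).items():
        estimator.propagate(Statistic(operator_id, name, float(value)))
```

With this design, every executed operator costs at least three `propagate` calls per plan instance, so the monitoring overhead grows linearly with the number of plan instances and monitored operators.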
Figure 3.26: Cumulative Statistic Maintenance Overhead<br />
We used the statistics trace from our first comparison scenario (see Figure 3.20), where
we executed n = 100,000 plan instances of $P_5$. We used three statistics (execution time,
input cardinality, and output cardinality) from seven of the nine operators of this plan, which
results in a test set of 2,100,000 statistic tuples. All sub-experiments were repeated 1,000 times.
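The size of the test set follows directly from the trace parameters stated above:

```latex
\underbrace{100{,}000}_{\text{plan instances}} \cdot
\underbrace{7}_{\text{monitored operators}} \cdot
\underbrace{3}_{\text{statistics per operator}}
= 2{,}100{,}000 \text{ statistic tuples}
```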
Figure 3.26 illustrates the results for different aggregation strategies. Full aggregation
refers to a single estimation using all statistics, which is applicable if the optimization