25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3.5 Experimental Evaluation

Figure 3.25: Optimization Overhead of Join Enumeration

rithm [Moe09] and (2) our join reordering heuristic with quadratic time complexity that we have described in Subsection 3.3.2. Before optimization, we randomly generated statistics for input cardinalities and join selectivities. The experiment was repeated ten times. Figure 3.25 illustrates the results of this experiment using a log-scaled y-axis. The optimization time of the full join enumeration increases exponentially, whereas the optimization time of the heuristic re-optimization increases only slightly super-linearly. However, we observe acceptable absolute optimization times for exhaustive join enumeration up to eight Join operators, where the clique query is the worst case for the DPSize algorithm. This justifies our algorithm selection rule of using the full optimization algorithm for up to eight joins and the heuristic for larger numbers of joins. As a result, we can guarantee that (1) the time required by our optimization algorithm will not increase exponentially with the complexity of plans and (2) the algorithm will find the optimal plan in the presence of small numbers of Join operators.
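The algorithm selection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the returned labels are hypothetical, and the threshold of eight joins is treated as inclusive, which the text does not state explicitly.

```python
# Threshold taken from the text: exhaustive enumeration for up to eight joins.
# Treating the bound as inclusive is an assumption of this sketch.
MAX_JOINS_FOR_FULL_ENUMERATION = 8


def choose_join_optimizer(num_joins: int) -> str:
    """Return a label for the optimizer to run on a plan with num_joins Join operators.

    The labels "full_enumeration" and "heuristic" are hypothetical names used
    only for illustration.
    """
    if num_joins < 0:
        raise ValueError("num_joins must be non-negative")
    if num_joins <= MAX_JOINS_FOR_FULL_ENUMERATION:
        # Small plans: exhaustive enumeration finds the optimal join order.
        return "full_enumeration"
    # Larger plans: avoid exponential optimization time.
    return "heuristic"
```

With this rule, a plan with five joins would be optimized exhaustively, while a plan with twelve joins would fall back to the heuristic.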

Second, we analyzed the overhead of statistics monitoring and statistics aggregation. If statistics monitoring is enabled, each operator propagates at least three statistics ($|ds_{in}|$, $|ds_{out}|$, and $W(o_i)$) to the Estimator. However, some operators propagate additional statistics, such as Switch path frequencies, the number of iterations, and the cardinalities of multiple input and output data sets. Thus, the efficiency of statistics monitoring and workload aggregation is important for achieving moderate re-optimization overhead.
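To make the monitoring path concrete, the following minimal sketch shows how an operator might propagate its three base statistics to an estimator component. The class and method names are hypothetical and do not reflect the actual interface. It assumes that W(o_i) denotes the operator's execution time, and it uses an incrementally maintained mean as a stand-in aggregate, which is not necessarily one of the aggregation strategies evaluated here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Incrementally maintained mean with O(1) time and space per update."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        self.mean += (value - self.mean) / self.count


class Estimator:
    """Hypothetical statistics sink; names are illustrative only."""

    def __init__(self) -> None:
        # One running mean per (operator id, statistic name).
        self._stats = defaultdict(RunningMean)

    def propagate(self, op_id: str, ds_in: int, ds_out: int, exec_time: float) -> None:
        """Record the three base statistics of one operator execution."""
        self._stats[(op_id, "ds_in")].update(ds_in)
        self._stats[(op_id, "ds_out")].update(ds_out)
        self._stats[(op_id, "W")].update(exec_time)  # assumed: W(o_i) = execution time

    def estimate(self, op_id: str, name: str) -> float:
        """Return the current aggregate for one statistic of one operator."""
        key = (op_id, name)
        if key not in self._stats:
            raise KeyError(key)
        return self._stats[key].mean
```

Because each update touches only a counter and a mean, the per-tuple cost stays constant regardless of how many plan instances have been monitored.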

Figure 3.26: Cumulative Statistic Maintenance Overhead

We used the statistic trace from our first comparison scenario (see Figure 3.20), where we executed n = 100,000 plan instances of $P_5$. We used three statistics (execution time, input cardinality, and output cardinality) from seven of the nine operators of this plan, which results in a test set of 2,100,000 statistic tuples. All sub-experiments were repeated 1,000 times.
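The size of the test set follows directly from the numbers given above. The short check below reproduces the arithmetic; the variable names are chosen here for illustration only.

```python
plan_instances = 100_000   # n, executed instances of the plan
monitored_operators = 7    # seven of the nine operators
stats_per_operator = 3     # execution time, input and output cardinality

# Each monitored operator contributes three statistic tuples per plan instance.
statistic_tuples = plan_instances * monitored_operators * stats_per_operator
print(statistic_tuples)  # 2100000
```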

Figure 3.26 illustrates the results for different aggregation strategies. Full aggregation refers to a single estimation using all statistics, which is applicable if the optimization
