25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3.2 Prerequisites for Cost-Based Optimization

• Comparability of Control- and Data-Flow-Oriented Operators: The double-metric cost model enables the comparison of data-flow-oriented and control-flow-oriented operators by their normalized execution time. In contrast to empirical cost models, the double-metric cost model includes interaction-oriented and control-flow-oriented operators as well.

• Plan Cost Monotonicity: The Picasso project [RH05] has shown that the assumption of Plan Cost Monotonicity (PCM) holds for most queries in commercial DBMS, even over all plans of a plan diagram [HDH07]. In contrast, this assumption always holds for our cost model of integration flows with regard to a single plan. As a result, the costs of a plan are monotonically non-decreasing with regard to any increasing influencing parameter such as selectivities, cardinalities, or execution times.

• Asymmetric Cost Functions: The cost model exhibits asymmetric cost functions, i.e., the ordering of binary operator inputs influences the computed costs. Thus, commutativity of inputs must be considered during optimization.

• No ASI Property: Finally, the cost model does not exhibit the ASI (Adjacent Sequence Interchange) property [Moe09]. This property is given if and only if there is a ranking function of data sets such that the ordering of ranks is the optimal join ordering. Due to (1) possibly correlated external data sets and (2) different join implementations, including the merge-join (which is known not to have the ASI property [Moe09]), our cost model does not exhibit this property.
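To make the consequence of asymmetric cost functions concrete, the following minimal sketch evaluates both input orders of a binary operator and keeps the cheaper one. The function `toy_join_cost` and its weights are purely hypothetical and are not the cost model described here; they only stand in for any cost function whose result depends on the input order.

```python
def toy_join_cost(outer_card: int, inner_card: int) -> float:
    """Hypothetical asymmetric cost function (NOT the document's cost model).

    The inner input is weighted higher than the outer input, so swapping
    the two inputs changes the computed cost.
    """
    inner_weight = 2.0  # assumed weight, chosen only for illustration
    outer_weight = 1.0  # assumed weight, chosen only for illustration
    return inner_weight * inner_card + outer_weight * outer_card


def cheapest_input_order(left_card: int, right_card: int):
    """Cost both input orders and return (order, cost) of the cheaper one."""
    candidates = {
        ("left", "right"): toy_join_cost(left_card, right_card),
        ("right", "left"): toy_join_cost(right_card, left_card),
    }
    order = min(candidates, key=candidates.get)
    return order, candidates[order]


order, cost = cheapest_input_order(5_000, 1_000)
print(order, cost)
```

Because the two orders yield different costs, an optimizer that only costs one of them could miss the cheaper alternative, which is why commutativity has to be enumerated explicitly. The toy function is also non-decreasing in both cardinalities, in line with the monotonicity prerequisite.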

Cost Estimation

In the following, we illustrate the cost estimation, using the known data-flow-oriented optimization technique of eager group-by [CS94] (type invariant group-by) as an example rewriting technique.

Example 3.2 (Cost Estimation). Assume the plan $P_3$ with monitored execution statistics and a plan $P_3'$ that has been created by rewriting $P_3$ during optimization (invariant group-by due to an N:1-relationship between data sets, e.g., given by message schema descriptions). There are no statistics available for $P_3'$. The plans are shown in Figure 3.4. The statistics of subplans that are equivalent in $P_3$ and $P_3'$ can be reused. Thus, only the new output cardinality $|ds_{out}(o'_5)|$ and the execution times $W(o'_5)$ and $W(o'_4)$ have to be estimated.

Assume a monitored join selectivity (where $f$ is a shorthand for filter selectivity) of

$$f_{ds_{out}(o_2),\,ds_{out}(o_3)} = \frac{|ds_{out}(o_2) \bowtie ds_{out}(o_3)|}{|ds_{out}(o_2)| \cdot |ds_{out}(o_3)|} = \frac{|ds_{out}(o_4)|}{|ds_{out}(o_2)| \cdot |ds_{out}(o_3)|} = \frac{5{,}000}{5{,}000{,}000} = \frac{1}{1{,}000}$$

and a group-by selectivity of

$$f_{\gamma ds_{out}(o_4)} = \frac{|\gamma ds_{out}(o_4)|}{|ds_{out}(o_4)|} = \frac{|ds_{out}(o_5)|}{|ds_{out}(o_4)|} = \frac{1{,}000}{5{,}000} = \frac{1}{5}.$$

The selectivities are used to compute the output cardinalities of new or reordered operators. Due to the invariant placement of the group-by operator, we can compute the new group-by output cardinality and directly set the new join output cardinality by

$$|\widehat{ds}_{out}(o'_5)| = f_{\gamma ds_{out}(o_4)} \cdot |ds_{out}(o_2)| = \frac{1}{5} \cdot 5{,}000 = 1{,}000$$

$$|\widehat{ds}_{out}(o'_4)| = 1{,}000.$$
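The arithmetic of this example can be retraced with a short sketch. It uses only the monitored cardinalities stated in the example; the variable names and the helper `selectivity` are illustrative assumptions, and exact fractions are used so that the results match the values in the text without rounding.

```python
from fractions import Fraction

# Monitored statistics of the original plan, as given in the example.
card_o2 = 5_000              # |ds_out(o2)|
card_o2_times_o3 = 5_000_000 # |ds_out(o2)| * |ds_out(o3)|
card_o4 = 5_000              # |ds_out(o4)|, output of the join
card_o5 = 1_000              # |ds_out(o5)|, output of the group-by


def selectivity(output_card: int, input_card: int) -> Fraction:
    """Selectivity as the ratio of output to input cardinality (helper name assumed)."""
    return Fraction(output_card, input_card)


# Monitored selectivities of the original plan.
f_join = selectivity(card_o4, card_o2_times_o3)  # 1/1000
f_groupby = selectivity(card_o5, card_o4)        # 1/5

# Rewritten plan: the group-by o5' is now applied to ds_out(o2).
est_card_o5_new = f_groupby * card_o2            # 1/5 * 5000 = 1000

# The new join output cardinality is set directly to the same value,
# following the example (invariant group-by placement).
est_card_o4_new = est_card_o5_new

print(f_join, f_groupby, est_card_o5_new, est_card_o4_new)
```

Only the group-by selectivity enters the estimate for the rewritten plan; the join selectivity is computed here for completeness because the example reports it.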
