Cost-Based Optimization of Integration Flows - Datenbanken ...
3.2 Prerequisites for Cost-Based Optimization
• Comparability of Control- and Data-Flow-Oriented Operators: The double-metric cost model enables the comparison of data-flow-oriented and control-flow-oriented operators by their normalized execution time. In contrast to empirical cost models, the double-metric cost model also includes interaction- and control-flow-oriented operators.
• Plan Cost Monotonicity: The Picasso project [RH05] has shown that the assumption of Plan Cost Monotonicity (PCM) holds for most queries in commercial DBMSs, even over all plans of a plan diagram [HDH07]. In contrast, this assumption always holds for our cost model of integration flows with regard to a single plan. That is, the costs of a plan are monotonically non-decreasing with regard to any increasing influencing parameter such as selectivities, cardinalities, or execution times.
• Asymmetric Cost Functions: The cost model exhibits asymmetric cost functions, i.e., the ordering of the inputs of a binary operator influences the computed costs. Thus, the commutativity of inputs must be considered during optimization.
• No ASI Property: Finally, the cost model does not exhibit the ASI (Adjacent Sequence Interchange) property [Moe09]. This property holds if and only if there is a ranking function of data sets such that ordering the data sets by their ranks yields the optimal join order. Due to (1) possibly correlated external data sets and (2) different join implementations, including the merge join (which is known not to have the ASI property [Moe09]), our cost model does not exhibit this property.
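To make the asymmetry and monotonicity properties concrete, here is a minimal Python sketch built on a toy hash-join cost function. It is not the double-metric cost model itself: the function `hash_join_cost` and the coefficients `C_BUILD` and `C_PROBE` are hypothetical and only assume that inserting a tuple into the hash table costs more than probing it. Because the cost changes when the inputs are swapped, an optimizer has to evaluate both input orders of a commutative binary operator; because both coefficients are positive, the cost never decreases when an input cardinality grows.

```python
# Hypothetical per-tuple costs (assumption, not taken from the thesis):
# building the hash table is more expensive than probing it, which makes
# the cost function asymmetric in its two inputs.
C_BUILD = 2.0
C_PROBE = 1.0


def hash_join_cost(build_card: int, probe_card: int) -> float:
    """Toy cost of a hash join: build on the first input, probe with the second."""
    return C_BUILD * build_card + C_PROBE * probe_card


def cheapest_input_order(left_card: int, right_card: int) -> tuple[str, float]:
    """Exploit commutativity: evaluate both input orders and keep the cheaper one."""
    candidates = {
        "left builds": hash_join_cost(left_card, right_card),
        "right builds": hash_join_cost(right_card, left_card),
    }
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


if __name__ == "__main__":
    # Asymmetry: swapping the inputs changes the estimated cost.
    print(hash_join_cost(5_000, 1_000))  # 11000.0
    print(hash_join_cost(1_000, 5_000))  # 7000.0
    print(cheapest_input_order(5_000, 1_000))  # ('right builds', 7000.0)

    # Monotonicity: increasing either cardinality never lowers the cost.
    assert all(
        hash_join_cost(b, p) <= hash_join_cost(b + 1, p)
        and hash_join_cost(b, p) <= hash_join_cost(b, p + 1)
        for b in range(50)
        for p in range(50)
    )
```

The toy function deliberately covers only asymmetry and monotonicity; it says nothing about the ASI property, which depends on the concrete join implementations and data correlations discussed above.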
Cost Estimation
In the following, we illustrate cost estimation using the well-known data-flow-oriented optimization technique of eager group-by [CS94] (of the type invariant group-by) as an example rewriting technique.
Example 3.2 (Cost Estimation). Assume the plan $P_3$ with monitored execution statistics and a plan $P_3'$ that has been created by rewriting $P_3$ during optimization (invariant group-by due to an N:1 relationship between data sets, e.g., given by message schema descriptions). No statistics are available for $P_3'$. The plans are shown in Figure 3.4. The statistics of subplans that are equivalent in $P_3$ and $P_3'$ can be reused. Thus, only the new output cardinality $|ds_{out}(o'_5)|$ and the execution times $W(o'_5)$ and $W(o'_4)$ have to be estimated.
Assuming a monitored join selectivity (where $f$ is a shorthand for filter selectivity) of

$$f_{ds_{out}(o_2),\,ds_{out}(o_3)} = \frac{|ds_{out}(o_2) \bowtie ds_{out}(o_3)|}{|ds_{out}(o_2)| \cdot |ds_{out}(o_3)|} = \frac{|ds_{out}(o_4)|}{|ds_{out}(o_2)| \cdot |ds_{out}(o_3)|} = \frac{5{,}000}{5{,}000{,}000} = \frac{1}{1{,}000}$$

and a group-by selectivity of

$$f_{\gamma\, ds_{out}(o_4)} = \frac{|\gamma\, ds_{out}(o_4)|}{|ds_{out}(o_4)|} = \frac{|ds_{out}(o_5)|}{|ds_{out}(o_4)|} = \frac{1{,}000}{5{,}000} = \frac{1}{5}.$$
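Both selectivities can be reproduced with a small Python sketch that uses exact fractions. The individual input cardinalities are not stated separately in the example; $|ds_{out}(o_2)| = 5{,}000$ is taken from the estimation step of Example 3.2, and $|ds_{out}(o_3)| = 1{,}000$ follows from the product $5{,}000{,}000$. The helper names `join_selectivity` and `groupby_selectivity` are hypothetical.

```python
from fractions import Fraction


def join_selectivity(card_join_out: int, card_left: int, card_right: int) -> Fraction:
    """Join selectivity: join output cardinality relative to the Cartesian product size."""
    return Fraction(card_join_out, card_left * card_right)


def groupby_selectivity(card_groupby_out: int, card_groupby_in: int) -> Fraction:
    """Group-by selectivity: number of groups relative to the group-by input cardinality."""
    return Fraction(card_groupby_out, card_groupby_in)


# Monitored output cardinalities of plan P3 (o2, o3 derived as described above;
# o4 is the join output, o5 the group-by output).
card = {"o2": 5_000, "o3": 1_000, "o4": 5_000, "o5": 1_000}

f_join = join_selectivity(card["o4"], card["o2"], card["o3"])
f_gamma = groupby_selectivity(card["o5"], card["o4"])
print(f_join)   # 1/1000
print(f_gamma)  # 1/5
```

Using `Fraction` instead of floats keeps the results identical to the exact ratios in the equations.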
The selectivities are used in order to compute the output cardinalities of new or reordered operators. Due to the invariant placement of the group-by operator, we can compute the new group-by output cardinality and directly set the new join output cardinality by

$$|\hat{ds}_{out}(o'_5)| = f_{\gamma\, ds_{out}(o_4)} \cdot |ds_{out}(o_2)| = \frac{1}{5} \cdot 5{,}000 = 1{,}000$$

$$|\hat{ds}_{out}(o'_4)| = 1{,}000.$$
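The estimation step can be sketched in Python as follows. The sketch assumes, as the formula above indicates, that the rewritten plan applies the group-by directly to $ds_{out}(o_2)$, and it further assumes that under the N:1 relationship every group finds exactly one join partner, so the join output equals the group-by output (the monitored $|ds_{out}(o_4)| = |ds_{out}(o_2)| = 5{,}000$ is consistent with this). The function `estimate_eager_groupby` and its result keys are hypothetical, and execution times $W(\cdot)$ are not modeled.

```python
from fractions import Fraction


def estimate_eager_groupby(card_o2: int, card_o4: int, card_o5: int) -> dict[str, int]:
    """Estimate output cardinalities after pushing an invariant group-by below an N:1 join."""
    f_gamma = Fraction(card_o5, card_o4)  # group-by selectivity monitored in P3
    groupby_out = f_gamma * card_o2       # group-by now applied directly to ds_out(o2)
    join_out = groupby_out                # assumption: each group has exactly one join partner
    # Round the exact fractions to whole numbers of tuples.
    return {"groupby": round(groupby_out), "join": round(join_out)}


print(estimate_eager_groupby(card_o2=5_000, card_o4=5_000, card_o5=1_000))
# {'groupby': 1000, 'join': 1000}
```

Only monitored statistics of $P_3$ enter the computation, which mirrors the reuse of subplan statistics described in the example.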