Cost-Based Optimization of Integration Flows - Datenbanken ...
6.4 Optimization Techniques
Finally, Figure 6.9(c) compares the number of alternative plans of the full search space
with the number of required optimality conditions, using a log-scaled y-axis. The search
space reduction (see Theorem 6.1 for the worst-case consideration) is achieved by transforming
the problem of enumerating all possible plans into binary optimality conditions,
where costs in front of and after the considered subplan are equal. Note that the assumption
of transitivity does not necessarily require that the used cost model has the adjacent
sequence interchange (ASI) property [Moe09]. Figure 6.9(a) shows how this heuristic join
enumeration approach works for arbitrarily large left-deep join trees of $n$ input data sets.
For this join tree type, the PlanOptTree has $n - 1$ optimality decisions (shown as arrows).
Due to the restriction of checking only for plan optimality and due to arbitrarily complex
optimality conditions, this concept of triggering re-optimization leads to the globally
optimal solution. In addition, the PlanOptTree allows for directed re-optimization. To
obtain an optimization result equivalent to full join enumeration (e.g., with
DPSize), $n(n + 1)/2$ optimality conditions are required; only a single reordering of
two join operators is applied during one re-optimization step, and multiple such steps
are required to find the global optimum. In contrast, with heuristic join enumeration,
directed re-optimization can be applied to all operators in a single re-optimization step,
but the global optimum might not be found.
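The gap between the number of optimality conditions and the size of the search space can be illustrated with a small sketch. The two condition counts, n − 1 and n(n + 1)/2, are taken from the text above. The plan count of n! for left-deep join trees is an assumption made here for illustration only; the exact worst-case figure is the subject of Theorem 6.1, which is not reproduced in this section. All function names are hypothetical.

```python
from math import factorial


def heuristic_conditions(n: int) -> int:
    """Optimality conditions kept by the heuristic approach (n - 1, from the text)."""
    return n - 1


def full_equivalent_conditions(n: int) -> int:
    """Conditions needed to match full join enumeration (n(n + 1)/2, from the text)."""
    return n * (n + 1) // 2


def left_deep_plans(n: int) -> int:
    """ASSUMPTION: every permutation of the n inputs is one left-deep plan (n!)."""
    return factorial(n)


if __name__ == "__main__":
    # Print the three quantities side by side for a few input counts.
    for n in (2, 4, 8, 16):
        print(n, heuristic_conditions(n), full_equivalent_conditions(n), left_deep_plans(n))
```

The condition counts grow linearly and quadratically, while the assumed plan count grows factorially, which is why a log-scaled axis is a natural choice for such a comparison.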
Eager Group-By Example<br />
Similarly to the join enumeration example, assume a join of $n$ data sets (with arbitrary
multiplicities) and a subsequent group-by, where the join predicate and group-by attributes
are equal with $\gamma_{F(X);A_1}(R \bowtie_{R.A_1 = S.A_1} S)$. With regard to the optimization technique
WD6: Early Group-by Application, there are $4n!$ (for $n \geq 2$) possible plans. Without loss
of generality, we assume $n = 2$ and concentrate on the $4n! = 8$ possibilities to arrange
group-by and join (for an invariant group-by, the final $\gamma$ in $P_c$-$P_f$ can be omitted):
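$$
\begin{array}{llll}
P_a\ (\text{opt}): \gamma(R \bowtie S) & P_c: \gamma((\gamma R) \bowtie S) & P_e: \gamma(R \bowtie (\gamma S)) & P_g: (\gamma R) \bowtie (\gamma S) \\
P_b: \gamma(S \bowtie R) & P_d: \gamma(S \bowtie (\gamma R)) & P_f: \gamma((\gamma S) \bowtie R) & P_h: (\gamma S) \bowtie (\gamma R).
\end{array}
$$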
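The eight arrangements for n = 2 can be generated mechanically: two join orders, times four choices of which inputs are grouped before the join. The following is a minimal sketch of that enumeration; the function name and the string representation of plans are hypothetical. It follows the listing above in dropping the final group-by only when both inputs are pre-aggregated.

```python
from itertools import permutations, product


def enumerate_plans(first: str = "R", second: str = "S") -> list[str]:
    """Enumerate group-by/join arrangements for exactly two inputs."""
    plans = []
    # Two join orders: (R, S) and (S, R).
    for left, right in permutations((first, second), 2):
        # Four choices: group neither, only right, only left, or both inputs early.
        for early_left, early_right in product((False, True), repeat=2):
            l = f"(γ{left})" if early_left else left
            r = f"(γ{right})" if early_right else right
            join = f"{l} ⋈ {r}"
            # As in the listing, the final γ is dropped only if both inputs are grouped.
            plans.append(join if (early_left and early_right) else f"γ({join})")
    return plans


if __name__ == "__main__":
    for plan in enumerate_plans():
        print(plan)
```

The sketch is restricted to two inputs on purpose; it does not attempt to generalize the plan count to larger n.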
In addition to the join costs, the group-by costs are given by $C(\gamma) = |R| + |R| \cdot |R|/2$.
Furthermore, the output cardinality in case of a single group-by attribute $A_i$, with a
domain $D_{A_i}$, is defined as $1 \leq |\gamma R| \leq |D_{A_i}(R)|$, while for an arbitrary number of group-by
attributes it is $1 \leq |\gamma R| \leq \prod_{i=1}^{|A|} |D_{A_i}(R)|$. Further, let us denote the group-by selectivity
with $f_{\gamma R} = |\gamma R|/|R|$. The optimal plan $P_a$ can then be represented with four optimality
conditions. First, the join order is expressed with $oc_1: |R| \leq |S|$. Second, we use one
optimality condition for each single join input ($oc_2$ and $oc_3$, where we illustrate $oc_2$ as an
example) and one condition for all join inputs ($oc_4$):
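$$
\begin{aligned}
oc_2 &: C(\gamma(R \bowtie S)) \leq \left( |R| + |R|^2/2 \right) + \left( f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot |S| \right) \\
&\qquad + \left( f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S| + \left( f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S| \right)^2/2 \right) \\
oc_4 &: C(\gamma(R \bowtie S)) \leq \left( |R| + |R|^2/2 \right) + \left( |S| + |S|^2/2 \right) \\
&\qquad + \left( f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot f_{\gamma S} \cdot |S| \right) \\
\text{with } & C(\gamma(R \bowtie S)) = \left( |R| + |R| \cdot |S| \right) + \left( f_{R,S} \cdot |R| \cdot |S| + \left( f_{R,S} \cdot |R| \cdot |S| \right)^2/2 \right).
\end{aligned}
\tag{6.6}
$$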
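To make the conditions in (6.6) concrete, the sketch below evaluates $oc_1$, $oc_2$, and $oc_4$ for given cardinalities and selectivities. It uses the cost terms exactly as they appear in the formulas: a join cost of $|R| + |R| \cdot |S|$ and a group-by cost of $|R| + |R|^2/2$. The function and parameter names are hypothetical, the example statistics are made up for illustration, and the independence assumption $f_{(\gamma R),S} = f_{R,S}$ is applied in `oc2_holds`.

```python
def join_cost(left: float, right: float) -> float:
    """Join cost term as used in (6.6): |R| + |R| * |S|."""
    return left + left * right


def groupby_cost(card: float) -> float:
    """Group-by cost C(γ) = |R| + |R|^2 / 2."""
    return card + card * card / 2


def plan_a_cost(r: float, s: float, f_rs: float) -> float:
    """C(γ(R ⋈ S)): join first, then group the join result."""
    return join_cost(r, s) + groupby_cost(f_rs * r * s)


def oc1_holds(r: float, s: float) -> bool:
    """oc_1: the smaller input is the left join input."""
    return r <= s


def oc2_holds(r: float, s: float, f_rs: float, f_gr: float) -> bool:
    """oc_2: P_a is no more expensive than grouping R early (P_c).

    ASSUMPTION: independence, so f_(γR),S is set equal to f_R,S.
    """
    early = (groupby_cost(r)
             + join_cost(f_gr * r, s)
             + groupby_cost(f_rs * f_gr * r * s))
    return plan_a_cost(r, s, f_rs) <= early


def oc4_holds(r: float, s: float, f_rs: float, f_gr: float, f_gs: float) -> bool:
    """oc_4: P_a is no more expensive than grouping both inputs early (P_g)."""
    both = groupby_cost(r) + groupby_cost(s) + join_cost(f_gr * r, f_gs * s)
    return plan_a_cost(r, s, f_rs) <= both


if __name__ == "__main__":
    # Made-up statistics: no grouping reduction vs. a strong reduction on R.
    print(oc2_holds(100, 1000, 0.001, f_gr=1.0))
    print(oc2_holds(100, 1000, 0.001, f_gr=0.01))
```

With these illustrative numbers, $oc_2$ holds when the early group-by does not reduce $R$ and is violated once the group-by selectivity on $R$ becomes small, which is the situation in which re-optimization toward $P_c$ would be triggered.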
Similar to the join example, we compute the join selectivity by $f_{(\gamma R),S} \;\theta\; f_{R,S}$ and
assume independence with $f_{(\gamma R),S} = f_{R,S}$. The group-by selectivity is computed by