25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6.4 Optimization Techniques

Finally, Figure 6.9(c) compares the number of alternative plans in the full search space with the number of required optimality conditions, using a log-scaled y-axis. The search space reduction (see Theorem 6.1 for the worst-case analysis) is achieved by transforming the problem of enumerating all possible plans into binary optimality conditions, where the costs in front of and after the considered subplan are equal. Note that the assumption of transitivity does not necessarily require that the cost model has the adjacent sequence interchange (ASI) property [Moe09]. Figure 6.9(a) shows how this heuristic join enumeration approach works for arbitrarily large left-deep join trees of $n$ input data sets. For this join tree type, the PlanOptTree has $n - 1$ optimality decisions (shown as arrows).
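To make the gap behind Figure 6.9(c) concrete, the following sketch compares the size of the search space with the number of PlanOptTree optimality decisions for left-deep join trees. It assumes that the full search space consists of all $n!$ orderings of the $n$ inputs of a left-deep tree (bushy trees are ignored); the function names are hypothetical and not part of the PlanOptTree implementation.

```python
import math


def left_deep_join_orders(n: int) -> int:
    """Number of left-deep join orders over n inputs (assumed: all n! permutations)."""
    return math.factorial(n)


def heuristic_optimality_decisions(n: int) -> int:
    """PlanOptTree optimality decisions for a left-deep join tree of n inputs."""
    return n - 1


if __name__ == "__main__":
    # Print both counts plus log10 of the plan count, mirroring a log-scaled y-axis.
    for n in range(2, 11):
        plans = left_deep_join_orders(n)
        decisions = heuristic_optimality_decisions(n)
        print(f"n={n:2d}  plans={plans:>9d}  "
              f"log10(plans)={math.log10(plans):5.2f}  decisions={decisions}")
```

For $n = 10$, the assumed search space already holds 3,628,800 join orders, while only 9 optimality decisions have to be monitored.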

Due to the restriction to checking only plan optimality and due to arbitrarily complex optimality conditions, this concept of triggering re-optimization leads to the globally optimal solution. In addition, the PlanOptTree allows for directed re-optimization. To obtain an optimization result equivalent to full join enumeration (e.g., with DPSize), $n(n + 1)/2$ optimality conditions are required; in that case, only a single reordering of two join operators is applied during one re-optimization step, and multiple such steps are required to find the global optimum. In contrast, with heuristic join enumeration, directed re-optimization can be applied to all operators in one single re-optimization step, but the global optimum might not be found.
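The difference between the two re-optimization modes can be illustrated with a deliberately simplified sketch. It assumes that each of the $n - 1$ optimality decisions of a left-deep tree is the cardinality condition $|R_i| \leq |R_{i+1}|$ between adjacent inputs (similar to the join-order condition $oc_1$ of the eager group-by example), rather than the full cost-based conditions of the PlanOptTree; all names are hypothetical. The single-swap mode reorders one pair of join operators per re-optimization step, while the heuristic mode reorders all inputs in one step.

```python
from typing import List, Tuple

# A join input as (name, cardinality); hypothetical representation for illustration.
Input = Tuple[str, int]


def violated_conditions(order: List[Input]) -> List[int]:
    """Return indices i where the adjacent condition |R_i| <= |R_{i+1}| fails.

    A left-deep tree over n inputs has n - 1 such conditions.
    """
    return [i for i in range(len(order) - 1) if order[i][1] > order[i + 1][1]]


def reoptimize_single_swap(order: List[Input]) -> Tuple[List[Input], int]:
    """Swap one violated adjacent pair per re-optimization step until all conditions hold."""
    current = list(order)
    steps = 0
    while True:
        violated = violated_conditions(current)
        if not violated:
            return current, steps
        i = violated[0]
        current[i], current[i + 1] = current[i + 1], current[i]
        steps += 1


def reoptimize_all_at_once(order: List[Input]) -> Tuple[List[Input], int]:
    """Reorder all inputs in a single re-optimization step (heuristic mode)."""
    if not violated_conditions(order):
        return list(order), 0
    return sorted(order, key=lambda inp: inp[1]), 1


if __name__ == "__main__":
    plan = [("A", 50), ("B", 10), ("C", 30), ("D", 5)]
    print(violated_conditions(plan))     # [0, 2]
    print(reoptimize_single_swap(plan))  # ([('D', 5), ('B', 10), ('C', 30), ('A', 50)], 5)
    print(reoptimize_all_at_once(plan))  # ([('D', 5), ('B', 10), ('C', 30), ('A', 50)], 1)
```

Because the toy criterion is a simple sort key, both modes end in the same order here; the sketch therefore shows only the difference in the number of re-optimization steps, not the possible loss of the global optimum.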

Eager Group-By Example<br />

Similarly to the join enumeration example, assume a join of $n$ data sets (with arbitrary multiplicities) and a subsequent group-by, where the join predicate and the group-by attributes are equal, as in $\gamma_{F(X);A_1}(R \bowtie_{R.A_1 = S.A_1} S)$. With regard to the optimization technique WD6: Early Group-by Application, there are $4 \cdot n!$ (for $n \geq 2$) possible plans. Without loss of generality, we assume $n = 2$ and concentrate on the $4 \cdot n! = 8$ possibilities to arrange group-by and join (for an invariant group-by, the final $\gamma$ in $P_c$ to $P_f$ can be omitted):

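$$
\begin{array}{llll}
P_a\ (\text{opt}): \gamma(R \bowtie S) & P_c: \gamma((\gamma R) \bowtie S) & P_e: \gamma(R \bowtie (\gamma S)) & P_g: (\gamma R) \bowtie (\gamma S) \\
P_b: \gamma(S \bowtie R) & P_d: \gamma(S \bowtie (\gamma R)) & P_f: \gamma((\gamma S) \bowtie R) & P_h: (\gamma S) \bowtie (\gamma R).
\end{array}
$$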
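As a cross-check of the count for $n = 2$, the following sketch enumerates the eight arrangements by combining the two join orders with the four pre-aggregation choices (none, either input, or both). It covers only the two-input case listed above and does not derive the general $4 \cdot n!$ formula; the function name and the string notation (γ for group-by, ⋈ for join) are illustrative assumptions.

```python
from itertools import product
from typing import List


def eager_groupby_plans(left: str = "R", right: str = "S") -> List[str]:
    """Enumerate the 8 arrangements of one join and one group-by over two inputs.

    Each join order is combined with pre-aggregating neither, the outer, the
    inner, or both inputs. The final group-by is dropped only when both inputs
    are pre-aggregated (plans P_g and P_h).
    """
    plans = []
    for outer, inner in ((left, right), (right, left)):
        for pre_outer, pre_inner in product((False, True), repeat=2):
            o = f"(γ{outer})" if pre_outer else outer
            i = f"(γ{inner})" if pre_inner else inner
            join = f"{o} ⋈ {i}"
            plans.append(join if pre_outer and pre_inner else f"γ({join})")
    return plans


if __name__ == "__main__":
    for plan in eager_groupby_plans():
        print(plan)
```

The printed order is $P_a, P_e, P_c, P_g$ followed by $P_b, P_d, P_f, P_h$.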

In addition to the join costs, the group-by costs are given by $C(\gamma) = |R| + |R| \cdot |R|/2$.

Furthermore, the output cardinality in case of a single group-by attribute $A_i$ with domain $D_{A_i}$ is bounded by $1 \leq |\gamma R| \leq |D_{A_i}(R)|$, while for an arbitrary number of group-by attributes it is bounded by $1 \leq |\gamma R| \leq \prod_{i=1}^{|A|} |D_{A_i}(R)|$. Further, let us denote the group-by selectivity by $f_{\gamma R} = |\gamma R|/|R|$. The optimal plan $P_a$ can then be represented with four optimality conditions. First, the join order is expressed with $oc_1: |R| \leq |S|$. Second, we use one optimality condition for each single join input ($oc_2$ and $oc_3$, where we illustrate $oc_2$ as an example) and one condition for all join inputs ($oc_4$):

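$$
\begin{aligned}
oc_2 &: C(\gamma(R \bowtie S)) \leq \left(|R| + \frac{|R|^2}{2}\right) + \left(f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot |S|\right) \\
&\qquad + \left(f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S| + \frac{(f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S|)^2}{2}\right) \\
oc_4 &: C(\gamma(R \bowtie S)) \leq \left(|R| + \frac{|R|^2}{2}\right) + \left(|S| + \frac{|S|^2}{2}\right) \\
&\qquad + \left(f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot f_{\gamma S} \cdot |S|\right) \\
\text{with } & C(\gamma(R \bowtie S)) = \left(|R| + |R| \cdot |S|\right) + \left(f_{R,S} \cdot |R| \cdot |S| + \frac{(f_{R,S} \cdot |R| \cdot |S|)^2}{2}\right)
\end{aligned}
\tag{6.6}
$$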
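To see how the conditions in (6.6) behave, the following sketch evaluates $oc_1$, $oc_2$, and $oc_4$ for plan $P_a$ with the cost terms used there: join cost $|X| + |X| \cdot |Y|$ and group-by cost $|X| + |X|^2/2$. It assumes independence, $f_{(\gamma R),S} = f_{R,S}$, and takes $|\gamma R|$ at the upper bound of its domain; the cardinalities, selectivities, and function names are hypothetical example values, not measurements.

```python
import math
from typing import Dict, Sequence, Tuple


def groupby_cardinality_bounds(domain_sizes: Sequence[int]) -> Tuple[int, int]:
    """Bounds 1 <= |γR| <= prod_i |D_Ai(R)| for the given group-by attribute domains."""
    return 1, math.prod(domain_sizes)


def join_cost(outer: float, inner: float) -> float:
    """Join cost |X| + |X|·|Y|, as in the terms of (6.6)."""
    return outer + outer * inner


def groupby_cost(card: float) -> float:
    """Group-by cost |X| + |X|^2 / 2."""
    return card + card * card / 2


def cost_pa(r: float, s: float, f_rs: float) -> float:
    """P_a = γ(R ⋈ S)."""
    return join_cost(r, s) + groupby_cost(f_rs * r * s)


def cost_pc(r: float, s: float, f_rs: float, f_gr: float) -> float:
    """P_c = γ((γR) ⋈ S), assuming f_(γR),S = f_R,S (independence)."""
    gr = f_gr * r
    return groupby_cost(r) + join_cost(gr, s) + groupby_cost(f_rs * gr * s)


def cost_pg(r: float, s: float, f_gr: float, f_gs: float) -> float:
    """P_g = (γR) ⋈ (γS)."""
    gr, gs = f_gr * r, f_gs * s
    return groupby_cost(r) + groupby_cost(s) + join_cost(gr, gs)


def check_conditions(r: float, s: float, f_rs: float,
                     f_gr: float, f_gs: float) -> Dict[str, bool]:
    """Evaluate oc_1, oc_2 and oc_4 for plan P_a (oc_3, the condition for S, is omitted)."""
    pa = cost_pa(r, s, f_rs)
    return {
        "oc1": r <= s,
        "oc2": pa <= cost_pc(r, s, f_rs, f_gr),
        "oc4": pa <= cost_pg(r, s, f_gr, f_gs),
    }


if __name__ == "__main__":
    # |R| = |S| = 100, join selectivity 0.5; the group-by attribute has 25 distinct values.
    _, upper = groupby_cardinality_bounds([25])
    f_g = upper / 100  # assumes |γR| reaches the upper bound, so f_γR = 0.25
    print(check_conditions(100, 100, 0.5, f_g, f_g))
    # The group-by does not reduce the data (f_γ = 1.0): P_a remains optimal.
    print(check_conditions(100, 1000, 0.001, 1.0, 1.0))
```

In the first call, pre-aggregating to a quarter of the input makes $P_c$ and $P_g$ cheaper than $P_a$, so $oc_2$ and $oc_4$ are violated and would trigger re-optimization; in the second call, the group-by does not reduce the data, and all three conditions hold.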

Similarly to the join example, we compute the join selectivity by $f_{(\gamma R),S} \;\theta\; f_{R,S}$ and assume independence with $f_{(\gamma R),S} = f_{R,S}$. The group-by selectivity is computed by

