25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6 On-Demand Re-Optimization

Join Enumeration Example

Recall our join enumeration heuristic for the optimization technique WD10: Join Enumeration, presented in Subsection 3.3.3, where we assumed a left-deep join tree $(R \bowtie S) \bowtie T$ of $n = 3$ data sets (without cross products and with only one join implementation, a nested loop join) with the following $n! = 6$ possible plans:

\[
\begin{array}{lll}
P_a\ (\text{opt}): (R \bowtie S) \bowtie T & P_c: (R \bowtie T) \bowtie S & P_e: (S \bowtie T) \bowtie R \\
P_b: (S \bowtie R) \bowtie T & P_d: (T \bowtie R) \bowtie S & P_f: (T \bowtie S) \bowtie R.
\end{array}
\]

The costs of a (nested loop) join are computed by $C(R \bowtie S) = |R| + |R| \cdot |S|$. Further, the join output cardinality can be derived by $|R \bowtie S| = f_{R,S} \cdot |R| \cdot |S|$ with a join filter selectivity of $f_{R,S} = |R \bowtie S| / (|R| \cdot |S|)$. Thus, the costs of the complete plan $(R \bowtie S) \bowtie T$ are given by $C((R \bowtie S) \bowtie T) = |R| + |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| \cdot |T|$.
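The cost formula above can be illustrated with a minimal Python sketch that enumerates all six left-deep plans and picks the cheapest one. The cardinalities and selectivities are hypothetical values chosen for illustration, and the function names are not taken from the original work. The sketch also assumes that selectivities are symmetric, that is, $f_{X,Y} = f_{Y,X}$, which follows from the definition of $f_{R,S}$ given above.

```python
from itertools import permutations


def nested_loop_join_cost(outer_card, inner_card):
    """C(X ⋈ Y) = |X| + |X| * |Y| for a nested loop join."""
    return outer_card + outer_card * inner_card


def left_deep_plan_cost(order, card, sel):
    """Cost of the left-deep plan (X ⋈ Y) ⋈ Z for order = (X, Y, Z)."""
    x, y, z = order
    f_xy = sel[frozenset((x, y))]
    # Output cardinality of the first join: |X ⋈ Y| = f_{X,Y} * |X| * |Y|
    intermediate = f_xy * card[x] * card[y]
    return (nested_loop_join_cost(card[x], card[y])
            + nested_loop_join_cost(intermediate, card[z]))


# Hypothetical statistics (assumptions, not values from the text).
card = {"R": 10, "S": 20, "T": 100}
sel = {
    frozenset(("R", "S")): 0.1,
    frozenset(("R", "T")): 0.5,
    frozenset(("S", "T")): 0.5,
}

# Enumerate all n! = 6 join orders and select the cheapest plan.
costs = {order: left_deep_plan_cost(order, card, sel)
         for order in permutations("RST")}
best = min(costs, key=costs.get)

for order, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"({order[0]} ⋈ {order[1]}) ⋈ {order[2]}: {cost}")
```

With these assumed statistics, the cheapest order is the one that starts with the smaller base relation and the more selective join, which corresponds to plan $P_a$.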

Assuming variable selectivities and cardinalities, the optimality conditions for arbitrary left-deep join trees (see Figure 6.9(a)) are specified as follows. First, fix two base relations with the commutativity optimality condition $oc_1: |R| \le |S|$. Second, executing $R \bowtie S$ before $* \bowtie T$ is optimal if the following optimality condition holds:

\[
\begin{aligned}
oc_2 :\quad & |R| + |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| \cdot |T| \\
& \le |R| + |R| \cdot |T| + f_{R,T} \cdot |R| \cdot |T| + f_{R,T} \cdot |R| \cdot |T| \cdot |S| \\
oc'_2 :\quad & |S| + f_{R,S} \cdot |S| + f_{R,S} \cdot |S| \cdot |T| \le |T| + f_{R,T} \cdot |T| + f_{R,T} \cdot |T| \cdot |S|,
\end{aligned}
\tag{6.5}
\]

where $oc_2$ has been algebraically simplified to $oc'_2$ by subtracting $|R|$ and subsequently dividing by $|R|$. Note that it is possible to monitor all cardinalities $|R|$, $|S|$, and $|T|$ but only the selectivities $f_{R,S}$ and $f_{(R \bowtie S),T}$. To estimate $f_{R,T}$, we need to derive it with $f_{R,T}\ \theta\ f_{(R \bowtie S),T}$, where $\theta$ is a mapping function representing the correlation. If we assume statistical independence of selectivities, we can set $f_{R,T} = f_{(R \bowtie S),T}$.²⁰
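As a hedged illustration of how the two conditions could be evaluated from monitored statistics, the following Python sketch derives $f_{R,S}$ and $f_{(R \bowtie S),T}$ from observed cardinalities, applies the independence assumption $f_{R,T} = f_{(R \bowtie S),T}$, and then checks $oc_1$ and $oc'_2$. All numeric values and function names are hypothetical and are not part of the original work.

```python
def selectivity(join_card, left_card, right_card):
    """f_{X,Y} = |X ⋈ Y| / (|X| * |Y|)."""
    return join_card / (left_card * right_card)


def oc1(card_r, card_s):
    """Commutativity condition oc1: |R| <= |S|."""
    return card_r <= card_s


def oc2_simplified(card_s, card_t, f_rs, f_rt):
    """Simplified condition oc'2 (oc2 minus |R|, then divided by |R|)."""
    lhs = card_s + f_rs * card_s + f_rs * card_s * card_t
    rhs = card_t + f_rt * card_t + f_rt * card_t * card_s
    return lhs <= rhs


# Hypothetical monitored cardinalities (assumptions for illustration).
card_r, card_s, card_t = 10, 20, 100
card_rs = 20       # |R ⋈ S|
card_rst = 1000    # |(R ⋈ S) ⋈ T|

f_rs = selectivity(card_rs, card_r, card_s)        # monitored f_{R,S}
f_rs_t = selectivity(card_rst, card_rs, card_t)    # monitored f_{(R ⋈ S),T}
f_rt = f_rs_t  # independence assumption: f_{R,T} = f_{(R ⋈ S),T}

plan_still_optimal = oc1(card_r, card_s) and oc2_simplified(
    card_s, card_t, f_rs, f_rt)
print("current plan still optimal:", plan_still_optimal)
```

If either condition evaluates to false after new statistics arrive, the current join order is no longer guaranteed to be the cheapest under this cost model.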

[Figure 6.9: Example Join Enumeration. Panels: (a) Optimality Conditions; (b) Example POT; (c) Complexity Analysis]

The PlanOptTree for this example is illustrated in Figure 6.9(b). Here, the PlanOptTree contains the two mentioned optimality conditions, which use a hierarchy of atomic and complex statistics. All input cardinalities of base relations and the output cardinalities of join operators are used in the form of atomic statistic nodes. The join selectivities (complex statistic nodes) are computed from the atomic statistics. Both terms of the inequality $oc'_2$ are computed using atomic and complex statistics.
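The hierarchy described above can be sketched in Python as follows. This is only an illustrative structure under stated assumptions, not the PlanOptTree implementation of the original work: the class names `AtomicStat`, `ComplexStat`, and `OptCondition` and all numeric values are hypothetical. Atomic nodes hold monitored cardinalities, complex nodes compute derived values from their inputs, and each optimality condition compares two nodes with $\le$.

```python
class AtomicStat:
    """Leaf node holding a monitored value, e.g. a cardinality."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def get(self):
        return self.value


class ComplexStat:
    """Inner node computed from other statistic nodes."""

    def __init__(self, name, inputs, fn):
        self.name = name
        self.inputs = inputs
        self.fn = fn

    def get(self):
        return self.fn(*(node.get() for node in self.inputs))


class OptCondition:
    """Optimality condition of the form lhs <= rhs."""

    def __init__(self, name, lhs, rhs):
        self.name = name
        self.lhs = lhs
        self.rhs = rhs

    def holds(self):
        return self.lhs.get() <= self.rhs.get()


# Atomic statistics: hypothetical monitored cardinalities.
r = AtomicStat("|R|", 10)
s = AtomicStat("|S|", 20)
t = AtomicStat("|T|", 100)
rs = AtomicStat("|R ⋈ S|", 20)
rst = AtomicStat("|* ⋈ T|", 1000)

# Complex statistics: selectivities derived from atomic statistics.
f_rs = ComplexStat("f_RS", [rs, r, s], lambda j, a, b: j / (a * b))
# Independence assumption: f_{R,T} = f_{(R ⋈ S),T}.
f_rt = ComplexStat("f_RT", [rst, rs, t], lambda j, a, b: j / (a * b))

# Both terms of oc'2, built from atomic and complex statistics.
c1 = ComplexStat("C1", [s, t, f_rs], lambda a, b, f: a + f * a + f * a * b)
c2 = ComplexStat("C2", [s, t, f_rt], lambda a, b, f: b + f * b + f * b * a)

conditions = [OptCondition("oc1", r, s), OptCondition("oc'2", c1, c2)]


def violated(conds):
    """Names of all optimality conditions that no longer hold."""
    return [c.name for c in conds if not c.holds()]


before = violated(conditions)
r.value = 50  # a new monitored cardinality for R arrives
after = violated(conditions)
print("violated before update:", before)
print("violated after update:", after)
```

Because complex nodes recompute their values from their inputs on demand, updating a single atomic statistic is enough to re-evaluate every condition that depends on it.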

20 Another approach would be to use serial histograms [Ioa93] or exact frequency matrices [Pol05].<br />
