Cost-Based Optimization of Integration Flows - Datenbanken ...
6 On-Demand Re-Optimization
Join Enumeration Example
Recall our join enumeration heuristic for optimization technique WD10 (Join Enumeration), presented in Subsection 3.3.3. There we assumed a left-deep join tree $(R \bowtie S) \bowtie T$ of $n = 3$ data sets (without cross products and with only one join implementation, a nested loop join), with the following $n! = 6$ possible plans:

$$
\begin{aligned}
P_a\,(\text{opt}) &: (R \bowtie S) \bowtie T & \quad P_c &: (R \bowtie T) \bowtie S & \quad P_e &: (S \bowtie T) \bowtie R \\
P_b &: (S \bowtie R) \bowtie T & \quad P_d &: (T \bowtie R) \bowtie S & \quad P_f &: (T \bowtie S) \bowtie R.
\end{aligned}
$$
The cost of a nested loop join is computed as $C(R \bowtie S) = |R| + |R| \cdot |S|$. Further, the join output cardinality is given by $|R \bowtie S| = f_{R,S} \cdot |R| \cdot |S|$ with the join filter selectivity $f_{R,S} = |R \bowtie S| / (|R| \cdot |S|)$. Since the output of $R \bowtie S$ is the outer input of the second join, the cost of the complete plan $(R \bowtie S) \bowtie T$ is

$$
C((R \bowtie S) \bowtie T) = |R| + |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| \cdot |T|.
$$
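To make the cost model concrete, the following Python sketch evaluates $C((X \bowtie Y) \bowtie Z)$ for every left-deep order of three relations and picks the cheapest. The cardinalities and selectivities are made-up illustrative values, and the function names are hypothetical. The selectivity of the second join does not appear in the cost, because only the output of the first join feeds into the second nested loop join.

```python
from itertools import permutations


def nl_join_cost(outer: float, inner: float) -> float:
    """Nested loop join cost C(X ⋈ Y) = |X| + |X|·|Y|, with X as outer input."""
    return outer + outer * inner


def plan_cost(order, card, sel) -> float:
    """Cost of the left-deep plan (X ⋈ Y) ⋈ Z for order = (X, Y, Z).

    card maps relation names to cardinalities; sel maps frozenset({X, Y})
    to the pairwise join selectivity f_{X,Y}.
    """
    x, y, z = order
    first = nl_join_cost(card[x], card[y])
    # |X ⋈ Y| = f_{X,Y}·|X|·|Y| is the outer input of the second join.
    intermediate = sel[frozenset({x, y})] * card[x] * card[y]
    return first + nl_join_cost(intermediate, card[z])


# Illustrative (assumed) statistics, not taken from the text.
card = {"R": 10, "S": 20, "T": 50}
sel = {
    frozenset({"R", "S"}): 0.01,
    frozenset({"R", "T"}): 0.05,
    frozenset({"S", "T"}): 0.1,
}

# All n! = 6 left-deep orders of the three relations.
plans = list(permutations(card))
best = min(plans, key=lambda order: plan_cost(order, card, sel))
print(best)  # ('R', 'S', 'T'), i.e., plan P_a
```

With these assumed values, $P_a$ costs 312 cost units, whereas $P_b$ costs 322 because only the first join's outer input changes.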
Assuming variable selectivities and cardinalities, the optimality conditions for arbitrary left-deep join trees (see Figure 6.9(a)) are specified as follows. First, the order of the two base relations of the first join is fixed by the commutativity optimality condition $oc_1: |R| \le |S|$. This follows from $C(R \bowtie S) \le C(S \bowtie R) \Leftrightarrow |R| + |R| \cdot |S| \le |S| + |S| \cdot |R|$, while the remaining terms of the plan cost are symmetric in $R$ and $S$. Second, executing $R \bowtie S$ before $\ast \bowtie T$ is optimal if the following optimality condition holds:
$$
\begin{aligned}
oc_2 &: |R| + |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| \cdot |T| \\
&\quad \le |R| + |R| \cdot |T| + f_{R,T} \cdot |R| \cdot |T| + f_{R,T} \cdot |R| \cdot |T| \cdot |S| \\
oc'_2 &: |S| + f_{R,S} \cdot |S| + f_{R,S} \cdot |S| \cdot |T| \le |T| + f_{R,T} \cdot |T| + f_{R,T} \cdot |T| \cdot |S|,
\end{aligned}
\tag{6.5}
$$
where $oc_2$ has been algebraically simplified to $oc'_2$ by subtracting $|R|$ from both sides and subsequently dividing by $|R|$ (assuming $|R| > 0$). Note that, while executing the current plan, it is possible to monitor all cardinalities $|R|$, $|S|$, and $|T|$, but only the selectivities $f_{R,S}$ and $f_{(R \bowtie S),T}$. Hence, $f_{R,T}$ must be derived from the monitored selectivity via $f_{R,T} \,\theta\, f_{(R \bowtie S),T}$, where $\theta$ is a mapping function representing the correlation. If we assume statistical independence of selectivities, we can set $f_{R,T} = f_{(R \bowtie S),T}$.²⁰
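A minimal sketch, assuming the monitored statistics are available as plain numbers: it derives $f_{R,S}$ and $f_{(R \bowtie S),T}$ from the observed cardinalities, applies the independence assumption $f_{R,T} = f_{(R \bowtie S),T}$, and evaluates $oc_1$ and $oc'_2$. It also implements the unsimplified $oc_2$ so that the equivalence of both forms can be checked numerically. All function names and values are hypothetical.

```python
def selectivities(card_r, card_s, card_t, card_rs, card_rst):
    """Derive the observable selectivities from monitored cardinalities."""
    f_rs = card_rs / (card_r * card_s)      # f_{R,S} = |R ⋈ S| / (|R|·|S|)
    f_rs_t = card_rst / (card_rs * card_t)  # f_{(R⋈S),T}
    f_rt = f_rs_t                           # independence assumption
    return f_rs, f_rt


def oc1(r, s):
    """Commutativity condition |R| <= |S|."""
    return r <= s


def oc2(r, s, t, f_rs, f_rt):
    """Unsimplified condition: C((R ⋈ S) ⋈ T) <= C((R ⋈ T) ⋈ S)."""
    lhs = r + r * s + f_rs * r * s + f_rs * r * s * t
    rhs = r + r * t + f_rt * r * t + f_rt * r * t * s
    return lhs <= rhs


def oc2_simplified(s, t, f_rs, f_rt):
    """oc'_2: oc_2 after subtracting |R| and dividing by |R|."""
    lhs = s + f_rs * s + f_rs * s * t
    rhs = t + f_rt * t + f_rt * t * s
    return lhs <= rhs


# Illustrative monitored statistics: |R|, |S|, |T|, |R ⋈ S|, |(R ⋈ S) ⋈ T|.
r, s, t, rs, rst = 10, 20, 50, 2, 5
f_rs, f_rt = selectivities(r, s, t, rs, rst)
print(oc1(r, s), oc2_simplified(s, t, f_rs, f_rt))  # True True
```

If the independence assumption is too coarse, only `selectivities` would need to change (for example, to one of the alternatives in footnote 20); the condition checks stay the same.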
[Figure 6.9: Example Join Enumeration. Panels: (a) Optimality Conditions, (b) Example POT, (c) Complexity Analysis]
The PlanOptTree for this example is illustrated in Figure 6.9(b). It contains the two optimality conditions $oc_1$ and $oc'_2$, which are defined over a hierarchy of atomic and complex statistics. The input cardinalities of the base relations and the output cardinalities of the join operators are represented as atomic statistic nodes. The join selectivities are complex statistic nodes computed from these atomic statistics. Both sides of the inequality $oc'_2$ are, in turn, computed from atomic and complex statistics.
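The node hierarchy described above can be sketched as follows. This is a hypothetical, simplified structure for illustration only (class names, node names, and the `violated` helper are assumptions, not the thesis's implementation): atomic statistic nodes hold monitored values, complex statistic nodes derive values such as selectivities from their inputs, and optimality condition nodes compare two statistics. For simplicity, values are recomputed on demand rather than propagated incrementally, and $f_{R,T}$ is approximated by $f_{(R \bowtie S),T}$ under the independence assumption.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class Stat:
    """Base class for statistic nodes in this simplified sketch."""

    def value(self) -> float:
        raise NotImplementedError


@dataclass
class AtomicStat(Stat):
    """Leaf node holding a monitored value, e.g., a cardinality."""
    name: str
    current: float = 0.0

    def value(self) -> float:
        return self.current


@dataclass
class ComplexStat(Stat):
    """Inner node whose value is derived from its input statistics."""
    name: str
    fn: Callable[..., float]
    inputs: Sequence[Stat]

    def value(self) -> float:
        return self.fn(*(node.value() for node in self.inputs))


@dataclass
class OptCondition:
    """Optimality condition lhs <= rhs over two statistic nodes."""
    name: str
    lhs: Stat
    rhs: Stat

    def holds(self) -> bool:
        return self.lhs.value() <= self.rhs.value()


# Atomic statistics: base-relation and join-output cardinalities.
R, S, T = AtomicStat("R"), AtomicStat("S"), AtomicStat("T")
R_S, RS_T = AtomicStat("R_S"), AtomicStat("RS_T")

# Complex statistics: the observable join selectivities.
f_rs = ComplexStat("f_R,S", lambda rs, r, s: rs / (r * s), [R_S, R, S])
f_rs_t = ComplexStat("f_(R⋈S),T", lambda rst, rs, t: rst / (rs * t), [RS_T, R_S, T])
f_rt = f_rs_t  # independence assumption: f_R,T = f_(R⋈S),T

# Both sides of oc'_2 as complex statistics.
lhs = ComplexStat("oc2_lhs", lambda s, t, f: s + f * s + f * s * t, [S, T, f_rs])
rhs = ComplexStat("oc2_rhs", lambda s, t, f: t + f * t + f * t * s, [S, T, f_rt])

conditions = [OptCondition("oc1", R, S), OptCondition("oc'2", lhs, rhs)]


def violated(monitored: dict) -> list:
    """Update all atomic statistics and return the names of violated conditions."""
    for node in (R, S, T, R_S, RS_T):
        node.current = monitored[node.name]
    return [c.name for c in conditions if not c.holds()]
```

For example, `violated({"R": 10, "S": 20, "T": 50, "R_S": 2, "RS_T": 5})` returns an empty list, whereas a small $|T|$ combined with a large $|R \bowtie S|$ (e.g., `"T": 5, "R_S": 100`) violates `oc'2` and would indicate that the current plan should be re-optimized.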
20 Another approach would be to use serial histograms [Ioa93] or exact frequency matrices [Pol05].<br />