Cost-Based Optimization of Integration Flows - Datenbanken ...
3.3 Periodical Re-Optimization
Figure 3.9: Join Enumeration Example Plans
(a) Plan $P_7$ (full reordering possible): Join (o14) [in: msg1, msg9; out: msg13], Join (o15) [in: msg13, msg12; out: msg14], Join (o16) [in: msg14, msg4; out: msg15], Join (o17) [in: msg15, msg7; out: msg16], Assign (o18) [in: msg16; out: msg17], Invoke (o19) [service s6; in: msg17].
(b) Plan $P'_7$ (partial reordering possible): Join (o14) [in: msg1, msg9; out: msg13], Join (o15) [in: msg13, msg12; out: msg14], Join (o16) [in: msg14, msg4; out: msg15], Invoke (o19) [service s6; in: msg15], Join (o17) [in: msg15, msg7; out: msg16].
including the operators $o_{14}$, $o_{15}$ and $o_{16}$ (independently of the operator $o_{17}$) is possible. In contrast, for join enumeration in DBMS, the temporal order of table accesses does not matter when considering only the final query result because all joins can be considered by simply evaluating the connectedness of quantifiers (data sets).
In order to take into account the described join enumeration restrictions as well as the control-flow semantics of an integration flow, we introduce a tailor-made, transformation-based join enumeration heuristic. For the sake of clarity, we require some notation before discussing the join enumeration heuristic. For our heuristic join reordering, we consider only (1) left-deep join trees (no composite inners [OL90] in the sense of bushy trees), (2) plans without cross-products, and (3) a single join implementation (nested loop join). Note that after join reordering, we still decide between different join operator implementations.
Using these assumptions in combination with our asymmetric cost functions, there exist $n!$ alternative plans for joining $n$ data sets. For example, assume a left-deep join tree $(R \bowtie S) \bowtie T$ ($n = 3$) with the following $n! = 6$ possible plans:

$$
\begin{array}{lll}
P_a\ (\text{opt}): (R \bowtie S) \bowtie T & P_c: (R \bowtie T) \bowtie S & P_e: (S \bowtie T) \bowtie R \\
P_b: (S \bowtie R) \bowtie T & P_d: (T \bowtie R) \bowtie S & P_f: (T \bowtie S) \bowtie R.
\end{array}
$$
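The count of $n!$ left-deep plans can be checked with a small enumeration. The following is a minimal sketch, not part of the original text: the function name `left_deep_plans` is hypothetical, and it assumes that every pair of data sets is connected by a join predicate, so that no ordering has to be discarded as a cross-product.

```python
from itertools import permutations
from math import factorial


def left_deep_plans(datasets):
    """Return every left-deep join order as a nested string.

    Assumption (not from the source): all data sets are mutually
    joinable, so no permutation introduces a cross-product.
    """
    plans = []
    for order in permutations(datasets):
        # Start with the leftmost data set and append one inner per join.
        expr = order[0]
        for inner in order[1:]:
            expr = f"({expr} JOIN {inner})"
        plans.append(expr)
    return plans


plans = left_deep_plans(["R", "S", "T"])
for plan in plans:
    print(plan)
```

Because the outer and inner inputs of a join are distinguished, `(R JOIN S)` and `(S JOIN R)` are counted as separate plans, which is why the total is $n!$ rather than $n!/2$.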
The join selectivity $f_{R,S}$ (filter selectivity) of $R \bowtie S$ is given by

$$
f_{R,S} = \frac{|R \bowtie S|}{|R| \cdot |S|} \quad \text{with } f_{R,S} \in [0, 1] \tag{3.6}
$$

and the costs of the nested loop join are computed by $C(R \bowtie S) = |R| + |R| \cdot |S|$ (asymmetric, in order to take into account commutativity of join inputs). Further, the join output cardinality can be derived as $|R \bowtie S| = f_{R,S} \cdot |R| \cdot |S|$. Thus, the costs of the complete plan $(R \bowtie S) \bowtie T$ are given by

$$
C((R \bowtie S) \bowtie T) = |R| + |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| \cdot |T|. \tag{3.7}
$$
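The plan cost in Equation 3.7 is the sum of one nested loop join cost per join, where the outer cardinality of each join is the estimated output of the previous one. A minimal sketch of that computation follows. The function names, the dictionary-based inputs, and the example cardinalities are all assumptions made for illustration; in particular, combining several predicate selectivities by multiplication (independence) and defaulting missing pairs to 1.0 are simplifications not stated in the text.

```python
def nested_loop_cost(outer_card, inner_card):
    """C(outer JOIN inner) = |outer| + |outer| * |inner| (asymmetric)."""
    return outer_card + outer_card * inner_card


def left_deep_plan_cost(order, card, sel):
    """Cost of a left-deep nested loop plan for the given join order.

    order: list of data set names, leftmost (first outer) first
    card:  dict mapping name -> cardinality
    sel:   dict mapping frozenset({a, b}) -> join selectivity f_{a,b}
    """
    joined = [order[0]]
    outer_card = card[order[0]]
    total = 0.0
    for inner in order[1:]:
        total += nested_loop_cost(outer_card, card[inner])
        # Selectivity between the intermediate result and the new inner:
        # product over already-joined data sets (independence assumption),
        # with 1.0 for pairs that have no known selectivity.
        f = 1.0
        for name in joined:
            f *= sel.get(frozenset((name, inner)), 1.0)
        # Estimated output cardinality becomes the next outer input.
        outer_card = f * outer_card * card[inner]
        joined.append(inner)
    return total


# Hypothetical cardinalities and selectivity, chosen only for illustration.
card = {"R": 100, "S": 10, "T": 5}
sel = {frozenset(("R", "S")): 0.25}

cost_rst = left_deep_plan_cost(["R", "S", "T"], card, sel)
cost_srt = left_deep_plan_cost(["S", "R", "T"], card, sel)
print(cost_rst, cost_srt)
```

With these made-up numbers, the two orders differ only in the first term ($|R|$ versus $|S|$), which is the asymmetry that makes commuted join inputs distinguishable in the cost model.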
The core idea of our heuristic join reordering is to transform the full join enumeration into binary reordering decisions between subsequent join operators. This is possible because we restricted ourselves to left-deep join trees and nested loop joins only. We can then observe that the costs before and after a binary reordering decision are independent of