25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3.3 Periodical Re-Optimization

Figure 3.9: Join Enumeration Example Plans

(a) Plan P7 (full reordering possible):
Join (o14) [in: msg1, msg9, out: msg13]
Join (o15) [in: msg13, msg12, out: msg14]
Join (o16) [in: msg14, msg4, out: msg15]
Join (o17) [in: msg15, msg7, out: msg16]
Assign (o18) [in: msg16, out: msg17]
Invoke (o19) [service s6, in: msg17]

(b) Plan P′7 (partial reordering possible):
Join (o14) [in: msg1, msg9, out: msg13]
Join (o15) [in: msg13, msg12, out: msg14]
Join (o16) [in: msg14, msg4, out: msg15]
Invoke (o19) [service s6, in: msg15]
Join (o17) [in: msg15, msg7, out: msg16]

All four joins in each plan are labeled INNER.

including the operators $o_{14}$, $o_{15}$, and $o_{16}$ (independently of the operator $o_{17}$) is possible. In contrast, for join enumeration in a DBMS, the temporal order of table accesses does not matter when considering only the final query result, because all joins can be considered by simply evaluating the connectedness of quantifiers (data sets).

In order to take into account the described join enumeration restrictions as well as the control-flow semantics of an integration flow, we introduce a tailor-made, transformation-based join enumeration heuristic. For the sake of clarity, we require some notation before discussing this heuristic. For our heuristic join reordering, we consider only (1) left-deep join trees (no composite inners [OL90] in the sense of bushy trees), (2) plans without cross-products, and (3) a single join implementation (nested loop join). Note that after join reordering, we still decide between different join operator implementations.

Using these assumptions in combination with our asymmetric cost functions, there exist $n!$ alternative plans for joining $n$ data sets. For example, assume a left-deep join tree $(R \bowtie S) \bowtie T$ ($n = 3$) with the following $n! = 6$ possible plans:

\[
\begin{array}{lll}
P_a\ (\text{opt}): (R \bowtie S) \bowtie T & P_c: (R \bowtie T) \bowtie S & P_e: (S \bowtie T) \bowtie R \\
P_b: (S \bowtie R) \bowtie T & P_d: (T \bowtie R) \bowtie S & P_f: (T \bowtie S) \bowtie R.
\end{array}
\]
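The count of $n!$ plans can be checked with a small enumeration. The following is only an illustrative sketch, not part of the described system: the function name `left_deep_plans` is hypothetical, and it simply treats every permutation of the inputs as one left-deep join order, assuming (as the n! count in the text does) that no order is excluded as a cross-product.

```python
from itertools import permutations


def left_deep_plans(datasets):
    """Return every left-deep join order as a string such as '(R ⋈ S) ⋈ T'.

    Hypothetical helper: each permutation of the inputs is read as one
    left-deep plan, with the first element as the outermost left input.
    """
    plans = []
    for order in permutations(datasets):
        expr = order[0]
        for step, inner in enumerate(order[1:]):
            # Parenthesize the intermediate result from the second join on.
            expr = f"({expr}) ⋈ {inner}" if step > 0 else f"{expr} ⋈ {inner}"
        plans.append(expr)
    return plans


if __name__ == "__main__":
    for plan in left_deep_plans(["R", "S", "T"]):
        print(plan)
```

For three data sets this lists the six plans $P_a$ to $P_f$ (in permutation order rather than in the order shown above).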

The join selectivity $f_{R,S}$ (filter selectivity) of $R \bowtie S$ is given by

\[
f_{R,S} = \frac{|R \bowtie S|}{|R| \cdot |S|} \quad \text{with } f_{R,S} \in [0, 1] \tag{3.6}
\]

and the costs of the nested loop join are computed by $C(R \bowtie S) = |R| + |R| \cdot |S|$ (asymmetric, in order to take into account commutativity of join inputs). Further, the join output cardinality can be derived with $|R \bowtie S| = f_{R,S} \cdot |R| \cdot |S|$. Thus, the costs of the complete plan $(R \bowtie S) \bowtie T$ are given by

\[
C((R \bowtie S) \bowtie T) = |R| + |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| + f_{R,S} \cdot |R| \cdot |S| \cdot |T|. \tag{3.7}
\]
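As a worked instance of Equation (3.7), the cost model can be sketched in a few lines. The names `nested_loop_cost` and `left_deep_cost` and all cardinalities and selectivities below are assumptions chosen for illustration. The sketch also assumes that the selectivity of each join step (intermediate result joined with the next input) is supplied directly, which generalizes $|R \bowtie S| = f_{R,S} \cdot |R| \cdot |S|$ to intermediate results.

```python
def nested_loop_cost(outer, inner):
    """C(outer ⋈ inner) = |outer| + |outer| * |inner| (asymmetric in its inputs)."""
    return outer + outer * inner


def left_deep_cost(cards, sels):
    """Cost of a left-deep plan under the nested loop cost function.

    cards: input cardinalities in join order, e.g. [|R|, |S|, |T|].
    sels:  assumed selectivity of each join step except the last one,
           e.g. [f_RS] for three inputs (the final output is not costed).
    """
    if len(sels) < len(cards) - 2:
        raise ValueError("need one selectivity per non-final join")
    outer = cards[0]
    total = 0
    last_step = len(cards) - 2
    for step, inner in enumerate(cards[1:]):
        total += nested_loop_cost(outer, inner)
        if step < last_step:
            # Output cardinality of this join becomes the next outer input.
            outer = sels[step] * outer * inner
    return total


if __name__ == "__main__":
    # Illustrative numbers only: |R| = 10, |S| = 20, |T| = 30, f_RS = 0.25.
    print(left_deep_cost([10, 20, 30], [0.25]))  # (R ⋈ S) ⋈ T
    print(left_deep_cost([20, 10, 30], [0.25]))  # (S ⋈ R) ⋈ T
```

With these assumed values, $(R \bowtie S) \bowtie T$ costs $10 + 200 + 50 + 1500 = 1760$, while $(S \bowtie R) \bowtie T$ costs $20 + 200 + 50 + 1500 = 1770$. The difference comes only from the asymmetric first term, which is why swapping the join inputs yields distinct plans.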

The core idea of our heuristic join reordering is to transform the full join enumeration into binary reordering decisions between subsequent join operators. This is possible because we restricted ourselves to left-deep join trees and nested loop joins only. We can then observe that the costs before and after a binary reordering decision are independent of
