25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3 Fundamentals of Optimizing Integration Flows

of $C(\bowtie) = |R| + |R| \cdot |S|$, the group-by costs are given by $C(\gamma) = |R| + |R| \cdot |R|/2$. Furthermore, the output cardinality in case of a single group-by attribute $A_i$ is defined as $1 \le |\gamma R| \le |D_{A_i}(R)|$, while for an arbitrary number of group-by attributes it is $1 \le |\gamma R| \le \prod_{i=1}^{|A|} |D_{A_i}(R)|$, where $D_{A_i}$ denotes the domain of an attribute $A_i$. Further, we denote the group-by selectivity with $f_{\gamma R} = |\gamma R|/|R|$. Then, the plan $P_a$ is optimal if the following four optimality conditions hold. First, the commutative join order is expressed with $|R| \le |S|$. Second, there is one optimality condition for each single join input (in order to determine if pre-aggregation is advantageous):

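\[
\begin{aligned}
C(\gamma(R \bowtie S)) \le{} & \left(|R| + |R|^2/2\right) + \left(f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot |S|\right) \\
& + \left(f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S| + \left(f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S|\right)^2/2\right) \\
\text{with } C(\gamma(R \bowtie S)) ={} & \left(|R| + |R| \cdot |S|\right) + \left(f_{R,S} \cdot |R| \cdot |S| + \left(f_{R,S} \cdot |R| \cdot |S|\right)^2/2\right),
\end{aligned}
\tag{3.24}
\]

\[
C(\gamma(R \bowtie S)) \le \left(|S| + |S|^2/2\right) + \left(|R| + |R| \cdot f_{\gamma S} \cdot |S|\right) + \left(|R| + \left(f_{R,(\gamma S)} \cdot f_{\gamma S} \cdot |R| \cdot |S|\right)^2/2\right),
\tag{3.25}
\]

and one condition for all join inputs:

\[
C(\gamma(R \bowtie S)) \le \left(|R| + |R|^2/2\right) + \left(|S| + |S|^2/2\right) + \left(f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot f_{\gamma S} \cdot |S|\right).
\tag{3.26}
\]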
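To make the cost comparison concrete, the following minimal Python sketch evaluates the join-order condition together with conditions (3.24) and (3.26) from the join and group-by cost formulas given above. The function and parameter names are hypothetical, and the statistics in the usage block are made up for illustration. Condition (3.25), the analogous check for pre-aggregating $S$, is left out of this sketch.

```python
def join_cost(outer: float, inner: float) -> float:
    """Join cost C(join) = |R| + |R| * |S| for input cardinalities |R| and |S|."""
    return outer + outer * inner


def groupby_cost(n: float) -> float:
    """Group-by cost C(gamma) = n + n^2 / 2 for an input of n tuples."""
    return n + n * n / 2


def late_groupby_cost(r: float, s: float, f_rs: float) -> float:
    """C(gamma(R join S)): join first, then group the join output."""
    return join_cost(r, s) + groupby_cost(f_rs * r * s)


def eager_r_cost(r: float, s: float, f_gr: float, f_gr_s: float) -> float:
    """Right-hand side of (3.24): pre-aggregate R, join with S, group again."""
    pre_aggregation = groupby_cost(r)
    join = join_cost(f_gr * r, s)
    final_groupby = groupby_cost(f_gr_s * f_gr * r * s)
    return pre_aggregation + join + final_groupby


def full_eager_cost(r: float, s: float, f_gr: float, f_gs: float) -> float:
    """Right-hand side of (3.26): pre-aggregate both inputs, then join."""
    return groupby_cost(r) + groupby_cost(s) + join_cost(f_gr * r, f_gs * s)


def check_conditions(r, s, f_rs, f_gr, f_gs, f_gr_s):
    """Evaluate the join-order condition plus (3.24) and (3.26).

    (3.25) is intentionally not covered by this sketch.
    """
    late = late_groupby_cost(r, s, f_rs)
    return {
        "join_order": r <= s,
        "cond_3_24": late <= eager_r_cost(r, s, f_gr, f_gr_s),
        "cond_3_26": late <= full_eager_cost(r, s, f_gr, f_gs),
    }


if __name__ == "__main__":
    # Made-up statistics, chosen only to exercise the formulas.
    print(check_conditions(r=100, s=200, f_rs=0.01,
                           f_gr=0.1, f_gs=0.05, f_gr_s=0.01))
```

With these illustrative values both cost conditions evaluate to false, so the join-then-group-by plan would not be kept. In practice the selectivities would come from monitored statistics rather than constants.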

These conditions are necessary because knowledge about data properties is missing. For example, we do not know the multiplicities of join inputs, which could otherwise be exploited to define simpler optimality conditions in advance. The algorithm for realizing this technique is invoked for each Groupby operator. We then use the dependency graph to check whether this operator can be reordered with predecessor Join operators, where for each join there are four optimality conditions. As a result, this algorithm exhibits a linear time complexity of O(m). We use an example to illustrate this concept.
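The traversal described above can be sketched as follows. This is only an illustration under stated assumptions: the `Operator` class, the dictionary-based dependency graph, and the `late_is_optimal` predicate (standing in for the four optimality conditions) are hypothetical names, and the sketch inspects only direct predecessors of each Groupby operator. The plan data mirrors the data dependencies of plan $P_2$ in Figure 3.17(a); the Fork operator is omitted for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Operator:
    op_id: str
    kind: str  # e.g. "Assign", "Invoke", "Join", "Groupby"
    preds: List[str] = field(default_factory=list)  # data-dependency predecessors


def eager_groupby_candidates(
    plan: Dict[str, Operator],
    late_is_optimal: Callable[[Operator, Operator], bool],
) -> List[Tuple[str, str]]:
    """Return (groupby_id, join_id) pairs whose optimality conditions fail."""
    candidates = []
    for op in plan.values():  # single pass over the operators of the plan
        if op.kind != "Groupby":
            continue
        for pred_id in op.preds:  # look up predecessors in the dependency graph
            pred = plan[pred_id]
            if pred.kind == "Join" and not late_is_optimal(op, pred):
                candidates.append((op.op_id, pred.op_id))
    return candidates


# Data dependencies of plan P2 (Figure 3.17(a)); Fork omitted in this sketch.
P2 = {
    "o1": Operator("o1", "Assign"),
    "o2": Operator("o2", "Invoke", ["o1"]),
    "o3": Operator("o3", "Invoke", ["o1"]),
    "o4": Operator("o4", "Join", ["o2", "o3"]),
    "o5": Operator("o5", "Groupby", ["o4"]),
    "o6": Operator("o6", "Assign", ["o5"]),
    "o7": Operator("o7", "Invoke", ["o6"]),
}

if __name__ == "__main__":
    # Placeholder predicate: pretend the cost conditions fail for every pair.
    print(eager_groupby_candidates(P2, lambda groupby, join: False))
```

A real implementation would replace the placeholder predicate with a cost comparison based on monitored statistics and would then perform the actual rewrite for each reported pair.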

Example 3.14 (Eager Group-By). Recall our running example plan $P_2$ as shown in Figure 3.17(a). Assume arbitrary join multiplicities and monitored statistics. Based on the given optimality condition, the plan has been rewritten to $P_2'$ as shown in Figure 3.17(b). Essentially, we observed that the full eager-group-by before the join causes lower costs than the join-group-by combination. Note that the Fork operator is taken into account by

(a) Plan $P_2$

Assign (o1) [out: msg1]
Fork (o-1)
Invoke (o2) [service s4, in: msg1, out: msg2]
Invoke (o3) [service s5, in: msg1, out: msg3]
Join (o4) [in: msg2,msg3, out: msg4]
Groupby (o5) [in: msg4, out: msg5]
Assign (o6) [in: msg5, out: msg1]
Invoke (o7) [service s5, in: msg1]

(b) Plan $P_2'$

Assign (o1) [out: msg1]
Fork (o-1)
Invoke (o2) [service s4, in: msg1, out: msg2]
Invoke (o3) [service s5, in: msg1, out: msg3]
Groupby (o5) [in: msg2, out: msg2]
Groupby (o8) [in: msg3, out: msg3]
Join (o4) [in: msg2,msg3, out: msg4]
Assign (o6) [in: msg4, out: msg1]
Invoke (o7) [service s5, in: msg1]

Figure 3.17: Example Eager Group-By Application
