Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
of $C(\bowtie) = |R| + |R| \cdot |S|$, the group-by costs are given by $C(\gamma) = |R| + |R| \cdot |R|/2$.
Furthermore, the output cardinality in case of a single group-by attribute $A_i$ is bounded by
$1 \le |\gamma R| \le |D_{A_i}(R)|$, while for an arbitrary number of group-by attributes it is
$1 \le |\gamma R| \le \prod_{i=1}^{|A|} |D_{A_i}(R)|$, where $D_{A_i}$ denotes the domain of an attribute $A_i$. Further, we
denote the group-by selectivity by $f_{\gamma R} = |\gamma R|/|R|$. Then, the plan $P_a$ is optimal if the
following four optimality conditions hold. First, the commutative join order is expressed
by $|R| \le |S|$. Second, there is one optimality condition for each single join input (in
order to determine whether pre-aggregation is advantageous):
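$$\begin{aligned}
C(\gamma(R \bowtie S)) \le{} & \left(|R| + |R|^2/2\right) + \left(f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot |S|\right) \\
& + \left(f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S| + (f_{(\gamma R),S} \cdot f_{\gamma R} \cdot |R| \cdot |S|)^2/2\right) \\
\text{with } C(\gamma(R \bowtie S)) ={} & \left(|R| + |R| \cdot |S|\right) + \left(f_{R,S} \cdot |R| \cdot |S| + (f_{R,S} \cdot |R| \cdot |S|)^2/2\right),
\end{aligned} \tag{3.24}$$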
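$$C(\gamma(R \bowtie S)) \le \left(|S| + |S|^2/2\right) + \left(|R| + |R| \cdot f_{\gamma S} \cdot |S|\right) + \left(f_{R,(\gamma S)} \cdot f_{\gamma S} \cdot |R| \cdot |S| + (f_{R,(\gamma S)} \cdot f_{\gamma S} \cdot |R| \cdot |S|)^2/2\right), \tag{3.25}$$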
and one condition for all join inputs:
$$C(\gamma(R \bowtie S)) \le \left(|R| + |R|^2/2\right) + \left(|S| + |S|^2/2\right) + \left(f_{\gamma R} \cdot |R| + f_{\gamma R} \cdot |R| \cdot f_{\gamma S} \cdot |S|\right). \tag{3.26}$$
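Because every term above is composed only of $C(\bowtie)$ and $C(\gamma)$, the four conditions can be evaluated mechanically from monitored statistics. The following is a minimal Python sketch under stated assumptions, not the author's implementation: it takes $P_a$ to be $\gamma(R \bowtie S)$ (the left-hand side of all conditions), reads $f_{R,S}$, $f_{(\gamma R),S}$ and $f_{R,(\gamma S)}$ as join selectivities so that a join output has cardinality $f \cdot |\text{outer}| \cdot |\text{inner}|$, as the formulas imply, and uses hypothetical names such as `JoinGroupByStats` and `lazy_plan_is_optimal`.

```python
import math
from dataclasses import dataclass


@dataclass
class JoinGroupByStats:
    """Monitored statistics for gamma(R join S); field names are hypothetical."""
    card_r: float  # |R|
    card_s: float  # |S|
    f_rs: float    # f_{R,S}: selectivity of R join S (assumed meaning)
    f_gr: float    # f_{gamma R} = |gamma R| / |R|
    f_gs: float    # f_{gamma S} = |gamma S| / |S|
    f_grs: float   # f_{(gamma R),S}: selectivity of (gamma R) join S
    f_rgs: float   # f_{R,(gamma S)}: selectivity of R join (gamma S)


def groupby_output_bounds(domain_sizes):
    """Bounds 1 <= |gamma R| <= prod_i |D_{A_i}(R)| for the group-by attributes."""
    return 1, math.prod(domain_sizes)


def join_cost(outer, inner):
    # C(join) = |R| + |R| * |S| with R as the outer input
    return outer + outer * inner


def groupby_cost(n):
    # C(gamma) = |R| + |R| * |R| / 2 for an input of cardinality n
    return n + n * n / 2


def cost_lazy(st):
    # Join first, then group the join result (left-hand side of 3.24 to 3.26).
    out = st.f_rs * st.card_r * st.card_s
    return join_cost(st.card_r, st.card_s) + groupby_cost(out)


def cost_eager_r(st):
    # Pre-aggregate R, join with S, group again (right-hand side of 3.24).
    gr = st.f_gr * st.card_r
    out = st.f_grs * gr * st.card_s
    return groupby_cost(st.card_r) + join_cost(gr, st.card_s) + groupby_cost(out)


def cost_eager_s(st):
    # Pre-aggregate S, join R with it, group again (right-hand side of 3.25).
    gs = st.f_gs * st.card_s
    out = st.f_rgs * st.card_r * gs
    return groupby_cost(st.card_s) + join_cost(st.card_r, gs) + groupby_cost(out)


def cost_eager_full(st):
    # Pre-aggregate both inputs and join them (right-hand side of 3.26).
    return (groupby_cost(st.card_r) + groupby_cost(st.card_s)
            + join_cost(st.f_gr * st.card_r, st.f_gs * st.card_s))


def lazy_plan_is_optimal(st):
    """True if all four optimality conditions hold for gamma(R join S)."""
    lazy = cost_lazy(st)
    return (st.card_r <= st.card_s            # commutative join order
            and lazy <= cost_eager_r(st)      # condition (3.24)
            and lazy <= cost_eager_s(st)      # condition (3.25)
            and lazy <= cost_eager_full(st))  # condition (3.26)


# Illustrative numbers only (not taken from the source).
st = JoinGroupByStats(card_r=100, card_s=1000, f_rs=0.01, f_gr=0.1,
                      f_gs=1.0, f_grs=0.01, f_rgs=0.01)
print(lazy_plan_is_optimal(st))  # False: pre-aggregating R is cheaper here
```

With these numbers, grouping $R$ first shrinks the join input tenfold, so condition (3.24) fails and the eager variant is preferred; with $f_{\gamma R} = f_{\gamma S} = 1$ (no reduction) the lazy plan would win instead.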
These conditions are necessary because knowledge about data properties is missing. For
example, we do not know in advance the multiplicities of the join inputs, which could
otherwise be exploited to define simpler optimality conditions. The algorithm realizing
this technique is invoked for each Groupby operator. It then uses the dependency graph
to check whether this operator can be reordered with predecessor Join operators, where
four optimality conditions are evaluated for each join. As a result, this algorithm exhibits
a linear time complexity of $O(m)$. We use an example to illustrate this concept.
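The per-Groupby check can be pictured as a single pass over the plan. The sketch below is a hypothetical illustration, not the author's algorithm: it models a plan as a list of `Operator` records in execution order (omitting the Fork operator), derives data dependencies from the most recent writer of each message as a stand-in for the dependency graph, examines only the direct producer of each Groupby input, and delegates the four optimality conditions to a caller-supplied `is_beneficial` callback. Plan $P_2$ of the running example, as listed in Figure 3.17(a), serves as input.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Operator:
    """Hypothetical plan operator: id, type, and the messages it reads/writes."""
    op_id: str
    kind: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def eager_groupby_candidates(
    plan: list[Operator],
    is_beneficial: Callable[[Operator, Operator], bool],
) -> list[tuple[str, str]]:
    """Return (groupby_id, join_id) pairs where a Groupby consumes the output
    of a preceding Join and the supplied cost check favours reordering."""
    last_writer: dict[str, Operator] = {}  # message name -> latest producer
    candidates = []
    # One pass over the m operators; linear if each operator touches a
    # bounded number of messages.
    for op in plan:
        if op.kind == "Groupby":
            for msg in op.inputs:
                pred = last_writer.get(msg)
                if pred is not None and pred.kind == "Join" and is_beneficial(pred, op):
                    candidates.append((op.op_id, pred.op_id))
        for msg in op.outputs:
            last_writer[msg] = op
    return candidates


# Plan P2 in execution order (Fork o-1 omitted for brevity).
p2 = [
    Operator("o1", "Assign", [], ["msg1"]),
    Operator("o2", "Invoke", ["msg1"], ["msg2"]),
    Operator("o3", "Invoke", ["msg1"], ["msg3"]),
    Operator("o4", "Join", ["msg2", "msg3"], ["msg4"]),
    Operator("o5", "Groupby", ["msg4"], ["msg5"]),
    Operator("o6", "Assign", ["msg5"], ["msg1"]),
    Operator("o7", "Invoke", ["msg1"], []),
]

# Stand-in for the four optimality conditions: always accept.
print(eager_groupby_candidates(p2, lambda join, groupby: True))  # [('o5', 'o4')]
```

Replacing the always-true lambda with an evaluation of conditions (3.24) to (3.26) on monitored statistics turns the pass into a decision procedure; the actual rewrite of Groupby o5 below Join o4 is a separate step.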
Example 3.14 (Eager Group-By). Recall our running example plan $P_2$ as shown in Figure
3.17(a). Assume arbitrary join multiplicities and monitored statistics. Based on the
given optimality conditions, the plan has been rewritten to $P_2'$ as shown in Figure 3.17(b).
Essentially, we observed that the full eager group-by before the join causes lower costs
than the join-group-by combination. Note that the Fork operator is taken into account by
(a) Plan $P_2$: Assign (o1) [out: msg1] → Fork (o-1) → Invoke (o2) [service s4, in: msg1, out: msg2] and Invoke (o3) [service s5, in: msg1, out: msg3] → Join (o4) [in: msg2, msg3, out: msg4] → Groupby (o5) [in: msg4, out: msg5] → Assign (o6) [in: msg5, out: msg1] → Invoke (o7) [service s5, in: msg1]
(b) Plan $P_2'$: Assign (o1) [out: msg1] → Fork (o-1) → Invoke (o2) [service s4, in: msg1, out: msg2] and Invoke (o3) [service s5, in: msg1, out: msg3] → Groupby (o5) [in: msg2, out: msg2] and Groupby (o8) [in: msg3, out: msg3] → Join (o4) [in: msg2, msg3, out: msg4] → Assign (o6) [in: msg4, out: msg1] → Invoke (o7) [service s5, in: msg1]
Figure 3.17: Example Eager Group-By Application