Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
[Figure: two plan diagrams, shown side by side]
(a) Plan P6: Fork (o1) with three branches, each an Assign followed by an Invoke: o2 [out: msg1] and o3 [service s3, in: msg1, out: msg2]; o4 [out: msg3] and o5 [service s4, in: msg3, out: msg4]; o6 [out: msg5] and o7 [service s5, in: msg5, out: msg6]. Then Setoperation (o8) [in: msg2,msg4, out: msg7] UNION DISTINCT, Setoperation (o9) [in: msg7,msg6, out: msg8] UNION DISTINCT, Assign (o10) [in: msg8, out: msg9], and Invoke (o11) [service s6, in: msg9].
(b) Plan P′6: the same operators as in (a), with Orderby (o12) [in: msg2, out: msg2], Orderby (o13) [in: msg4, out: msg4], and Orderby (o16) [in: msg6, out: msg6] inserted after the Invoke operators, and with both Setoperation operators (o8, o9) marked as UNION DISTINCT (Merge).
Figure 3.18: Example Setoperation Type Selection
the techniques orderby insertion and setoperation type selection, we created the rewritten
plan P′6 shown in Figure 3.18(b). Here, we use the efficient merge algorithm for both
Setoperation operators and hence need to sort all three input data sets. Sorting the
result of the first Setoperation operator is not required because the output of the merge
algorithm is already ordered. Consider two cases with different statistics for the input and
output cardinalities of the Setoperation o8. Figure 3.19 (left) shows the abstract costs of
the two possible subplans, P6: (o8) versus P′6: (o′12, o′13, o′8), in both cases.
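The claim that the merge output is already ordered can be illustrated with a minimal sketch of a merge-based UNION DISTINCT. This is not the thesis implementation: the function name `merge_union_distinct` and the integer sample data are hypothetical, and only the message names (msg2, msg4, msg6, msg7, msg8) follow Figure 3.18.

```python
from typing import List


def merge_union_distinct(left: List[int], right: List[int]) -> List[int]:
    """Union two ascending-sorted lists without duplicates.

    Both inputs are consumed in a single pass, and the output is emitted
    in ascending order, so it can feed another merge without re-sorting.
    """
    out: List[int] = []
    i = j = 0
    while i < len(left) or j < len(right):
        # Take the smaller head element (or whatever input is left over).
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            value = left[i]
            i += 1
        else:
            value = right[j]
            j += 1
        # Equal values arrive adjacently, so one comparison removes duplicates.
        if not out or out[-1] != value:
            out.append(value)
    return out


# Hypothetical sample data; the three Orderby operators correspond to sorted().
msg2 = sorted([5, 1, 3, 3])
msg4 = sorted([4, 3, 1])
msg6 = sorted([6, 2, 5])

msg7 = merge_union_distinct(msg2, msg4)  # first Setoperation (o8)
msg8 = merge_union_distinct(msg7, msg6)  # second Setoperation (o9), no sort of msg7
print(msg7, msg8)
```

Because msg7 comes out of the first merge in ascending order, the second merge can consume it directly, which is why only the three Invoke outputs need an Orderby.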
Statistics                                             Abstract costs
         |ds_in1(o8)|   |ds_in2(o8)|   |ds_out(o8)|    C(o8)      C(o′12, o′13, o′8)
case 1   1,000          1,000          1,000           501,000    21,932
case 2   1,000          10             1,000           6,000      11,009
Figure 3.19: Example Setoperation Cost Comparison
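The numbers in Figure 3.19 (left) can be reproduced with a short calculation. The cost formulas in this sketch are assumptions: they are not stated in this section and were chosen because they match the four reported values after rounding. The assumed formulas are |ds_in1| + |ds_in2| · |ds_out| / 2 for the default Setoperation, n · log2(n) for an Orderby, and |ds_in1| + |ds_in2| for the merge-based Setoperation. All function names are hypothetical.

```python
import math


def cost_setoperation_default(in1: int, in2: int, out: int) -> float:
    # Assumed abstract cost of the non-merge UNION DISTINCT (not given in this section).
    return in1 + in2 * out / 2


def cost_orderby(n: int) -> float:
    # Assumed abstract cost of sorting n tuples.
    return n * math.log2(n)


def cost_setoperation_merge(in1: int, in2: int) -> float:
    # Assumed abstract cost of the merge-based UNION DISTINCT: one pass over both inputs.
    return in1 + in2


def cost_merge_subplan(in1: int, in2: int) -> float:
    # Subplan (o'12, o'13, o'8): sort both inputs, then merge them.
    return cost_orderby(in1) + cost_orderby(in2) + cost_setoperation_merge(in1, in2)


# (|ds_in1|, |ds_in2|, |ds_out|) as listed in Figure 3.19 (left).
cases = {"case 1": (1000, 1000, 1000), "case 2": (1000, 10, 1000)}

results = {}
for name, (in1, in2, out) in cases.items():
    results[name] = (
        round(cost_setoperation_default(in1, in2, out)),
        round(cost_merge_subplan(in1, in2)),
    )
    print(name, results[name])
# case 1 (501000, 21932)
# case 2 (6000, 11009)
```

Under these assumed formulas, the default subplan wins in case 2 only because the second input is very small, whereas the sort-and-merge subplan wins by a wide margin in case 1.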
We observe that the optimality of these subplans depends on the current workload characteristics,
while the subplan (o′12, o′13, o′8) is more robust⁶ over arbitrary statistic ranges than
the subplan (o8), as shown in Figure 3.19 (right).
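The robustness argument can be made concrete by sweeping one statistic and comparing how badly each subplan can lose. The sketch below is only an illustration and does not reconstruct the plot in Figure 3.19 (right). It assumes cost formulas that reproduce the values in Figure 3.19 (left) but are not stated in this section, and it fixes |ds_in1| = |ds_out| = 1,000 while varying |ds_in2| from 1 to 1,000. The sweep range and all names are hypothetical.

```python
import math


def cost_default(in1: int, in2: int, out: int) -> float:
    # Assumed abstract cost of subplan (o8), the non-merge UNION DISTINCT.
    return in1 + in2 * out / 2


def cost_merge_subplan(in1: int, in2: int) -> float:
    # Assumed abstract cost of subplan (o'12, o'13, o'8): two sorts plus one merge pass.
    return in1 * math.log2(in1) + in2 * math.log2(in2) + in1 + in2


IN1 = OUT = 1000          # fixed statistics, as in both cases of Figure 3.19
sizes = range(1, 1001)    # hypothetical sweep over |ds_in2|

# Smallest |ds_in2| at which the sort-and-merge subplan becomes cheaper.
crossover = next(
    n for n in sizes if cost_merge_subplan(IN1, n) < cost_default(IN1, n, OUT)
)

# Worst-case penalty factor of each subplan relative to the other over the sweep.
worst_default = max(cost_default(IN1, n, OUT) / cost_merge_subplan(IN1, n) for n in sizes)
worst_merge = max(cost_merge_subplan(IN1, n) / cost_default(IN1, n, OUT) for n in sizes)

print(crossover, round(worst_default, 1), round(worst_merge, 1))
```

Within this sweep, the sort-and-merge subplan is at worst a single-digit factor more expensive than the default subplan, while the default subplan can be more than twenty times more expensive. This matches the sense in which the text calls the sort-and-merge subplan less sensitive to input statistics.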
Finally, this optimization technique is also applicable to other operators, such as Projection
with duplicate elimination, or for forcing a merge-based join algorithm (WD9). Thus,
in general, the technique WD8 (orderby insertion) should be applied before selecting different
physical types of an operator.
To summarize, we presented selected control-flow- and data-flow-oriented optimization<br />
⁶ Robustness is an alternative optimization objective, which is beyond the scope of this thesis. However,
in contrast to existing work [ABD+10], we would identify these robust (insensitive to input statistics)
plans by simply choosing one of the plans with the lowest asymptotic time complexity with regard to the
overall abstract cost functions of all plan operators.