25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3 Fundamentals of Optimizing Integration Flows

plan with Orderby and Setoperation (UNION DISTINCT MERGE) operators. The sorted output order between the two Setoperation operators was exploited by the optimizer in order to reduce the number of required Orderby operators. Finally, the Setoperation and Orderby operators were parallelized by WC2.

• P7: Similar to P6, this plan was also affected by WC1, which rescheduled the subflows of the existing Fork operator. Furthermore, the techniques WD10 and WD8 changed the Join operators o14, o15 and o16 from nested loop joins to subplans of Orderby and merge join operators. Finally, note that join enumeration did not result in a new join order.

• P8: This plan was mainly affected by control-flow-oriented optimization techniques. In detail, the operator sequence (o3-o9) was rewritten into two parallel subflows of a Fork operator. The last operator o10 was not included because both o9 and o10 are writing interactions with the same external system. Finally, the technique WC1 was applied once again to reschedule the created subflows.
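
As a rough illustration of the P8 rewrite, the following Python sketch splits an operator sequence into independent subflows and keeps a trailing write to an already-written external system outside the fork. The dependency map, the `writes_to` map, the system name `S1`, and all function names are hypothetical and chosen for illustration; they are not taken from the evaluated plan or from the optimizer's actual implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def forkable_prefix(sequence, writes_to):
    """Keep the last operator out of the fork if it writes to the same
    external system as an earlier operator (the write order must be kept)."""
    if not sequence:
        return [], []
    *head, last = sequence
    target = writes_to.get(last)
    if target is not None and any(writes_to.get(op) == target for op in head):
        return head, [last]
    return list(sequence), []


def split_into_subflows(sequence, deps):
    """Group operators into independent subflows: connected components of
    the (assumed) data-dependency graph, each kept in sequence order."""
    parent = {op: op for op in sequence}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for op in sequence:
        for dep in deps.get(op, ()):
            if dep in parent:
                parent[find(op)] = find(dep)

    groups = {}
    for op in sequence:
        groups.setdefault(find(op), []).append(op)
    return list(groups.values())


def run_plan(sequence, deps, writes_to):
    """Run the subflows concurrently, then the excluded tail serially."""
    head, tail = forkable_prefix(sequence, writes_to)
    subflows = split_into_subflows(head, deps)
    log, lock = [], threading.Lock()

    def run_subflow(ops):
        for op in ops:  # operators inside one subflow stay sequential
            with lock:
                log.append(op)

    with ThreadPoolExecutor(max_workers=max(1, len(subflows))) as pool:
        list(pool.map(run_subflow, subflows))
    log.extend(tail)  # executed only after all subflows have finished
    return subflows, tail, log


# Hypothetical dependency structure for the sequence o3..o10.
SEQUENCE = ["o3", "o4", "o5", "o6", "o7", "o8", "o9", "o10"]
DEPS = {"o4": {"o3"}, "o5": {"o4"}, "o7": {"o6"}, "o8": {"o7"}, "o9": {"o8"}}
WRITES_TO = {"o9": "S1", "o10": "S1"}

subflows, tail, log = run_plan(SEQUENCE, DEPS, WRITES_TO)
print(subflows, tail)
```

In this sketch o10 always runs after o9 because it is executed only once the thread pool has joined, which mirrors the reason given for leaving o10 outside the Fork operator.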

In addition to these consistent optimization benefits, Figure 3.22(b) shows the required cumulative optimization time. The significant differences between the optimization times of different plans have two causes. First, the plans differ in total execution time, which influences the number of re-optimization steps required in this scenario because these steps are triggered periodically. Second, different techniques (with different time complexity) are applied according to the specific operator types used in the concrete plan. For example, plans P4 and P7 are dominated by the costs for join enumeration, where we did not find different join orders due to ensuring semantic correctness (P4) and the chain query type (P7).
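
The two effects can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a fixed re-optimization interval and a constant cost per re-optimization step; both are simplifying assumptions, and the plan names and numbers are illustrative, not measurements from Figure 3.22.

```python
def reoptimization_steps(total_execution_ms: int, interval_ms: int) -> int:
    """Number of periodic triggers that fire within the total execution time."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return total_execution_ms // interval_ms


def cumulative_optimization_ms(total_execution_ms: int, interval_ms: int,
                               cost_per_step_ms: int) -> int:
    """Cumulative optimization time under a constant per-step cost."""
    steps = reoptimization_steps(total_execution_ms, interval_ms)
    return steps * cost_per_step_ms


INTERVAL_MS = 30_000  # hypothetical re-optimization period

# Hypothetical plans: (total execution time, cost of one step), both in ms.
plans = {
    "short_cheap": (60_000, 2),
    "long_cheap": (600_000, 2),
    "long_join_heavy": (600_000, 15),  # e.g. a step dominated by join enumeration
}

for name, (exec_ms, step_ms) in plans.items():
    print(name,
          reoptimization_steps(exec_ms, INTERVAL_MS),
          cumulative_optimization_ms(exec_ms, INTERVAL_MS, step_ms))
```

Under these assumptions, a longer-running plan triggers more steps at the same interval, and a more expensive technique raises the cumulative time even when the number of steps is identical.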

Putting it all together, we can conclude that execution time reductions are possible, while only a fairly low overhead is required by periodical re-optimization.

Scalability<br />

In addition to the presented comparison of optimized and unoptimized execution, scalability is one of the most important aspects. Hence, we conducted a series of experiments that examine the scalability with regard to an increasing number of operators as well as an increasing input data size.

Figure 3.23: Speedup of Rewriting Sequences to Parallel Flows
