25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


of traditional data management systems and the included optimization techniques focus on execution time minimization only, we additionally introduce two novel cost-based optimization techniques that are tailor-made for integration flows and that both follow the optimization objective of throughput maximization. First, we explain the concept of cost-based vectorization of integration flows, which optimally leverages pipeline parallelism of plan operators and thus increases message throughput. Second, we discuss the concept of multi-flow optimization via horizontal message queue partitioning, which increases throughput by executing operations on message partitions instead of on individual messages and thus reduces the work of the integration platform, such as the costs for querying external systems. Finally, the major drawbacks of periodic re-optimization are (1) many unnecessary re-optimization steps, because a new plan is found only if workload characteristics have changed, and (2) adaptation delays after a workload change, during which a suboptimal plan is used until the next re-optimization and optimization opportunities are missed. Therefore, we refine the approach from periodic re-optimization to on-demand re-optimization, where only necessary statistics are maintained and re-optimization is triggered immediately, and only if a new plan is certain to be found.
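As a rough illustration of the horizontal message queue partitioning described above, the following minimal sketch groups queued messages by a partitioning attribute and issues one external lookup per partition instead of one per message. The message layout, the `customer` attribute, and the `lookup_external` function are hypothetical stand-ins and are not taken from the source.

```python
from collections import defaultdict


def lookup_external(key):
    """Hypothetical stand-in for one query against an external system."""
    lookup_external.calls += 1
    return {"key": key, "info": f"data-for-{key}"}


lookup_external.calls = 0  # counts how many external queries were issued


def process_individually(messages):
    # Baseline: one external query per message.
    return [(m, lookup_external(m["customer"])) for m in messages]


def process_partitioned(messages):
    # Group the queued messages by the partitioning attribute.
    partitions = defaultdict(list)
    for m in messages:
        partitions[m["customer"]].append(m)
    results = []
    for key, part in partitions.items():
        info = lookup_external(key)  # one external query per partition
        results.extend((m, info) for m in part)
    return results


# Six queued messages that fall into three partitions (A, B, C).
queue = [{"id": i, "customer": c} for i, c in enumerate("ABABCA")]

lookup_external.calls = 0
individual = process_individually(queue)
calls_individual = lookup_external.calls

lookup_external.calls = 0
partitioned = process_partitioned(queue)
calls_partitioned = lookup_external.calls

print(calls_individual, calls_partitioned)
```

Note that the partitioned variant emits results grouped by partition, so the original queue order is not preserved. This sketch ignores any ordering guarantees a real integration platform would have to maintain.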

The positive consequences of the cost-based optimization of integration flows are, in general, (1) the continuous adaptation to dynamically changing workload characteristics and (2) performance improvements in the sense of minimizing execution times and maximizing message throughput by exploiting the full optimization potential of rewriting decisions. In particular, the parameterless on-demand re-optimization achieves a fast but robust adaptation to changing workload characteristics with minimal overhead for incremental statistics maintenance and directed re-optimization. Finally, this cost-based optimization framework for integration flows can be used to investigate additional integration-flow-specific optimization techniques. Such optimizations are strongly needed in order to meet the continuously increasing performance requirements on integration platforms.
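The idea of on-demand re-optimization with incremental statistics maintenance can be sketched with a toy model. It assumes a plan that consists of just two filter operators and an optimality condition stating that the more selective filter should run first. The class name, the cumulative selectivity estimate, and the two-operator plan are illustrative assumptions, not the mechanism of the source.

```python
class OnDemandReoptimizer:
    """Toy model: a plan is an ordering of two filter operators."""

    def __init__(self, plan=("A", "B")):
        self.plan = list(plan)
        self.passed = {"A": 0, "B": 0}  # messages that passed each filter
        self.seen = {"A": 0, "B": 0}    # messages seen by each filter
        self.reoptimizations = 0

    def selectivity(self, op):
        # Fraction of messages that pass; 1.0 while nothing was observed.
        return self.passed[op] / self.seen[op] if self.seen[op] else 1.0

    def observe(self, op, passed):
        # Incremental statistics maintenance: update two counters only.
        self.seen[op] += 1
        self.passed[op] += int(passed)
        first, second = self.plan
        # Assumed optimality condition: the first filter must be at least
        # as selective as the second. Re-optimize only when it is violated.
        if self.selectivity(first) > self.selectivity(second):
            self.plan.reverse()
            self.reoptimizations += 1


reopt = OnDemandReoptimizer()

# Phase 1: filter A passes 1 of 10 messages, filter B passes 9 of 10.
for i in range(10):
    reopt.observe("A", i == 9)
    reopt.observe("B", i != 9)
triggers_phase1 = reopt.reoptimizations

# Phase 2: the workload changes; A now always passes, B never does.
for _ in range(20):
    reopt.observe("A", True)
    reopt.observe("B", False)

print(triggers_phase1, reopt.reoptimizations, reopt.plan)
```

In this sketch the plan is rewritten only at the moment the statistics violate the condition, rather than at fixed intervals. A cumulative estimate reacts slowly to workload changes, so a real system would likely use a different statistics model.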
