Cost-Based Optimization of Integration Flows - Datenbanken ...
of traditional data management systems and the included optimization techniques focus
on execution time minimization only, we additionally introduce two novel cost-based optimization
techniques that are tailor-made for integration flows and that both follow the
optimization objective of throughput maximization. First, we explain the concept of
cost-based vectorization of integration flows in order to optimally leverage pipeline parallelism
of plan operators and thus increase the message throughput. Second, we discuss
the concept of multi-flow optimization via horizontal message queue partitioning, which increases
throughput by executing operations on message partitions instead of on individual
messages and thus reduces the work of the integration platform, such as the costs for querying
external systems. Finally, the major drawbacks of periodical re-optimization are (1)
many unnecessary re-optimization steps, because a new plan is found only if workload characteristics
have changed, and (2) adaptation delays after a workload change, during which we
use a suboptimal plan until the next re-optimization and thus miss optimization opportunities. Therefore,
we refine the re-optimization approach from periodical re-optimization to on-demand
re-optimization, where only necessary statistics are maintained and re-optimization is
triggered immediately, and only if a new plan is certain to be found.
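To make the multi-flow optimization idea above more concrete, the following is a minimal sketch of horizontal message queue partitioning, not the implementation from this work. It assumes messages are plain dictionaries with a hypothetical `customer` attribute and that the external system is a lookup function; `partition_queue`, `enrich_per_message`, `enrich_per_partition`, and `CountingLookup` are illustrative names. The point is only that one external query per partition replaces one query per message.

```python
from collections import defaultdict


def partition_queue(messages, key):
    """Group queued messages by a partitioning attribute.

    Arrival order is preserved within each partition.
    """
    partitions = defaultdict(list)
    for msg in messages:
        partitions[key(msg)].append(msg)
    return dict(partitions)


def enrich_per_message(messages, lookup):
    """Baseline: one external query for every single message."""
    return [dict(m, name=lookup(m["customer"])) for m in messages]


def enrich_per_partition(messages, lookup):
    """Partitioned execution: one external query per message partition."""
    enriched = []
    by_customer = partition_queue(messages, lambda m: m["customer"])
    for customer, batch in by_customer.items():
        name = lookup(customer)  # queried once for the whole partition
        enriched.extend(dict(m, name=name) for m in batch)
    return enriched


class CountingLookup:
    """Hypothetical stand-in for an external system; counts its queries."""

    def __init__(self, table):
        self.table = table
        self.calls = 0

    def __call__(self, key):
        self.calls += 1
        return self.table[key]
```

For five queued messages that reference two distinct customers, the baseline issues five lookups and the partitioned variant issues two. Note that this sketch emits results grouped by partition, so the global message order changes; any ordering guarantees a real integration platform needs would have to be handled separately.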
The positive consequences of the cost-based optimization of integration flows are, in general,
(1) the continuous adaptation to dynamically changing workload characteristics and
(2) performance improvements in the sense of minimizing execution times and maximizing
message throughput by exploiting the full optimization potential of rewriting decisions.
In particular, the parameterless on-demand re-optimization achieves a fast but robust
adaptation to changing workload characteristics with minimal overhead for incremental
statistics maintenance and directed re-optimization. Finally, this cost-based optimization
framework of integration flows can be used for investigating additional integration-flow-specific
optimization techniques. Those optimizations are strongly needed in order to meet
the continuously increasing performance requirements on integration platforms.
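The on-demand re-optimization described above can be illustrated with a small sketch. It is an assumption-laden toy, not the mechanism of this work: the plan is reduced to an ordering of filter operators by selectivity, the optimality condition is "observed selectivities are in ascending order", and `RunningMean` and `OnDemandReorderer` are hypothetical names. Statistics are maintained incrementally, and re-optimization runs only when an optimality condition is violated, that is, when a better order is known to exist.

```python
class RunningMean:
    """Incrementally maintained mean, seeded with an initial estimate."""

    def __init__(self, initial):
        self.n = 1
        self.mean = float(initial)

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


class OnDemandReorderer:
    """Orders filters by ascending selectivity and re-plans only on demand."""

    def __init__(self, initial_selectivities):
        self.stats = {
            name: RunningMean(sel)
            for name, sel in initial_selectivities.items()
        }
        self.plan = sorted(self.stats, key=lambda n: self.stats[n].mean)
        self.reoptimizations = 0

    def _violated(self):
        # Optimality condition: each filter is at least as selective
        # (smaller mean) as the filter that follows it in the plan.
        means = [self.stats[n].mean for n in self.plan]
        return any(a > b for a, b in zip(means, means[1:]))

    def observe(self, name, selectivity):
        """Record one observed selectivity; re-optimize only if needed."""
        self.stats[name].update(selectivity)
        if self._violated():
            self.plan.sort(key=lambda n: self.stats[n].mean)
            self.reoptimizations += 1
```

In contrast to a periodic scheme, no re-optimization step runs while the observed statistics still confirm the current order, and the plan is changed as soon as an observation violates the condition, without waiting for the next period.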