Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows<br />
our transformation-based optimization algorithm, approaches for search space reduction<br />
and adjusting the sensitivity of workload adaptation as well as a lightweight concept for<br />
handling conditional probabilities and correlation. Further, we explained selected optimization<br />
techniques that are specific to integration flows because they exploit both the<br />
data flow and the control flow in a combined manner. Our evaluation shows significant<br />
performance improvements with moderate overhead for periodical re-optimization.<br />
In conclusion, our cost-based optimization approach can be integrated seamlessly into<br />
the major products in the area of integration platforms. Based on the observation of<br />
many independent instances of integration flows, this approach of periodical cost-based<br />
re-optimization is tailor-made for integration flows. In detail, the advantages of periodical<br />
re-optimization are (1) asynchronous optimization that is independent of the execution of individual<br />
instances, (2) the fact that all subsequent instances, rather than only the current query,<br />
benefit from re-optimization, and (3) the inter-instance plan change without the need<br />
for state migration. This general optimization framework can be used as a foundation for<br />
further rewriting techniques and optimization approaches.<br />
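The inter-instance plan change named in advantage (3) can be illustrated with a minimal sketch. It is not the implementation described in this thesis: the class `FlowRuntime`, its methods, and the two toy plans are hypothetical names chosen for illustration. The sketch assumes that each flow instance binds to whichever plan is current when it starts, so a newly deployed plan only affects later instances and no running state has to be migrated.

```python
def plan_v1(message):
    # Hypothetical original plan: returns a plan label and the result.
    return ("v1", message.upper())


def plan_v2(message):
    # Hypothetical re-optimized plan: same result, different plan.
    return ("v2", message.upper())


class FlowRuntime:
    """Toy runtime (assumed for illustration) that holds the current plan."""

    def __init__(self, plan):
        self._current_plan = plan

    def deploy(self, new_plan):
        # Called by the optimizer, independently of running instances.
        # Only instances started after this call see the new plan.
        self._current_plan = new_plan

    def start_instance(self, message):
        # Each instance captures the plan that is current at its start
        # and keeps it until the instance finishes.
        plan = self._current_plan
        return lambda: plan(message)


runtime = FlowRuntime(plan_v1)
running = runtime.start_instance("a")  # bound to plan_v1
runtime.deploy(plan_v2)                # plan change between instances
later = runtime.start_instance("b")    # bound to plan_v2
```

In this sketch, `running()` still finishes with the old plan, while `later()` and every subsequent instance use the new one.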
Apart from these re-optimization advantages, the optimization framework presented so<br />
far still has several shortcomings. First, only the optimization objective of minimizing<br />
the average plan execution time was considered. This is not always a suitable optimization<br />
objective because in high-load scenarios, the major optimization objective is often<br />
throughput maximization, while moderate latency times are acceptable. Therefore, in the<br />
following, we will present two integration-flow-specific optimization techniques that have<br />
the potential to significantly increase the message throughput. In detail, we present the<br />
cost-based vectorization (a control-flow-oriented optimization technique) in Chapter 4 and<br />
the multi-flow optimization (a data-flow-oriented optimization technique) in Chapter 5.<br />
Second, the periodical re-optimization algorithm itself also has several drawbacks. This<br />
includes the generic gathering of statistics for all operators, which causes the maintenance of<br />
statistics that might not be used by the optimizer. While this overhead was negligible for the<br />
evaluated workload aggregation methods, there might be performance issues when<br />
using more complex forecast models. In addition, there is the problem of periodically<br />
triggered re-optimization, where a new plan is only found if workload characteristics have<br />
changed. Otherwise, we trigger many unnecessary invocations of the optimizer that evaluates<br />
the complete search space. Depending on the optimization techniques used, this can<br />
have notable performance implications. Conversely, if a workload change does occur, it takes a<br />
while until re-optimization is triggered. During this adaptation delay, we thus use a suboptimal<br />
plan and miss optimization opportunities. Finally, the parameter ∆t (optimization<br />
period) has a strong influence on optimization and execution times, and hence parameterization<br />
requires awareness of changing workloads. These four drawbacks are addressed with<br />
the concept of on-demand re-optimization that we will present in Chapter 6. However, the<br />
periodical re-optimization already provides a reasonable optimization framework including<br />
many fundamental concepts and is thus used as the conceptual basis of this thesis.<br />
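The trade-off behind the parameter ∆t can be made concrete with a small simulation. This is a simplified sketch under stated assumptions, not the thesis's algorithm: the function `simulate_periodic` and its parameters are hypothetical, time is discrete, the workload changes exactly once, and an optimizer invocation counts as useful only if the workload differs from the one seen at the previous invocation.

```python
def simulate_periodic(delta_t, change_at, horizon):
    """Count optimizer invocations and the adaptation delay.

    delta_t:   optimization period (the parameter ∆t), in time units
    change_at: time at which the workload characteristics change
    horizon:   total simulated time
    """
    invocations = 0
    useful = 0
    adaptation_delay = None
    last_seen_workload = "A"

    # The optimizer is triggered every delta_t time units,
    # whether or not the workload has changed.
    for t in range(delta_t, horizon + 1, delta_t):
        invocations += 1
        workload = "B" if t >= change_at else "A"
        if workload != last_seen_workload:
            # Only now can a new plan be found.
            useful += 1
            adaptation_delay = t - change_at
            last_seen_workload = workload

    return {
        "invocations": invocations,
        "unnecessary": invocations - useful,
        "adaptation_delay": adaptation_delay,
    }


short_period = simulate_periodic(delta_t=10, change_at=25, horizon=100)
long_period = simulate_periodic(delta_t=50, change_at=25, horizon=100)
```

With these illustrative numbers, the short period yields 10 invocations, 9 of them unnecessary, and an adaptation delay of 5 time units. The long period yields only 2 invocations, 1 of them unnecessary, but the suboptimal plan is used for 25 time units.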