25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

our transformation-based optimization algorithm, approaches for search space reduction and for adjusting the sensitivity of workload adaptation, as well as a lightweight concept for handling conditional probabilities and correlation. Further, we explained selected optimization techniques that are specific to integration flows because they exploit both the data flow and the control flow in a combined manner. Our evaluation shows significant performance improvements with moderate overhead for periodical re-optimization.

In conclusion, our cost-based optimization approach can be integrated seamlessly into the major products in the area of integration platforms. Based on the observation of many independent instances of integration flows, this approach of periodical cost-based re-optimization is tailor-made for integration flows. In detail, the advantages of periodical re-optimization are (1) the asynchronous optimization, independent of the execution of individual instances, (2) the fact that all subsequent instances, rather than only the current query, benefit from re-optimization, and (3) the inter-instance plan change without the need for state migration. This general optimization framework can be used as a foundation for further rewriting techniques and optimization approaches.
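The three advantages can be illustrated with a minimal sketch. It assumes that each instance binds to the plan that is current when it starts and keeps that plan until it finishes. All names (`Plan`, `FlowRuntime`, `toy_optimizer`) and the rewrite rule are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Plan:
    """Immutable execution plan: a version number and an operator order."""
    version: int
    operator_order: tuple


@dataclass
class FlowRuntime:
    """Holds the plan handed to newly started instances (hypothetical structure)."""
    current_plan: Plan
    stats: List[float] = field(default_factory=list)

    def start_instance(self) -> Plan:
        # An instance keeps the plan it started with, so a later plan swap
        # requires no migration of intermediate state.
        return self.current_plan

    def finish_instance(self, exec_time: float) -> None:
        # Execution statistics are collected for the next optimizer run.
        self.stats.append(exec_time)

    def reoptimize(self, optimizer: Callable[[Plan, List[float]], Plan]) -> None:
        # Runs independently of any instance; only the plan reference changes.
        self.current_plan = optimizer(self.current_plan, self.stats)
        self.stats.clear()


def toy_optimizer(plan: Plan, stats: List[float]) -> Plan:
    # Placeholder rewrite rule (an assumption): reverse the operator order
    # when the average execution time exceeds 1.0.
    if stats and sum(stats) / len(stats) > 1.0:
        return Plan(plan.version + 1, tuple(reversed(plan.operator_order)))
    return plan


runtime = FlowRuntime(Plan(1, ("selection", "join")))
running = runtime.start_instance()   # instance A starts on plan version 1
runtime.start_instance()             # instance B starts on plan version 1
runtime.finish_instance(2.0)         # instance B finishes and reports a slow run
runtime.reoptimize(toy_optimizer)    # periodic optimizer step between instances
later = runtime.start_instance()     # instance C starts on plan version 2
```

Instance A still holds plan version 1 after the swap, while instance C and all later instances use version 2. This is the inter-instance plan change the text describes.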

Apart from these re-optimization advantages, the optimization framework presented so far still has several shortcomings. First, only the optimization objective of minimizing the average plan execution time was considered. This is not always a suitable objective because in high-load scenarios the major optimization objective is often throughput maximization, while moderate latency times are acceptable. Therefore, in the following, we will present two integration-flow-specific optimization techniques that have the potential to significantly increase the message throughput. In detail, we present the cost-based vectorization (a control-flow-oriented optimization technique) in Chapter 4 and the multi-flow optimization (a data-flow-oriented optimization technique) in Chapter 5.

Second, the periodical re-optimization algorithm itself also has several drawbacks. These include the generic gathering of statistics for all operators, which causes the maintenance of statistics that might not be used by the optimizer. While this overhead was negligible for the evaluated workload aggregation methods, there might be performance issues when using more complex forecast models. In addition, there is the problem of periodically triggered re-optimization, where a new plan is only found if workload characteristics have changed. Otherwise, we trigger many unnecessary invocations of the optimizer, which evaluates the complete search space. Depending on the optimization techniques used, this can have notable performance implications. However, if a workload change occurs, it takes a while until re-optimization is triggered. During this adaptation delay, we thus use a suboptimal plan and miss optimization opportunities. Finally, the parameter ∆t (optimization period) has a high influence on optimization and execution times and hence, parameterization requires awareness of changing workloads. These four drawbacks are addressed with the concept of on-demand re-optimization that we will present in Chapter 6. However, the periodical re-optimization already provides a reasonable optimization framework including many fundamental concepts and thus is used as the conceptual basis of this thesis.
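The trade-off behind the choice of ∆t can be made concrete with a small calculation. The sketch assumes that the optimizer runs at fixed multiples of ∆t, starting at time 0, and that a single workload change occurs at a known time. The function names and the numbers are illustrative only and are not taken from the thesis.

```python
import math


def adaptation_delay(change_time: float, delta_t: float) -> float:
    """Time between a workload change and the next periodic optimizer run."""
    next_run = math.ceil(change_time / delta_t) * delta_t
    return next_run - change_time


def optimizer_invocations(horizon: float, delta_t: float) -> int:
    """Number of periodic optimizer runs in the interval (0, horizon]."""
    return math.floor(horizon / delta_t)


# One workload change at t = 125 within a horizon of 600 time units.
for delta_t in (10.0, 60.0, 300.0):
    delay = adaptation_delay(change_time=125.0, delta_t=delta_t)
    runs = optimizer_invocations(horizon=600.0, delta_t=delta_t)
    print(delta_t, delay, runs)
```

Under these assumptions, a short period keeps the adaptation delay small but triggers many optimizer runs that find no new plan. A long period saves optimizer runs but leaves the suboptimal plan in use for longer.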
