25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3.3 Periodical Re-Optimization

Based on the presented prerequisites, we now explain the core algorithm for the cost-based optimization of imperative integration flows. First, we formally define the periodic plan optimization problem, including its parameters, and show that this problem is NP-hard. Due to the complexity of this optimization problem, we additionally introduce two search space reduction approaches. Then, we describe how the parameters can be leveraged to influence the sensitivity of workload adaptation. Finally, we sketch how to handle conditional probabilities and correlation without knowledge of the data characteristics (e.g., value distributions) of external systems.

The core optimization algorithm is independent of any concrete optimization technique. For that reason, it can be extended with arbitrary new techniques. However, we use selected optimization techniques to illustrate the properties and the behavior of our optimization algorithm. In Section 3.4, we explain various concrete optimization techniques in more detail.

3.3.1 Overall Optimization Algorithm

Existing approaches to integration flow optimization that also take execution statistics into account [SMWM06, SVS05] use the optimize-always model, where the given plan is optimized for each initiated plan instance. While this is advantageous for changing workload characteristics in combination with long-running plan instances, it fails under the assumption of many plan instances with rather small amounts of data, because in this case the optimization time might be even higher than the execution time of the plan. In consequence, we introduce an optimization algorithm that exploits the integration-flow-specific characteristics of (1) being deployed once and executed many times as well as (2) the presence of an initially given imperative integration flow.
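The overhead argument above can be made concrete with a small back-of-the-envelope sketch. It is only an illustration under simplifying assumptions that are not part of the original text: the function names, the numbers, and the parameter `instances_per_period` are hypothetical, the execution time per instance is taken to be the same in both models (the benefit of a better plan is ignored), and the optimization period is expressed as a number of instances rather than as a time interval.

```python
def total_time_optimize_always(n_instances, exec_time, opt_time):
    """Optimize-always: every plan instance pays the optimization time."""
    return n_instances * (opt_time + exec_time)


def total_time_periodic(n_instances, exec_time, opt_time, instances_per_period):
    """Periodical re-optimization: one optimization per period.

    Assumption: a period is approximated by a fixed number of plan
    instances (hypothetical parameter `instances_per_period`).
    """
    n_optimizations = -(-n_instances // instances_per_period)  # ceiling division
    return n_instances * exec_time + n_optimizations * opt_time


# Hypothetical workload: many short-running instances whose optimization
# time (20 units) exceeds their execution time (5 units).
always = total_time_optimize_always(10_000, exec_time=5.0, opt_time=20.0)
periodic = total_time_periodic(
    10_000, exec_time=5.0, opt_time=20.0, instances_per_period=1_000
)
print(always, periodic)
```

With these made-up numbers, optimize-always spends 250,000 time units in total, while periodical re-optimization spends 50,200, because the optimization cost is amortized over all instances of a period.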

Optimization Problem

The goal of plan optimization is to rewrite (transform) a given plan into a semantically equivalent plan that is optimal in the average case with regard to the estimated costs. The major differences to DBMS are (1) the average-case optimization of a deployed plan and (2) the transformation-based plan rewriting that takes into account the specified control-flow semantics of imperative integration flows. As a first step, we define the optimal plan as follows:

Definition 3.3 (Optimal Plan). A plan $P = (o, c, s)$ is optimal at timestamp $T_k$ with respect to a given workload $W(P, T_k)$ if no plan $P' = (o', c', s)$ with lower estimated execution time $\hat{W}(P') < \hat{W}(P)$ exists. Thus, the optimization objective $\phi$ of any optimization algorithm is to minimize the estimated average execution time of the plan with:

$$\phi = \min \hat{W}(P). \tag{3.5}$$
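As a minimal sketch of this definition, the optimality check and the objective can be written over an explicitly given set of semantically equivalent candidate plans. This is an assumption made purely for illustration: the `Plan` class, its `estimated_time` field (standing in for the estimated execution time of a plan), and the candidate names are hypothetical, and a real optimizer would not enumerate all equivalent plans but derive them by transformation-based rewriting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for a plan with its estimated execution time."""
    name: str
    estimated_time: float


def is_optimal(plan, equivalent_plans):
    """True if no equivalent plan has a strictly lower estimated time."""
    return not any(
        other.estimated_time < plan.estimated_time for other in equivalent_plans
    )


def optimize(equivalent_plans):
    """Objective of Equation 3.5: pick the plan with minimal estimated time."""
    return min(equivalent_plans, key=lambda p: p.estimated_time)


# Hypothetical candidates, assumed to be semantically equivalent.
candidates = [
    Plan("initial", 12.0),
    Plan("reordered_selections", 9.5),
    Plan("merged_operators", 10.1),
]
best = optimize(candidates)
print(best.name, is_optimal(best, candidates))
```

The estimates reflect the statistics at one optimization timestamp only; once the statistics change, a plan that passed `is_optimal` may no longer do so.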

The plan $P$ is optimal according to the monitored statistics at the timestamp of optimization $T_k$. If the workload characteristics change over time, the plan $P$ might lose this property of optimality. For this reason, we require periodical re-optimization if we do not want to employ an optimize-always model. Hence, as a second step, we formally define this time-based optimization problem as follows:
