Cost-Based Optimization of Integration Flows - Datenbanken ...
3.3 Periodical Re-Optimization
Based on the presented prerequisites, we now explain the core algorithm for the cost-based
optimization of imperative integration flows. First, we formally define the periodic plan
optimization problem, including its parameters, and show that this problem is
NP-hard. Due to the complexity of this optimization problem, we additionally introduce
two search space reduction approaches. Then, we describe how the parameters can be
leveraged to influence the sensitivity of workload adaptation. Finally, we sketch
how to handle conditional probabilities and correlation without knowledge of the data
characteristics (e.g., value distributions) of external systems.
The core optimization algorithm is independent of any concrete optimization technique.
For that reason, it can be extended with arbitrary new techniques. However, we use
selected optimization techniques in order to illustrate the properties and the behavior of
our optimization algorithm. In Section 3.4, we will explain various concrete optimization
techniques in more detail.
3.3.1 Overall Optimization Algorithm
Existing approaches to integration flow optimization that also take execution statistics
into account [SMWM06, SVS05] use the optimize-always model, where the given plan
is optimized for each initiated plan instance. While this is advantageous for changing
workload characteristics in combination with long-running plan instances, it fails under
the assumption of many plan instances with rather small amounts of data because, in this
case, the optimization time might be even higher than the execution time of the plan.
Consequently, we introduce an optimization algorithm that exploits the integration-flow-specific
characteristics of (1) being deployed once and executed many times and (2)
the presence of an initially given imperative integration flow.
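The overhead argument above can be made concrete with a small back-of-the-envelope sketch. It is not the algorithm of this section: the function names, the millisecond figures, and the simplification that re-optimization is triggered once per fixed number of plan instances (as a stand-in for a time period) are all assumptions made for illustration. It also assumes that both models reach the same per-instance execution time.

```python
import math

def total_time_optimize_always(n_instances, exec_ms, opt_ms):
    """Total time if the plan is optimized before every plan instance."""
    return n_instances * (opt_ms + exec_ms)

def total_time_periodic(n_instances, exec_ms, opt_ms, instances_per_period):
    """Total time if the plan is optimized once per period.

    Assumption: a period is approximated by a fixed number of plan
    instances; a time-based period would behave analogously.
    """
    n_optimizations = math.ceil(n_instances / instances_per_period)
    return n_optimizations * opt_ms + n_instances * exec_ms

# Hypothetical workload: many short instances (10 ms each) and an
# optimization step (50 ms) that is more expensive than one execution.
always = total_time_optimize_always(1000, exec_ms=10, opt_ms=50)
periodic = total_time_periodic(1000, exec_ms=10, opt_ms=50,
                               instances_per_period=100)
print(always, periodic)
```

Under these assumed numbers, the optimize-always model spends most of its total time on optimization, whereas the periodic variant amortizes each optimization over all instances of a period.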
Optimization Problem
The goal of plan optimization is to rewrite (transform) a given plan into a semantically
equivalent plan that is optimal in the average case with regard to the estimated costs. The
major differences from DBMS are (1) the average-case optimization of a deployed plan and
(2) the transformation-based plan rewriting that takes into account the specified control-flow
semantics of imperative integration flows. As a first step, we define the optimal plan
as follows:
Definition 3.3 (Optimal Plan). A plan $P = (o, c, s)$ is optimal at timestamp $T_k$ with
respect to a given workload $W(P, T_k)$ if no plan $P' = (o', c', s)$ with lower estimated execution
time $\hat{W}(P') < \hat{W}(P)$ exists. Thus, the optimization objective $\phi$ of any optimization
algorithm is to minimize the estimated average execution time of the plan with:
$$\phi = \min \hat{W}(P). \tag{3.5}$$
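As a minimal sketch of Definition 3.3, the following example enumerates a small set of candidate plans and selects the one with the lowest estimated cost. Everything in it is an assumption made for illustration: the three selection operators, their per-tuple costs and selectivities, the simple cost model, and the premise that the operators commute, so that every ordering is semantically equivalent. It ignores control-flow dependencies and is not the cost model or search strategy of this work.

```python
from itertools import permutations

# Hypothetical operators: (name, cost per input tuple, selectivity).
OPERATORS = [
    ("sel_a", 2, 0.5),
    ("sel_b", 1, 0.1),
    ("sel_c", 4, 0.9),
]

def estimated_cost(plan, input_cardinality=1000):
    """Assumed cost model: each operator pays cost-per-tuple on its input
    cardinality and passes on only the fraction given by its selectivity."""
    total = 0.0
    cardinality = float(input_cardinality)
    for _name, cost_per_tuple, selectivity in plan:
        total += cardinality * cost_per_tuple
        cardinality *= selectivity
    return total

def is_optimal(plan, candidates):
    """Definition 3.3: no candidate has a strictly lower estimated cost."""
    cost = estimated_cost(plan)
    return not any(estimated_cost(other) < cost for other in candidates)

def optimize(candidates):
    """Objective (3.5): return a candidate with minimal estimated cost."""
    return min(candidates, key=estimated_cost)

# Exhaustive enumeration is only feasible for tiny plans.
candidates = list(permutations(OPERATORS))
best = optimize(candidates)
print([name for name, _, _ in best], estimated_cost(best))
```

Exhaustive enumeration is used here only because the example has six candidate orderings; the number of orderings grows factorially with the number of operators.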
The plan $P$ is optimal according to the monitored statistics at the timestamp of optimization
$T_k$. If the workload characteristics change over time, the plan $P$ might
lose this property of optimality. For this reason, we require periodical re-optimization if
we do not want to employ an optimize-always model. Hence, as a second step, we formally
define this time-based optimization problem as follows: