25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6 On-Demand Re-Optimization

categories that motivated the use of periodical re-optimization for integration flows. First, integration flows are deployed once and executed many times, with rather small amounts of data per instance. Hence, there is no need for mid-instance (inter- or intra-operator) re-optimization. Second, in contrast to continuous-query-based systems, many independent instances of an integration flow are executed over time. Thus, there is no need for state migration during plan rewriting. Further advantages are (1) the asynchronous optimization, which is independent of any instance execution, (2) the fact that all subsequent instances (until the next plan change), rather than only the current query, benefit from re-optimization, and (3) the efficient inter-instance plan change without state migration. However, this optimization model also exhibits major drawbacks, which we reveal in the following.

Figure 6.1: Drawbacks of Periodical Re-Optimization. [Plot: execution time per instance of plans P, P’, P’’ over time, for periodical and on-demand re-optimization. Labels: initial plan P; two workload changes; modified plans P’ and P’’; optimization interval ∆t; re-optimization steps. Annotations: (1) many unnecessary re-optimization steps, (2) missed optimization opportunity, (3) overhead of maintaining unnecessary statistics, (4) high-influence parameter optimization interval.]

Figure 6.1 shows the execution time of plan instances executed over time in a scenario with two workload shifts. Re-optimization is triggered periodically with period ∆t, where we only find a new plan if a workload shift has occurred in the meantime. We observe the potential problems of (1) many unnecessary re-optimization steps, where each step is a full re-optimization, and (2) adaptation delays, where we miss optimization opportunities. Furthermore, (3) we might maintain statistics that are not used by the optimizer, and (4) the chosen optimization interval has a high influence on the execution time. Depending on the optimization interval, periodical re-optimization can even degrade to the unoptimized execution. To tackle these problems, we propose on-demand re-optimization, which directly reacts to workload shifts if a new plan is certain to be found. This implies only necessary re-optimization steps and no missed optimization opportunities.
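The contrast between the two trigger models can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not the optimizer described in this chapter: the plan is reduced to an ordering of three selective operators, a plan counts as optimal when selectivities are non-decreasing along that ordering, and the names `simulate`, `best_order`, `is_optimal`, and `Counters` are hypothetical. The on-demand variant is idealized in that it detects a violated condition before the affected instance runs.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    reoptimizations: int = 0       # full re-optimization steps executed
    plan_changes: int = 0          # steps that actually produced a new plan
    suboptimal_instances: int = 0  # instances executed with a non-optimal plan


def best_order(selectivities):
    """Return operator indices sorted by ascending selectivity."""
    return sorted(range(len(selectivities)), key=lambda i: selectivities[i])


def is_optimal(order, selectivities):
    """Assumed optimality condition: selectivity is non-decreasing along the plan."""
    return all(selectivities[a] <= selectivities[b]
               for a, b in zip(order, order[1:]))


def simulate(workload, period=None):
    """Execute one plan instance per workload entry.

    period=k    -> periodical model: re-optimize before every k-th instance.
    period=None -> on-demand model: re-optimize only if the condition is violated.
    """
    order = best_order(workload[0])  # initial plan P
    counters = Counters()
    for t, sels in enumerate(workload[1:], start=1):
        if period is not None:
            trigger = (t % period == 0)
        else:
            trigger = not is_optimal(order, sels)
        if trigger:
            counters.reoptimizations += 1
            new_order = best_order(sels)
            if new_order != order:
                counters.plan_changes += 1
                order = new_order
        # The instance runs with whatever plan is current at this point.
        if not is_optimal(order, sels):
            counters.suboptimal_instances += 1
    return counters


# Hypothetical workload with two shifts (before instances 10 and 20).
workload = ([(0.1, 0.5, 0.9)] * 10
            + [(0.9, 0.5, 0.1)] * 10
            + [(0.5, 0.1, 0.9)] * 10)

periodical = simulate(workload, period=4)
on_demand = simulate(workload)
print("periodical:", periodical)
print("on-demand: ", on_demand)
```

With this made-up workload, the periodical model runs seven re-optimization steps, of which only two change the plan, and it executes two instances with an outdated plan. The on-demand model runs exactly the two steps that change the plan. Varying `period` shows how strongly the periodical result depends on the chosen interval.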

Example 6.1 (Periodical Plan Optimization). Recall our example plan P_5, which consists of m = 9 operators and is illustrated in Figure 6.2. It receives messages from the system s_3 and executes three Selection operators (on different attributes). Subsequently, a Switch operator routes the incoming messages with content-based predicates to schema mapping Translation operators. Finally, the result is loaded into the system s_6. For each received message, conceptually, an independent instance of this plan is initiated. In order to enable cost-based optimization, statistics are monitored for each operator. We assume that re-optimization is periodically triggered with period ∆t, as shown in Figure 6.1. During this re-optimization, all gathered statistics are aggregated and used as cost estimates. However, in this particular example, there are only a few rewriting possibilities: the sequence of Selection operators can be reordered according to their selectivities (optimality conditions oc_1-oc_3; e.g., oc_1: sel(o_2) ≤ sel(o_3) with sel = |ds_out1|/|ds_in1|), and the paths of the Switch operator can be reordered according to their cost-weighted path probabilities (oc_4). Each single re-optimization is a full optimization, where our transformation-based
