Cost-Based Optimization of Integration Flows - Datenbanken ...
6 On-Demand Re-Optimization
categories that motivated the use of periodical re-optimization for integration flows. First,
integration flows are deployed once and executed many times, with rather small amounts
of data per instance. Hence, there is no need for mid-instance (inter- or intra-operator) re-optimization.
Second, in contrast to continuous-query-based systems, many independent
instances of an integration flow are executed over time. Thus, there is no need for state migration
during plan rewriting. Further advantages are (1) the asynchronous optimization
independent of any instance execution, (2) the fact that all subsequent instances (until
the next plan change), rather than only the current query, benefit from re-optimization,
and (3) the efficient inter-instance plan change without state migration. However, this
optimization model also exhibits major drawbacks, which we reveal in the following.
[Figure 6.1: Drawbacks of Periodical Re-Optimization. Execution time per instance of plans P, P', P'' over time, for periodical and on-demand re-optimization, with an initial plan P, a modified plan P' and two workload changes. Annotations: (1) many unnecessary re-optimization steps, (2) missed optimization opportunity, (3) overhead of maintaining unnecessary statistics, (4) high-influence parameter optimization interval ∆t.]
Figure 6.1 shows the execution time of plan instances executed over time
in a scenario with two workload shifts. Re-optimization is triggered periodically with period
∆t, where we only find a new plan if a workload shift has occurred in the meantime. We observe
the potential problems of (1) many unnecessary re-optimization steps, where each step is a
full re-optimization, and (2) adaptation delays, where we miss optimization opportunities.
Furthermore, (3) we might maintain statistics that are not used by the optimizer, and (4)
the chosen optimization interval has a high influence on the execution time. Depending on
the optimization interval, periodical re-optimization can even degrade to the unoptimized
execution. To tackle these problems, we propose on-demand re-optimization, which directly
reacts to workload shifts if a new plan is certain to be found. This implies only
necessary re-optimization steps and no missed optimization opportunities.
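To make drawbacks (1), (2), and (4) concrete, the following minimal Python sketch counts re-optimization steps and adaptation delays for both trigger models on an abstract timeline. It is an illustration under simplifying assumptions, not the optimizer described in this chapter: the names `Outcome`, `periodical`, and `on_demand` are hypothetical, every workload shift is assumed to yield a new plan, and on-demand re-optimization is idealized as reacting to a shift without any delay.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    steps: int          # re-optimization steps performed
    unnecessary: int    # steps that found no new plan
    total_delay: float  # summed time between each shift and the plan change


def periodical(shifts, horizon, dt):
    """Re-optimize at dt, 2*dt, ... up to horizon, regardless of the workload."""
    pending = sorted(shifts)
    steps = unnecessary = 0
    total_delay = 0.0
    k = 1
    while k * dt <= horizon:
        t = k * dt
        steps += 1
        seen = [s for s in pending if s <= t]
        if seen:
            # Assumption: every observed shift leads to a new plan at time t.
            total_delay += sum(t - s for s in seen)
            pending = [s for s in pending if s > t]
        else:
            # No shift since the last step: a full re-optimization was wasted.
            unnecessary += 1
        k += 1
    return Outcome(steps, unnecessary, total_delay)


def on_demand(shifts, horizon):
    """Idealized on-demand trigger: one step per shift, no adaptation delay."""
    relevant = [s for s in shifts if s <= horizon]
    return Outcome(len(relevant), 0, 0.0)
```

For two workload shifts at times 25 and 62 within a horizon of 100 time units, `periodical(..., dt=10.0)` performs 10 steps, 8 of them unnecessary, with a summed delay of 13 time units, whereas `dt=50.0` performs only 2 steps but accumulates a delay of 63 time units. The idealized `on_demand` performs exactly 2 steps with no delay, which mirrors the trade-off shown in Figure 6.1.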
Example 6.1 (Periodical Plan Optimization). Recall our example plan $P_5$ that consists of
$m = 9$ operators, which is illustrated in Figure 6.2. It receives messages from the system
$s_3$ and executes three Selection operators (according to different attributes). Subsequently,
a Switch operator routes the incoming messages with content-based predicates to schema
mapping Translation operators. Finally, the result is loaded into the system $s_6$. For each
received message, conceptually, an independent instance of this plan is initiated. In order
to enable cost-based optimization, statistics are monitored for each operator. We assume
that re-optimization is periodically triggered with period ∆t, as shown in Figure 6.1. During
this re-optimization, all gathered statistics are aggregated and used as cost estimates.
However, in this particular example, there are only a few rewriting possibilities. In detail, the
sequence of Selection operators can be reordered according to their selectivities (optimality
conditions $oc_1$ to $oc_3$; e.g., $oc_1$: $sel(o_2) \leq sel(o_3)$ with $sel = |ds_{out1}|/|ds_{in1}|$), and the paths
of the Switch operator can be reordered according to their cost-weighted path probabilities
($oc_4$). Each single re-optimization is a full optimization, where our transformation-based