Cost-Based Optimization of Integration Flows - Datenbanken ...
6 On-Demand Re-Optimization
The resulting PlanOptTree for a sequence of $m = 5$ operators that has been distributed
to $k = 3$ execution buckets is illustrated in Figure 6.12. With regard to cost-based vectorization,
we need to include operator nodes for all operators, but only the execution times as
atomic statistic nodes. Furthermore, we use the complex statistic node $W(o_{max}) + \lambda$ and
the aggregated bucket execution costs $W(b_i)$ for each bucket with more than two operators.
Then, there are three optimality conditions ($oc_1$-$oc_3$), which check that the bucket
execution costs (or operator execution costs) are below this maximum. In addition, two
optimality conditions ($oc_4$ and $oc_5$) are used in order to check whether all operators belong to
the right bucket, that is, whether the assignment matches the result the A-CPV would produce.
Whenever one of the optimality conditions is violated, we trigger directed re-optimization.
In this case, we know the operator or execution bucket, respectively, that caused
the violation. In contrast to the full A-CPV, we directly start the cost-based plan
vectorization at this bucket and hence might not need to evaluate all operators. However,
the worst-case time complexity of $O(m)$ is not changed by the directed cost-based plan
vectorization because we might start directed re-optimization at the first operator $o_1$.
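
The bucket-cost checks ($oc_1$-$oc_3$) can be illustrated with a minimal sketch. It assumes that the aggregated bucket cost $W(b_i)$ is the sum of the execution times of the operators in the bucket; the function name `first_violated_bucket` and the sample numbers are hypothetical. The bucket-membership conditions ($oc_4$, $oc_5$) are not modeled here.

```python
from typing import List, Optional


def first_violated_bucket(op_times: List[float],
                          buckets: List[List[int]],
                          lam: float) -> Optional[int]:
    """Return the index of the first bucket whose cost exceeds
    W(o_max) + lambda, or None if all bucket-cost conditions hold.

    Assumption (not from the source): W(b_i) is the sum of the
    monitored execution times of the operators in bucket b_i.
    """
    threshold = max(op_times) + lam  # complex statistic W(o_max) + lambda
    for i, bucket in enumerate(buckets):
        cost = sum(op_times[o] for o in bucket)  # aggregated W(b_i)
        if cost > threshold:
            return i  # directed re-optimization would start at this bucket
    return None


if __name__ == "__main__":
    # m = 5 operators distributed to k = 3 buckets (illustrative values)
    times = [2.0, 1.0, 4.0, 1.5, 1.0]
    plan = [[0, 1], [2], [3, 4]]
    print(first_violated_bucket(times, plan, lam=0.5))
    times[1] = 3.0  # monitored execution time of the second operator grows
    print(first_violated_bucket(times, plan, lam=0.5))
```

Returning the index of the violating bucket mirrors the idea that re-optimization starts at the bucket that caused the violation rather than at the first operator.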
6.4.3 Multi-Flow Optimization
Similarly to the cost-based vectorization, on-demand re-optimization can also be applied
to the data-flow-oriented optimization technique multi-flow optimization that has been
discussed in Chapter 5. It is based on the algorithms for deriving partitioning schemes
(A-DPA) and for computing the waiting time (A-WTC). Furthermore, we follow the
optimization objective of minimizing the total latency time with
$$\phi = \max \frac{|M'|}{\Delta t} = \min T_L(M'), \tag{6.11}$$
where $T_L(M')$ is computed by
$$\hat{T}_L(M', k') = \left\lceil \frac{|M'|}{k'} \right\rceil \cdot \Delta tw + \hat{W}(P', k'), \tag{6.12}$$
and $\hat{W}(P', k')$ is computed by $\hat{W}(P', k') = W^-(P') + W^+(P') \cdot k'$ for arbitrary $k' = R \cdot \Delta tw$.
Due to our specific cost model extension, this minimum is given at $\Delta tw = W(P', \Delta tw \cdot R)$
such that, after substituting the cost model and solving for $\Delta tw$, we can compute
$\Delta tw = W^-(P')/(1 - W^+(P') \cdot R)$. In order to ensure the
latency time constraint at the same time, we additionally evaluate the validity condition
$(0 \leq W(P', k') \leq \Delta tw) \wedge (0 \leq \hat{T}_L \leq lc)$. In general, there are different cases, which
lead to different optimality conditions. Here, we concentrate on the default case, where
the computed waiting time fulfills the validity condition.
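
The waiting time computation and the validity condition can be traced with a small numerical sketch. All function names and sample values below are hypothetical, and the tolerance `EPS` is an added assumption: at the computed waiting time, $W(P', k')$ equals $\Delta tw$ in theory, so a strict floating-point comparison could fail spuriously.

```python
import math

EPS = 1e-9  # assumed tolerance for float comparisons at the fixed point


def waiting_time(w_minus: float, w_plus: float, rate: float) -> float:
    """Solve dtw = W^-(P') + W^+(P') * R * dtw for dtw."""
    denom = 1.0 - w_plus * rate
    if denom <= 0.0:
        raise ValueError("no positive waiting time: W^+ * R must be below 1")
    return w_minus / denom


def batch_cost(w_minus: float, w_plus: float, k: float) -> float:
    """Cost model W(P', k') = W^-(P') + W^+(P') * k'."""
    return w_minus + w_plus * k


def estimated_latency(m_size: int, k: float, dtw: float,
                      w_minus: float, w_plus: float) -> float:
    """Estimated total latency following Equation 6.12."""
    return math.ceil(m_size / k) * dtw + batch_cost(w_minus, w_plus, k)


def is_valid(w_hat: float, dtw: float, t_l: float, lc: float) -> bool:
    """Validity condition (0 <= W <= dtw) and (0 <= T_L <= lc)."""
    return (0.0 <= w_hat <= dtw + EPS) and (0.0 <= t_l <= lc + EPS)


if __name__ == "__main__":
    w_minus, w_plus, rate = 0.5, 0.05, 10.0  # illustrative values only
    dtw = waiting_time(w_minus, w_plus, rate)
    k = rate * dtw
    t_l = estimated_latency(25, k, dtw, w_minus, w_plus)
    print(dtw, k, t_l, is_valid(batch_cost(w_minus, w_plus, k), dtw, t_l, lc=5.0))
```

With these sample values, $W^+ \cdot R = 0.5$, so the denominator is $0.5$ and the waiting time equals the batch execution cost, which is the fixed point stated above.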
For a plan $P$ with $m = 5$ operators and $h = 2$ partitioning attributes, there exist
$h + 3 = 5$ optimality conditions. First, $h - 1 = 1$ optimality condition is required
with regard to the derivation of partitioning schemes, where we order the partitioning
attributes according to the monitored selectivities with $oc_1: sel(ba_1) \geq sel(ba_2)$. Second,
for this case of a valid waiting time, we require four optimality conditions, independent
of the number of partitioning attributes, in order to represent the validity condition with
$oc_2: 0 \leq W(P', k')$, $oc_3: W(P', k') \leq \Delta tw$, $oc_4: 0 \leq \hat{T}_L$, and $oc_5: \hat{T}_L \leq lc$.
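
As a rough illustration of how these five conditions could be evaluated against monitored statistics, the following sketch returns the names of the violated conditions. The function `violated_conditions`, its parameters, and the sample values are hypothetical; this is not the PlanOptTree data structure itself, only a flat check of $oc_1$-$oc_5$ for the case $h = 2$.

```python
from typing import List


def violated_conditions(sel_ba1: float, sel_ba2: float,
                        w: float, dtw: float,
                        t_l: float, lc: float) -> List[str]:
    """Evaluate oc_1..oc_5 for h = 2 partitioning attributes and
    return the names of all violated conditions in order."""
    checks = {
        "oc1": sel_ba1 >= sel_ba2,  # ordering of partitioning attributes
        "oc2": 0.0 <= w,            # W(P', k') is non-negative
        "oc3": w <= dtw,            # execution cost fits into the waiting time
        "oc4": 0.0 <= t_l,          # estimated latency is non-negative
        "oc5": t_l <= lc,           # latency constraint holds
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    # Illustrative statistics: all conditions hold
    print(violated_conditions(0.8, 0.3, w=0.9, dtw=1.0, t_l=4.0, lc=5.0))
    # Selectivity order flips, so only oc1 is reported
    print(violated_conditions(0.2, 0.3, w=0.9, dtw=1.0, t_l=4.0, lc=5.0))
```

Reporting which condition failed, rather than a single boolean, matches the idea of directed re-optimization: a violated $oc_1$ concerns only the partitioning scheme, whereas $oc_2$-$oc_5$ concern the waiting time.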
The PlanOptTree that represents these optimality conditions is shown in Figure 6.13.
Essentially, it contains operator nodes for all five operators. Only for operators with
partitioning attributes, we monitor the selectivity according to this attribute, while for all