
Cost-Based Optimization of Integration Flows - Datenbanken ...


6 On-Demand Re-Optimization

The resulting PlanOptTree for a sequence of $m = 5$ operators that has been distributed to $k = 3$ execution buckets is illustrated in Figure 6.12. For cost-based vectorization, we need to include operator nodes for all operators, but only their execution times as atomic statistic nodes. Furthermore, we use the complex statistic node $W(o_{\max}) + \lambda$ and the aggregated bucket execution costs $W(b_i)$ for each bucket with more than one operator. Then, there are three optimality conditions ($oc_1$ to $oc_3$), which check that the bucket execution costs (or operator execution costs) are below this maximum. In addition, two optimality conditions ($oc_4$ and $oc_5$) check whether all operators belong to the right bucket, that is, whether the assignment matches the one the A-CPV would produce.
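The following Python sketch makes the five conditions concrete by evaluating them for a given bucket assignment. It is a minimal sketch under stated assumptions, not the thesis's implementation:

- The function name `check_vectorization` and the list-based encoding are hypothetical.
- "Below this maximum" is read as $W(b_i) \le W(o_{\max}) + \lambda$.
- $oc_4$ and $oc_5$ are assumed to encode a greedy, order-preserving A-CPV. Under that assumption, the first operator of bucket $b_{i+1}$ must not fit into bucket $b_i$. This gives one condition per bucket boundary, that is, $k - 1 = 2$ conditions for $k = 3$.

```python
from typing import List, Sequence, Tuple


def check_vectorization(op_costs: Sequence[float],
                        buckets: Sequence[Sequence[int]],
                        lam: float) -> List[Tuple[str, int]]:
    """Return violated conditions as ("bucket", i) or ("boundary", i) (hypothetical encoding).

    op_costs -- monitored execution time W(o_j) of each operator (0-based index)
    buckets  -- operator indices per execution bucket, in plan order
    lam      -- the tolerance lambda added to the most expensive operator
    """
    bound = max(op_costs) + lam  # complex statistic node W(o_max) + lambda
    # Aggregated bucket costs W(b_i); for a single-operator bucket this is just W(o_j).
    bucket_costs = [sum(op_costs[j] for j in b) for b in buckets]

    violated: List[Tuple[str, int]] = []
    # oc_1 .. oc_k: every bucket must stay within the bound.
    for i, cost in enumerate(bucket_costs):
        if cost > bound:
            violated.append(("bucket", i))
    # oc_{k+1} .. oc_{2k-1} (assumed greedy semantics): the first operator of the
    # next bucket must not fit into the current bucket, otherwise a greedy
    # assignment would have placed it there.
    for i in range(len(buckets) - 1):
        if bucket_costs[i] + op_costs[buckets[i + 1][0]] <= bound:
            violated.append(("boundary", i))
    return violated


# Hypothetical statistics for m = 5 operators in k = 3 buckets, lambda = 1.
buckets = [[0, 1], [2, 3], [4]]
print(check_vectorization([4, 1, 3, 2, 2], buckets, lam=1))  # []
print(check_vectorization([4, 1, 3, 4, 2], buckets, lam=1))  # [('bucket', 1)]
```

In the second call, $o_4$ becomes more expensive, so bucket $b_1$ exceeds $W(o_{\max}) + \lambda$. The sketch reports exactly the bucket at which directed re-optimization would start.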

Whenever one of the optimality conditions is violated, we trigger directed re-optimization. In this case, we know the operator or execution bucket that caused the violation. In contrast to the full A-CPV, we directly start the cost-based plan vectorization at this bucket and hence might not need to evaluate all operators. However, directed cost-based plan vectorization does not change the worst-case time complexity of $O(m)$, because directed re-optimization might start at the first operator $o_1$.
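Directed re-optimization can be sketched in Python as follows. The buckets before the violated one are kept, and the assignment is recomputed starting at the first operator of the violated bucket. The function names are hypothetical. The greedy `vectorize` is an assumed stand-in for A-CPV, not necessarily the thesis's algorithm: it opens a new bucket whenever the next operator would exceed $W(o_{\max}) + \lambda$. The caller is also assumed to pass the earliest violated bucket.

```python
from typing import List, Sequence


def vectorize(op_costs: Sequence[float], lam: float, start: int = 0) -> List[List[int]]:
    """Greedy, order-preserving bucket assignment of operators start .. m-1 (assumed A-CPV stand-in)."""
    bound = max(op_costs) + lam  # W(o_max) + lambda over all operators
    buckets: List[List[int]] = []
    current: List[int] = []
    cost = 0.0
    for j in range(start, len(op_costs)):
        if current and cost + op_costs[j] > bound:
            buckets.append(current)  # o_j does not fit: close the current bucket
            current, cost = [], 0.0
        current.append(j)
        cost += op_costs[j]
    if current:
        buckets.append(current)
    return buckets


def directed_revectorize(op_costs: Sequence[float], buckets: Sequence[Sequence[int]],
                         violated: int, lam: float) -> List[List[int]]:
    """Keep buckets b_0 .. b_{violated-1}; recompute the rest from the violated bucket on."""
    prefix = [list(b) for b in buckets[:violated]]
    return prefix + vectorize(op_costs, lam, start=buckets[violated][0])


costs = [4, 1, 3, 4, 2]  # hypothetical: o_4 (index 3) rose from 2 to 4
old = [[0, 1], [2, 3], [4]]
print(directed_revectorize(costs, old, violated=1, lam=1))
# [[0, 1], [2], [3], [4]]
```

Starting at bucket $b_1$ evaluates only three of the five operators. A violation in $b_0$ would reprocess the whole sequence, which is the $O(m)$ worst case.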

6.4.3 Multi-Flow Optimization

Similarly to cost-based vectorization, on-demand re-optimization can also be applied to multi-flow optimization, the data-flow-oriented optimization technique discussed in Chapter 5. It is based on the algorithms for deriving partitioning schemes (A-DPA) and for computing the waiting time (A-WTC). Furthermore, we follow the optimization objective of minimizing the total latency time with

$$\phi = \max \frac{|M'|}{\Delta t} = \min T_L(M'), \tag{6.11}$$

where $T_L(M')$ is computed by

$$\hat{T}_L(M', k') = \left\lceil \frac{|M'|}{k'} \right\rceil \cdot \Delta tw + \hat{W}(P', k'), \tag{6.12}$$

and $\hat{W}(P', k')$ is computed by $\hat{W}(P', k') = W^-(P') + W^+(P') \cdot k'$ for arbitrary $k' = R \cdot \Delta tw$.

Due to our specific cost model extension, this minimum is attained at $\Delta tw = W(P', \Delta tw \cdot R)$. At that point, the waiting time equals the estimated execution time of $P'$ for the $k' = \Delta tw \cdot R$ messages collected during that waiting time. Substituting $k' = R \cdot \Delta tw$ into $\hat{W}(P', k')$ gives $\Delta tw = W^-(P') + W^+(P') \cdot R \cdot \Delta tw$. Hence we can compute $\Delta tw = W^-(P') / (1 - W^+(P') \cdot R)$. To ensure the latency time constraint $lc$ at the same time, we additionally evaluate the validity condition $(0 \le W(P', k') \le \Delta tw) \wedge (0 \le \hat{T}_L \le lc)$. In general, different cases give rise to different optimality conditions. Here, we concentrate on the default case, where the computed waiting time fulfills the validity condition.
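A short Python sketch can check the estimate of Equation (6.12), the closed-form waiting time, and the validity condition numerically. The following are assumptions:

- The function names and the sample statistics ($W^-(P') = 5$, $W^+(P') = 0.25$, $R = 2$, $|M'| = 100$) are hypothetical.
- $W(P', k')$ in the validity condition is evaluated with the linear estimate $\hat{W}(P', k')$.
- A small tolerance absorbs floating-point rounding in $W(P', k') \le \Delta tw$, which holds with equality at the computed waiting time.

The closed form yields a positive waiting time only if $W^-(P') > 0$ and $W^+(P') \cdot R < 1$. The sketch treats other inputs as a non-default case.

```python
import math
from typing import Optional


def w_hat(w_minus: float, w_plus: float, k: float) -> float:
    """Estimated execution time W_hat(P', k') = W^-(P') + W^+(P') * k'."""
    return w_minus + w_plus * k


def t_l_hat(num_msgs: int, k: float, dtw: float, w_minus: float, w_plus: float) -> float:
    """Estimated total latency, Eq. (6.12): ceil(|M'| / k') * dtw + W_hat(P', k')."""
    return math.ceil(num_msgs / k) * dtw + w_hat(w_minus, w_plus, k)


def waiting_time(w_minus: float, w_plus: float, rate: float) -> Optional[float]:
    """dtw = W^-(P') / (1 - W^+(P') * R); None if W^- <= 0 or W^+ * R >= 1."""
    denom = 1.0 - w_plus * rate
    if denom <= 0.0 or w_minus <= 0.0:
        return None
    return w_minus / denom


def is_default_case(w_minus: float, w_plus: float, rate: float,
                    num_msgs: int, lc: float, tol: float = 1e-9) -> bool:
    """Validity: (0 <= W(P', k') <= dtw) and (0 <= T_L <= lc) with k' = R * dtw."""
    dtw = waiting_time(w_minus, w_plus, rate)
    if dtw is None:
        return False
    k = rate * dtw
    w = w_hat(w_minus, w_plus, k)
    t_l = t_l_hat(num_msgs, k, dtw, w_minus, w_plus)
    return 0.0 <= w <= dtw + tol and 0.0 <= t_l <= lc


print(waiting_time(5.0, 0.25, 2.0))                    # 10.0
print(t_l_hat(100, 2.0 * 10.0, 10.0, 5.0, 0.25))       # 60.0
print(is_default_case(5.0, 0.25, 2.0, 100, lc=100.0))  # True
print(is_default_case(5.0, 0.25, 2.0, 100, lc=50.0))   # False
```

With these numbers, $k' = 20$ messages are collected per waiting period. Processing them takes $\hat{W} = 5 + 0.25 \cdot 20 = 10 = \Delta tw$, so the first conjunct holds with equality. Only the latency constraint distinguishes the two calls.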

For a plan $P$ with $m = 5$ operators and $h = 2$ partitioning attributes, there are $h + 3 = 5$ optimality conditions. First, $h - 1 = 1$ optimality condition is required for the derivation of partitioning schemes, where we order the partitioning attributes by their monitored selectivities with $oc_1: sel(ba_1) \ge sel(ba_2)$. Second, for this case of a valid waiting time, we require four optimality conditions to represent the validity condition, independent of the number of partitioning attributes: $oc_2: 0 \le W(P', k')$, $oc_3: W(P', k') \le \Delta tw$, $oc_4: 0 \le \hat{T}_L$, and $oc_5: \hat{T}_L \le lc$.
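The count $h + 3$ generalizes naturally, and the conditions for the default case can be sketched as one boolean per condition. This is a hypothetical encoding, with these assumptions:

- The function names and arguments are assumed. The arguments are the monitored selectivities in the current attribute order, plus $W(P', k')$, $\Delta tw$, $\hat{T}_L$, and $lc$.
- The ordering conditions are written as $sel(ba_i) \ge sel(ba_{i+1})$ for $i = 1, \ldots, h-1$.

```python
from typing import List, Sequence


def multi_flow_conditions(sel: Sequence[float], w: float, dtw: float,
                          t_l: float, lc: float) -> List[bool]:
    """Return [oc_1, ..., oc_{h+3}] for the default (valid waiting time) case.

    sel -- monitored selectivities sel(ba_1), ..., sel(ba_h), in the order
           used by the current partitioning scheme
    """
    # h - 1 ordering conditions from the derivation of partitioning schemes.
    ocs = [sel[i] >= sel[i + 1] for i in range(len(sel) - 1)]
    # Four validity conditions, independent of h.
    ocs += [0.0 <= w, w <= dtw, 0.0 <= t_l, t_l <= lc]
    return ocs


def violated(ocs: Sequence[bool]) -> List[int]:
    """1-based indices of violated conditions, i.e., triggers for re-optimization."""
    return [i + 1 for i, ok in enumerate(ocs) if not ok]


# h = 2 partitioning attributes -> 5 conditions (hypothetical statistics).
ocs = multi_flow_conditions([0.8, 0.5], w=10.0, dtw=10.0, t_l=60.0, lc=100.0)
print(len(ocs), violated(ocs))  # 5 []
# The selectivity order flips, so oc_1 (partitioning scheme derivation) fails.
print(violated(multi_flow_conditions([0.4, 0.5], 10.0, 10.0, 60.0, 100.0)))  # [1]
```

Because the four validity conditions do not depend on $h$, only the length of the ordering prefix changes with the number of partitioning attributes.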

The PlanOptTree that represents these optimality conditions is shown in Figure 6.13. Essentially, it contains operator nodes for all five operators. Only for operators with partitioning attributes do we monitor the selectivity with respect to this attribute, while for all
