25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6.4 Optimization Techniques<br />

For arbitrary cardinalities, the optimality conditions (Figures 6.11(a) and 6.11(b)) are<br />

$$oc_1: |R| \geq |S| \quad \text{and} \quad oc_2: |R| + |S| \cdot \frac{|R \cup S|}{2} \leq |R| + |S| + |R| \cdot \log_2 |R| + |S| \cdot \log_2 |S|. \tag{6.8}$$<br />

After simplification of $oc_2$, we obtain<br />

$$oc'_2: |R \cup S| \leq 2 \left( 1 + \frac{|R| \cdot \log_2 |R|}{|S|} + \log_2 |S| \right). \tag{6.9}$$<br />
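The simplification from (6.8) to (6.9) can be retraced step by step. The derivation below is a sketch that assumes $|S| > 0$, so that dividing by $|S|$ is valid and preserves the direction of the inequality; this assumption is not stated explicitly in the text.

```latex
\begin{align*}
|R| + |S| \cdot \frac{|R \cup S|}{2}
  &\leq |R| + |S| + |R| \cdot \log_2 |R| + |S| \cdot \log_2 |S| \\
|S| \cdot \frac{|R \cup S|}{2}
  &\leq |S| + |R| \cdot \log_2 |R| + |S| \cdot \log_2 |S|
  && \text{(subtract } |R| \text{)} \\
\frac{|R \cup S|}{2}
  &\leq 1 + \frac{|R| \cdot \log_2 |R|}{|S|} + \log_2 |S|
  && \text{(divide by } |S| > 0 \text{)} \\
|R \cup S|
  &\leq 2 \left( 1 + \frac{|R| \cdot \log_2 |R|}{|S|} + \log_2 |S| \right)
  && \text{(multiply by } 2 \text{)}
\end{align*}
```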

Figure 6.11(b) illustrates the resulting PlanOptTree, where we monitor the input and output cardinalities $|R|$, $|S|$, and $|R \cup S|$, and only have to check the two fairly simple optimality conditions $oc_1$ and $oc'_2$. A similar concept is also used, for example, when deciding between nested-loop and sort-merge joins.<br />
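The check of the two conditions on monitored cardinalities might be sketched as follows. This is a minimal illustration, not the implementation from this work: the function name `check_union_conditions` and its parameters are hypothetical, and positive input cardinalities are assumed so that the logarithms and the division by |S| are defined.

```python
import math


def check_union_conditions(card_r: int, card_s: int, card_union: int):
    """Evaluate oc_1 and the simplified oc'_2 on monitored cardinalities.

    Hypothetical helper: returns a pair (oc1_holds, oc2_holds).
    Assumes card_r > 0 and card_s > 0.
    """
    if card_r <= 0 or card_s <= 0:
        raise ValueError("input cardinalities must be positive")

    # oc_1: |R| >= |S|
    oc1 = card_r >= card_s

    # oc'_2: |R u S| <= 2 * (1 + |R| * log2|R| / |S| + log2|S|)
    bound = 2 * (1 + card_r * math.log2(card_r) / card_s + math.log2(card_s))
    oc2 = card_union <= bound

    return oc1, oc2
```

For |R| = 8 and |S| = 4, the right-hand side of the simplified condition evaluates to 2 · (1 + 6 + 2) = 18, so both conditions hold for any observed |R ∪ S| of at most 12. For |R| = |S| = 16 and |R ∪ S| = 32, the bound is again 18, so the second condition is violated.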

6.4.2 Cost-Based Vectorization<br />

The concept of on-demand re-optimization can also be applied seamlessly to the control-flow-oriented technique of cost-based plan vectorization, which was presented in Chapter 4. For this technique, the created PlanOptTree depends on the algorithm used and on the optimization objective. In this subsection, we illustrate on-demand re-optimization for the already discussed constrained optimization objective of<br />

$$\phi_c = \min_{k=1}^{m} l_k \;\Big|\; \forall i \in [1, k] : \left( \sum_{j=1}^{|b_i|} W(o_j) \right) \leq W(o_{max}) + \lambda, \tag{6.10}$$<br />

and the typically used heuristic computation approach (A-CPV), whose core idea is to<br />

merge buckets in a first-fit (next-fit) manner.<br />
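The merging idea can be illustrated with a small next-fit sketch. It is not the A-CPV algorithm as specified in Chapter 4, only an assumption-laden approximation of its core idea: the function name `next_fit_buckets` is hypothetical, operators are represented solely by their costs W(o_j), and an operator joins the current bucket whenever the bucket cost stays within W(o_max) + λ.

```python
def next_fit_buckets(costs, lam=0.0):
    """Group consecutive operator costs into buckets in a next-fit manner.

    Hypothetical sketch: `costs` holds W(o_1), ..., W(o_m) in plan order,
    `lam` is the slack lambda. Each bucket cost stays <= max(costs) + lam.
    """
    if not costs:
        return []

    limit = max(costs) + lam      # W(o_max) + lambda
    buckets = [[costs[0]]]        # the first operator opens the first bucket
    load = costs[0]               # cost of the currently open bucket

    for w in costs[1:]:
        if load + w <= limit:
            # The operator still fits: merge it into the open bucket.
            buckets[-1].append(w)
            load += w
        else:
            # The operator does not fit: close the bucket and open a new one.
            buckets.append([w])
            load = w

    return buckets
```

With the costs `[1, 2, 5, 1, 3]` and λ = 0, the limit is 5 and the sketch produces the three buckets `[[1, 2], [5], [1, 3]]`.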

Let $o$ be a sequence of $m$ operators that has been distributed to $k$ execution buckets $b_i$ with $1 \leq k \leq m$. With regard to the heuristic computation approach (A-CPV), plan optimality is then represented by $2k - 1$ optimality conditions. First, for each bucket $b_i$, the total execution time of all included operators must not exceed the maximum cost constraint, with $oc: W(b_i) \leq W(o_{max}) + \lambda$. Second, for each bucket except the first one, we check whether the first operator $o_i$ of this bucket $b_i$ still cannot be assigned to the previous bucket $b_{i-1}$, with $oc_i: W(b_{i-1}) + W(o_i) \geq W(o_{max}) + \lambda$. Together, this gives $k$ conditions of the first kind and $k - 1$ of the second. The second kind of condition follows from the first-fit (next-fit) character of A-CPV.<br />
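The two kinds of conditions can be enumerated mechanically for a given bucket assignment. The following sketch assumes that each bucket is a non-empty list of operator costs and that a violated condition is what signals the need for re-optimization; the names `cpv_optimality_conditions` and `needs_reoptimization` are hypothetical and not taken from this work.

```python
def cpv_optimality_conditions(buckets, lam=0.0):
    """Return the 2k - 1 optimality conditions as (label, holds) pairs.

    Hypothetical sketch: `buckets` is a list of k non-empty lists of
    operator costs in plan order, `lam` is the slack lambda.
    """
    if not buckets or any(not b for b in buckets):
        raise ValueError("need at least one bucket and no empty buckets")

    limit = max(w for b in buckets for w in b) + lam   # W(o_max) + lambda
    loads = [sum(b) for b in buckets]                  # W(b_i) per bucket
    conditions = []

    # k conditions: each bucket respects the maximum cost constraint.
    for i, load in enumerate(loads, start=1):
        conditions.append((f"W(b_{i}) <= W(o_max) + lambda", load <= limit))

    # k - 1 conditions: the first operator of each later bucket still
    # cannot be assigned to the previous bucket.
    for i in range(1, len(buckets)):
        merged = loads[i - 1] + buckets[i][0]
        label = f"W(b_{i}) + W(first operator of b_{i + 1}) >= W(o_max) + lambda"
        conditions.append((label, merged >= limit))

    return conditions


def needs_reoptimization(buckets, lam=0.0):
    """True if at least one optimality condition is violated."""
    return not all(holds for _, holds in cpv_optimality_conditions(buckets, lam))
```

For the buckets `[[1, 2], [5], [1, 3]]` with λ = 0, all five conditions hold. If the monitored cost of the third operator drops from 5 to 2, the limit shrinks to 3 and the cost of the last bucket (4) exceeds it, so one condition is violated.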

Figure 6.12: Example PlanOptTree of Cost-Based Vectorization<br />
