25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4 Vectorizing Integration Flows

Theorem 4.3. The cost-based plan vectorization algorithm solves (1) the P-CPV and (2) the constrained P-CPV with a linear time complexity of $O(m)$. For the constrained P-CPV, the cost constraints hold, but the number of execution buckets is not necessarily minimized.

Proof. Assume a plan that comprises a sequence of $m$ operators. First, determining the maximum of a value list (line 2 of the algorithm) is known to have a linear time complexity of $O(m)$. Second, the number of buckets $k$ is at least 1 (all operators assigned to one bucket) and at most $m$ (each operator assigned to its own bucket). Third, in both extreme cases, $k = 1$ and $k = m$, there are at most $2m - 1$ operator evaluations. Assuming constant time complexity for all set operations, the algorithm therefore performs at most $m + (2m - 1) = 3m - 1$ steps, i.e., the cost-based plan vectorization algorithm exhibits a linear time complexity of $O(3m - 1) = O(m)$. However, because the result depends on the concrete order in which the operators are evaluated, the algorithm might require a higher number of execution buckets $k$ than optimal. Hence, Theorem 4.3 holds.
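To make the counting argument of the proof concrete, the following Python sketch implements an order-preserving greedy bucket assignment. It assumes (our assumption, not confirmed by the original algorithm listing) that the heuristic scans the operators in plan order, appends each operator to the current bucket while the bucket's cumulative cost stays within $W(o_{\max}) + \lambda$, and otherwise opens a new bucket. The function name `vectorize_greedy` and the evaluation-counting model (one fit test per operator, plus one re-evaluation whenever an operator opens a new bucket) are hypothetical.

```python
from typing import List, Tuple


def vectorize_greedy(costs: List[int], lam: int = 0) -> Tuple[List[List[int]], int]:
    """Hypothetical greedy, order-preserving bucket assignment.

    Assumes a non-empty plan. Returns the buckets as lists of 0-based operator
    indices and the number of operator evaluations under the counting model
    described above.
    """
    cap = max(costs) + lam      # single O(m) pass for W(o_max), plus lambda
    buckets = [[0]]             # o_1 opens the first bucket
    load = costs[0]             # cumulative cost of the current bucket
    evaluations = 1
    for i in range(1, len(costs)):
        evaluations += 1        # fit test: can o_i join the current bucket?
        if load + costs[i] <= cap:
            buckets[-1].append(i)
            load += costs[i]
        else:
            evaluations += 1    # o_i is evaluated again as the first operator of a new bucket
            buckets.append([i])
            load = costs[i]
    return buckets, evaluations


if __name__ == "__main__":
    W = [1, 4, 5, 2, 3, 1]      # W(o_1), ..., W(o_6) in ms, taken from Example 4.7
    buckets, evals = vectorize_greedy(W, lam=0)
    print(buckets, evals)       # [[0, 1], [2], [3, 4], [5]] 9
```

For the operator costs of Example 4.7 with $\lambda = 0$, the sketch yields four buckets and 9 evaluations, below the bound $2m - 1 = 11$. Under this counting model the bound is reached exactly when every operator opens its own bucket ($k = m$), while $k = 1$ needs only $m$ evaluations.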

We use an example to illustrate this heuristic cost-based plan vectorization algorithm and the influence of the $\lambda$ parameter on the constrained optimization objective $\varphi_c$.

Example 4.7 (Heuristic Cost-Based Plan Vectorization). Assume a plan with $m = 6$ operators, shown in Figure 4.13. Each operator $o_i$ is annotated with its execution time $W(o_i)$. The maximum operator execution time is given by $W(o_{\max}) = \max_{i=1}^{m} W(o_i) = W(o_3) = 5$ ms.

[Figure 4.13: Bucket Merging with Different λ. Given operator sequence $o_1$ to $o_6$ with $W(o_1) = 1$, $W(o_2) = 4$, $W(o_3) = 5$, $W(o_4) = 2$, $W(o_5) = 3$, $W(o_6) = 1$ (ms); panels for $\lambda = 0$ ($\max W(b_i) = 5$, $k = 4$), $\lambda = 1$ ($\max W(b_i) = 6$, $k = 3$), and $\lambda = 2$ ($\max W(b_i) = 7$, $k = 3$).]
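As a quick consistency check on Figure 4.13 (our own derivation, not part of the original example): since every bucket costs at most $W(o_{\max}) + \lambda$ and the buckets together contain all operators, $k \cdot (W(o_{\max}) + \lambda)$ must be at least the total cost. This gives a lower bound on the number of buckets that any assignment, heuristic or exhaustive, must respect:

```latex
\sum_{i=1}^{6} W(o_i) = 1 + 4 + 5 + 2 + 3 + 1 = 16\,\text{ms},
\qquad
k \;\ge\; \left\lceil \frac{\sum_{i=1}^{m} W(o_i)}{W(o_{\max}) + \lambda} \right\rceil
\;\Rightarrow\;
k_{\lambda=0} \ge \lceil 16/5 \rceil = 4,\quad
k_{\lambda=1} \ge \lceil 16/6 \rceil = 3,\quad
k_{\lambda=2} \ge \lceil 16/7 \rceil = 3.
```

The bucket counts shown in the figure ($k = 4$, 3, 3) meet this bound, so in this particular example the heuristic does not use more buckets than necessary; for $\lambda = 2$, its weakness lies in how evenly the costs are distributed across the three buckets.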

The constrained P-CPV describes the search for the minimal number of execution buckets, where the cumulative costs of each bucket must not exceed the determined maximum $W(o_{\max})$ plus a user-defined cost increase $\lambda$. Hence, for $\lambda = 0$, we search for the $k$ buckets whose cumulative costs are each at most 5 ms. Increasing $\lambda$ raises the allowed maximum, and hence the work cycle of the vectorized plan, but can reduce the number of buckets (from $k = 4$ for $\lambda = 0$ to $k = 3$ for $\lambda = 1$). This example also shows the heuristic character of the algorithm. For $\lambda = 2$ ms, it finds the scheme ($k = 3$, $W(b_1) = 5$ ms, $W(b_2) = 7$ ms, $W(b_3) = 4$ ms), while our exhaustive approach would find the more balanced scheme ($k = 3$, $W(b_1) = 5$ ms, $W(b_2) = 5$ ms, $W(b_3) = 6$ ms).
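The difference between the heuristic and the exhaustive approach can be reproduced with a small Python sketch. Both functions are hypothetical illustrations, not the thesis's implementation: `greedy_costs` assumes an order-preserving greedy merge (append the next operator to the current bucket while its cumulative cost stays within $W(o_{\max}) + \lambda$, otherwise open a new bucket), and `exhaustive_costs` enumerates contiguous partitions (at most $2^{m-1}$) in order of increasing $k$, keeps those within the bound, and then, as our assumed reading of "more balanced", picks the one with the smallest maximum bucket cost.

```python
from itertools import combinations
from typing import List


def greedy_costs(costs: List[int], lam: int) -> List[int]:
    """Hypothetical order-preserving greedy merge; returns bucket costs W(b_j).

    Assumes a non-empty plan.
    """
    cap = max(costs) + lam
    loads = [costs[0]]
    for c in costs[1:]:
        if loads[-1] + c <= cap:
            loads[-1] += c      # operator joins the current bucket
        else:
            loads.append(c)     # operator opens a new bucket
    return loads


def exhaustive_costs(costs: List[int], lam: int) -> List[int]:
    """Hypothetical exhaustive search over contiguous partitions.

    Minimizes k first, then the maximum bucket cost (assumed meaning of
    "more balanced"). Exponential in m, so only viable for small plans.
    """
    cap = max(costs) + lam
    m = len(costs)
    for k in range(1, m + 1):                       # smallest k first
        feasible = []
        for cuts in combinations(range(1, m), k - 1):
            bounds = (0, *cuts, m)                  # bucket boundaries
            loads = [sum(costs[a:b]) for a, b in zip(bounds, bounds[1:])]
            if max(loads) <= cap:
                feasible.append(loads)
        if feasible:
            return min(feasible, key=max)           # ties: first in enumeration order
    raise AssertionError("unreachable: k = m always satisfies the bound")


if __name__ == "__main__":
    W = [1, 4, 5, 2, 3, 1]                          # W(o_1), ..., W(o_6) in ms
    for lam in (0, 1, 2):
        print(lam, greedy_costs(W, lam), exhaustive_costs(W, lam))
```

For $\lambda = 2$, the sketch reproduces the two schemes stated above ($[5, 7, 4]$ for the greedy merge versus $[5, 5, 6]$ for the exhaustive search); for $\lambda = 0$ and $\lambda = 1$, both find the bucket counts shown in Figure 4.13 ($k = 4$ and $k = 3$), although for $\lambda = 0$ they may pick different, equally balanced partitions. The exponential enumeration illustrates why a linear-time heuristic is attractive for longer plans.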

In conclusion, this heuristic approach ensures the maximum benefit of pipeline parallelism, because for $\lambda = 0$ no bucket is more expensive than the most expensive single operator, which is a lower bound for the work cycle of any bucket assignment. Moreover, it often minimizes the number of execution buckets and thus reduces both the number of threads and the length of the pipeline. Cammert et al. introduced a similar optimization objective in terms of a stall-avoiding partitioning of continuous queries [CHK+07] in the context of data stream management systems. In contrast to their approach, our algorithm is tailor-made for integration flows with control-flow semantics and materialized intermediate results within an execution bucket. In addition, (1) we presented exhaustive and heuristic computation approaches as well as

