Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows
Theorem 4.3. The cost-based plan vectorization algorithm solves (1) the P-CPV and (2) the constrained P-CPV with a linear time complexity of O(m). For the constrained P-CPV, the cost constraints hold, but the number of execution buckets might not be minimal.
Proof. Assume a plan that comprises a sequence of m operators. First, computing the maximum of a value list (line 2) is known to require linear time O(m). Second, the number of buckets k is at least 1 (all operators assigned to one bucket) and at most m (each operator assigned to its own bucket). Third, in both extreme cases, k = 1 and k = m, there are at most 2m − 1 operator evaluations. Assuming constant time complexity for all set operations, we conclude that the cost-based plan vectorization algorithm exhibits linear time complexity, since O(3m − 1) = O(m). However, because the result depends on the concrete order in which operators are evaluated, the algorithm might require a higher number of execution buckets k than optimal. Hence, Theorem 4.3 holds.
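Written out, the counting argument of the proof amounts to the following bound. It assumes, as one reading of the proof rather than a statement taken from it, that the first operator needs one evaluation and every further operator needs at most two (one attempt to join the current bucket and, if that fails, one assignment to a new bucket), which gives 1 + 2(m − 1) = 2m − 1 evaluations:

$$
T(m) \;\le\; \underbrace{m}_{\text{computing } W(o_{\max})} \;+\; \underbrace{1 + 2(m-1)}_{\text{operator evaluations}} \;=\; 3m - 1 \;\in\; O(m)
$$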
We use an example to illustrate this heuristic cost-based plan vectorization algorithm and the influence of the λ parameter on the constrained optimization objective $\phi_c$.
Example 4.7 (Heuristic Cost-Based Plan Vectorization). Assume a plan with m = 6 operators, as shown in Figure 4.13. Each operator $o_i$ is assigned an execution time $W(o_i)$. The maximum operator execution time is $W(o_{\max}) = \max_{i=1}^{m} W(o_i) = W(o_3) = 5$ ms.
Figure 4.13: Bucket Merging with Different λ [figure: given operator sequence o1 to o6 with W(o1)=1, W(o2)=4, W(o3)=5, W(o4)=2, W(o5)=3, W(o6)=1; λ=0 (max W(b_i)=5): k=4; λ=1 (max W(b_i)=6): k=3; λ=2 (max W(b_i)=7): k=3]
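The bucket merging of Example 4.7 can be reproduced with a small greedy sketch. Since the algorithm listing is not part of this excerpt, the sketch assumes that operators are visited in plan order and that each operator joins the current bucket as long as the bucket's cumulative cost stays within $W(o_{\max}) + \lambda$; otherwise, it opens a new bucket. The function name `merge_buckets` and the 0-based operator indices are hypothetical. Under these assumptions, the sketch yields k = 4, 3, 3 for λ = 0, 1, 2 and, for λ = 2, the bucket costs of 5, 7, and 4 ms stated in the text.

```python
from typing import List


def merge_buckets(costs: List[int], lam: int = 0) -> List[List[int]]:
    """Hypothetical greedy, order-preserving bucket merging.

    costs[i] is the execution time W(o_{i+1}); lam is the cost increase λ.
    Returns buckets as lists of 0-based operator indices.
    """
    if not costs:
        return []
    limit = max(costs) + lam              # allowed bucket cost: W(o_max) + λ
    buckets: List[List[int]] = []
    current_cost = 0
    for i, w in enumerate(costs):
        if buckets and current_cost + w <= limit:
            buckets[-1].append(i)         # operator fits into the current bucket
            current_cost += w
        else:
            buckets.append([i])           # open a new bucket for this operator
            current_cost = w
    return buckets


costs = [1, 4, 5, 2, 3, 1]                # W(o1)..W(o6) in ms (Example 4.7)
for lam in (0, 1, 2):
    buckets = merge_buckets(costs, lam)
    loads = [sum(costs[i] for i in b) for b in buckets]
    print(f"lambda={lam}: k={len(buckets)}, W(b_j)={loads}")
# lambda=0: k=4, W(b_j)=[5, 5, 5, 1]
# lambda=1: k=3, W(b_j)=[5, 5, 6]
# lambda=2: k=3, W(b_j)=[5, 7, 4]
```

Each operator is examined once and either appended or placed into a new bucket, so the sketch runs in O(m). Its dependence on the visiting order is also why it cannot revise an earlier merge to obtain a more balanced scheme.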
The constrained P-CPV describes the search for the minimal number of execution buckets, where the cumulative cost of each bucket must not exceed the determined maximum plus a user-defined cost increase λ. Hence, for λ = 0, we search for the minimal number k of buckets whose cumulative costs are each at most 5 ms. Increasing λ raises the allowed maximum, and hence the work cycle of the vectorized plan, but can reduce the number of buckets. This example also shows the heuristic character of the algorithm. For λ = 2 ms, we find the scheme ($k = 3$, $W(b_1) = 5$ ms, $W(b_2) = 7$ ms, $W(b_3) = 4$ ms), while our exhaustive approach would find the more balanced scheme ($k = 3$, $W(b_1) = 5$ ms, $W(b_2) = 5$ ms, $W(b_3) = 6$ ms).
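For comparison, an exhaustive search over all contiguous bucket assignments can be sketched as follows. This is a hypothetical illustration, not the exhaustive approach presented in this work: it assumes that buckets are contiguous ranges of the operator sequence, first minimizes k subject to the cost limit $W(o_{\max}) + \lambda$, and then, as an assumed criterion for "more balanced", picks the assignment with the smallest maximum bucket cost. The names `split` and `vectorize_exhaustive` are hypothetical. For λ = 2, it returns the scheme with bucket costs of 5, 5, and 6 ms.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def split(costs: List[int], cuts: Tuple[int, ...]) -> List[List[int]]:
    """Split operator indices 0..m-1 into contiguous buckets at the cut positions."""
    bounds = (0,) + cuts + (len(costs),)
    return [list(range(lo, hi)) for lo, hi in zip(bounds, bounds[1:])]


def vectorize_exhaustive(costs: List[int], lam: int = 0) -> Optional[List[List[int]]]:
    """Hypothetical exhaustive search: minimal k first, then the smallest
    maximum bucket cost among all feasible contiguous partitions with that k."""
    if not costs:
        return []
    m = len(costs)
    limit = max(costs) + lam                      # allowed bucket cost: W(o_max) + λ
    for k in range(1, m + 1):                     # try the fewest buckets first
        best, best_max = None, None
        for cuts in combinations(range(1, m), k - 1):
            buckets = split(costs, cuts)
            worst = max(sum(costs[i] for i in b) for b in buckets)
            if worst <= limit and (best_max is None or worst < best_max):
                best, best_max = buckets, worst   # keep the most balanced so far
        if best is not None:
            return best
    return None                                   # only reachable for negative lam


costs = [1, 4, 5, 2, 3, 1]                        # W(o1)..W(o6) in ms (Example 4.7)
best = vectorize_exhaustive(costs, lam=2)
print(best, [sum(costs[i] for i in b) for b in best])
# [[0, 1], [2], [3, 4, 5]] [5, 5, 6]
```

Enumerating all $\binom{m-1}{k-1}$ cut positions for each k makes this search exponential in m in the worst case, which is what makes a linear-time heuristic attractive for long plans.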
In conclusion, this heuristic approach ensures the maximum benefit of pipeline parallelism and often minimizes the number of execution buckets, which reduces both the number of threads and the length of the pipeline. Cammert et al. introduced a similar optimization objective, a stall-avoiding partitioning of continuous queries [CHK+07], in the context of data stream management systems. In contrast to their approach, our algorithm is tailor-made for integration flows with control-flow semantics and materialized intermediate results within an execution bucket. In addition, (1) we presented exhaustive and heuristic computation approaches as well as