Cost-Based Optimization of Integration Flows - Datenbanken ...
4.3 Cost-Based Vectorization
where we determine the minimum costs only for the given k. Hence, the optimization
objective is to keep the maximum costs of execution buckets as low as possible. Due to
the explicitly given k, the constrained objective φ_c is not applicable. Finally, the rewriting
algorithm can be reused as it is.
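To make the objective for a fixed k concrete, the following minimal Python sketch computes a contiguous assignment of operators to exactly k non-empty buckets that minimizes the maximum bucket cost. It uses a standard dynamic-programming formulation, which is an assumption chosen for illustration and not necessarily the exhaustive algorithm referred to above; the function name `min_max_partition` and the representation of a plan as a list of operator costs are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def min_max_partition(weights: List[float], k: int) -> Tuple[float, List[int]]:
    """Split the operator sequence into exactly k contiguous, non-empty
    buckets so that the maximum bucket cost is minimal.

    Returns the minimal maximum bucket cost and the bucket sizes.
    Illustrative dynamic program, not the source's own algorithm.
    """
    m = len(weights)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of operators")

    # prefix[i] is the summed cost of the first i operators.
    prefix = [0] + list(accumulate(weights))
    inf = float("inf")

    # best[j][i]: minimal maximum bucket cost when the first i operators
    # are split into j buckets; cut[j][i]: start index of the last bucket.
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    cut = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0

    for j in range(1, k + 1):
        for i in range(j, m + 1):
            for s in range(j - 1, i):
                cost = max(best[j - 1][s], prefix[i] - prefix[s])
                if cost < best[j][i]:
                    best[j][i] = cost
                    cut[j][i] = s

    # Walk the stored cut points backwards to recover the bucket sizes.
    sizes: List[int] = []
    i = m
    for j in range(k, 0, -1):
        s = cut[j][i]
        sizes.append(i - s)
        i = s
    sizes.reverse()
    return best[k][m], sizes
```

Called as `min_max_partition(costs, k)`, the sketch only evaluates the unconstrained objective φ (maximum bucket cost), in line with the remark that φ_c is not applicable for an explicitly given k.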
Heuristic Computation Approach<br />
In contrast to the exhaustive computation, the heuristic approach requires major changes
when k is restricted: we can no longer fill each bucket up to its maximum capacity, because
doing so could result in any number of buckets and thus would conflict with the fixed-k
constraint. Hence, we require an alternative heuristic to solve this optimization problem.
The heuristic algorithm (A-RCPV) for a restricted number of execution buckets k works
as follows. In a first step, we distribute the m operators uniformly across the given k
buckets, where the first m − k·⌊m/k⌋ buckets get ⌈m/k⌉ operators and all other buckets
get ⌊m/k⌋ operators assigned to them. In a second step, for each bucket, we check if the
performance can be improved by assigning its first operator to the previous bucket or its
last operator to the next bucket. There, the optimization objective φ (Equation 4.18)
is used to determine the influence in the sense of lower maximum bucket costs. Finally,
we repeat this check until no more operators are exchanged during one complete run.
Because every exchange is evaluated directly with φ, cycles are impossible; hence, the
algorithm terminates, and it exhibits a linear time complexity of O(m). We illustrate this
using an example.
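As a complement to the example, the two steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the original implementation of A-RCPV: the function names are hypothetical, buckets are represented only by their sizes (which suffices because buckets are contiguous), an exchange is accepted only if it strictly lowers the maximum bucket cost, and a bucket is never emptied so that exactly k buckets remain. For readability, φ is recomputed from scratch for every candidate, so the sketch does not aim to reproduce the stated time complexity.

```python
from typing import List


def bucket_costs(weights: List[float], sizes: List[int]) -> List[float]:
    """Sum the operator costs W(o) of each contiguous bucket."""
    costs, start = [], 0
    for size in sizes:
        costs.append(sum(weights[start:start + size]))
        start += size
    return costs


def phi(weights: List[float], sizes: List[int]) -> float:
    """Objective: the maximum bucket cost of a distribution."""
    return max(bucket_costs(weights, sizes))


def distribute_fixed_k(weights: List[float], k: int) -> List[int]:
    """Heuristic distribution of operators to exactly k buckets.

    Returns the number of operators per bucket (hypothetical interface).
    """
    m = len(weights)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of operators")

    # Step 1: uniform distribution. The first m - k*floor(m/k) buckets
    # get ceil(m/k) operators, all other buckets get floor(m/k).
    base, extra = divmod(m, k)
    sizes = [base + 1 if i < extra else base for i in range(k)]

    # Step 2: shift boundary operators between neighboring buckets as
    # long as the maximum bucket cost strictly decreases.
    changed = True
    while changed:
        changed = False
        for i in range(k):
            # j = i - 1: first operator of bucket i moves to the previous
            # bucket; j = i + 1: last operator moves to the next bucket.
            for j in (i - 1, i + 1):
                if 0 <= j < k and sizes[i] > 1:
                    candidate = sizes.copy()
                    candidate[i] -= 1
                    candidate[j] += 1
                    if phi(weights, candidate) < phi(weights, sizes):
                        sizes = candidate
                        changed = True
    return sizes
```

Since each accepted exchange strictly lowers the maximum bucket cost, no distribution can be visited twice, which mirrors the termination argument given above.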
Example 4.8 (Heuristic Computation with Fixed k). Assume a fixed number of buckets,
k = 3. Figure 4.14 uses the plan and statistics from Example 4.7 and illustrates the
heuristic approach for fixed k.
[Figure 4.14 shows the given operator sequence o = (o1, ..., o6) with costs W(o1)=1, W(o2)=4, W(o3)=5, W(o4)=2, W(o5)=3, W(o6)=1 and, for k=3, the assignment of the operators to the buckets b1, b2, b3 before the exchange (max W(bi)=7) and after the exchange (max W(bi)=6).]
Figure 4.14: Heuristic Operator Distribution with Fixed k
We distribute the six operators uniformly across the three execution buckets. As already
mentioned, the performance of the plan depends on the most time-consuming bucket. In
our example, this is bucket 2, with W(b2) = 7 ms. Now, we exchange operators. First, at
bucket 1, no operator is exchanged because transferring o2 from b1 to b2 would increase the
maximum bucket costs. The same is true for a transfer of o3 from b2 to b1. However, we
can transfer o4 from b2 to b3 and reduce the maximum costs to W(b3) = 6 ms. Finally, we
require a final run over all buckets to check the termination condition.
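The arithmetic behind the exchanges considered in this example can be checked with a short Python snippet. The operator costs are those of Figure 4.14 and the bucket contents follow the description above; the helper name `max_bucket_cost` and the dictionary layout are illustrative assumptions.

```python
# Operator costs W(o) in ms, as given in Figure 4.14.
WEIGHTS = {"o1": 1, "o2": 4, "o3": 5, "o4": 2, "o5": 3, "o6": 1}


def max_bucket_cost(buckets):
    """Return the cost of the most time-consuming bucket."""
    return max(sum(WEIGHTS[op] for op in bucket) for bucket in buckets)


# Uniform start distribution and the three exchanges discussed in the text.
candidates = {
    "uniform start":  [["o1", "o2"], ["o3", "o4"], ["o5", "o6"]],
    "o2: b1 -> b2":   [["o1"], ["o2", "o3", "o4"], ["o5", "o6"]],
    "o3: b2 -> b1":   [["o1", "o2", "o3"], ["o4"], ["o5", "o6"]],
    "o4: b2 -> b3":   [["o1", "o2"], ["o3"], ["o4", "o5", "o6"]],
}

results = {name: max_bucket_cost(b) for name, b in candidates.items()}
for name, cost in results.items():
    print(f"{name}: max W(bi) = {cost} ms")
```

Only the last exchange lowers the maximum bucket cost below that of the uniform start distribution, which is why it is the only one that is applied.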
As a result, one can solve both the P-CPV and the constrained P-CPV under
the restriction of a fixed number of execution buckets and thus, also with a fixed number
of threads. With this approach, we can guarantee to overcome the problem of a possibly
large number of required threads.