25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4.3 Cost-Based Vectorization

where we determine the minimum costs only for the given k. Hence, the optimization
objective is to keep the maximum costs of execution buckets as low as possible. Due to
the explicitly given k, the constrained objective φ_c is not applicable. Finally, the rewriting
algorithm can be reused as it is.
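To make the fixed-k objective concrete, the following is a minimal sketch of one way to compute an optimal distribution for a given k. It is a standard dynamic program over contiguous partitions, not necessarily the exhaustive algorithm referred to above. It assumes that a bucket's cost is the sum of its operators' costs and that every bucket holds at least one operator; the name `optimal_fixed_k` and the boundary-list representation are hypothetical.

```python
from functools import lru_cache
from typing import List, Sequence, Tuple


def optimal_fixed_k(weights: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Split `weights` into exactly k contiguous, non-empty buckets so that
    the maximum bucket cost is minimal (assumption: bucket cost = sum of W(o)).

    Returns (max bucket cost, boundaries), where bucket i covers
    weights[boundaries[i]:boundaries[i + 1]].
    """
    m = len(weights)
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= number of operators")

    # Prefix sums give the cost of any contiguous bucket in O(1).
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    @lru_cache(maxsize=None)
    def solve(start: int, buckets: int):
        """Best (max cost, cut positions) for weights[start:] in `buckets` buckets."""
        if buckets == 1:
            return prefix[m] - prefix[start], (m,)
        best = None
        # Leave at least one operator for each of the remaining buckets.
        for end in range(start + 1, m - buckets + 2):
            head = prefix[end] - prefix[start]
            tail_cost, tail_cuts = solve(end, buckets - 1)
            candidate = (max(head, tail_cost), (end,) + tail_cuts)
            if best is None or candidate[0] < best[0]:
                best = candidate
        return best

    cost, cuts = solve(0, k)
    return cost, [0, *cuts]
```

Such a reference solution is mainly useful for checking how close a heuristic distribution comes to the optimum on small plans.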

Heuristic Computation Approach<br />

In contrast to the exhaustive computation, the heuristic approach requires major changes
when k is restricted: we cannot simply exploit the maximum capacity of a bucket, because
doing so could result in an arbitrary number of buckets and would thus conflict with the
fixed-k constraint. Hence, we require an alternative heuristic to solve this optimization problem.

The heuristic algorithm (A-RCPV) for a restricted number of execution buckets k works
as follows. In a first step, we distribute the m operators uniformly across the given k
buckets, where the first m − k·⌊m/k⌋ buckets get ⌈m/k⌉ operators and all other buckets
get ⌊m/k⌋ operators assigned to them. In a second step, for each bucket, we check whether the
performance can be improved by assigning its first operator to the previous bucket or its
last operator to the next bucket. Here, the optimization objective φ (Equation 4.18)
is used to determine the influence in the sense of lower maximum bucket costs. Finally,
we repeat this for each operator until no more operators are exchanged during one run over
all operators. Due to the direct evaluation with φ, cycles are impossible; hence, the
algorithm terminates, and it exhibits a linear time complexity of O(m). We illustrate this
using an example.
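The two steps described above can be sketched as follows. This is an illustrative reading of the heuristic, not the original A-RCPV implementation: it assumes that a bucket's cost is the sum of its operators' costs, that φ is the maximum bucket cost, that a transfer is accepted only if it strictly lowers that maximum, and that no bucket may become empty. The function names and the boundary-list representation are hypothetical.

```python
from typing import List, Sequence, Tuple


def bucket_costs(weights: Sequence[float], bounds: Sequence[int]) -> List[float]:
    """Cost of each bucket, assumed to be the sum of its operators' costs W(o)."""
    return [sum(weights[bounds[i]:bounds[i + 1]]) for i in range(len(bounds) - 1)]


def uniform_bounds(m: int, k: int) -> List[int]:
    """Step 1: the first m - k*floor(m/k) buckets get ceil(m/k) operators,
    all other buckets get floor(m/k) operators."""
    base, extra = divmod(m, k)
    bounds = [0]
    for i in range(k):
        bounds.append(bounds[-1] + base + (1 if i < extra else 0))
    return bounds


def distribute_fixed_k(weights: Sequence[float], k: int) -> Tuple[List[int], float]:
    """Step 2: shift single operators between neighboring buckets while this
    lowers the maximum bucket cost. Returns (boundaries, max bucket cost);
    bucket i covers weights[boundaries[i]:boundaries[i + 1]]."""
    m = len(weights)
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= number of operators")
    bounds = uniform_bounds(m, k)
    best = max(bucket_costs(weights, bounds))
    changed = True
    while changed:  # one more pass after the last exchange checks termination
        changed = False
        for i in range(k):
            candidates = []
            if i > 0:
                candidates.append((i, +1))      # first operator -> previous bucket
            if i < k - 1:
                candidates.append((i + 1, -1))  # last operator -> next bucket
            for idx, delta in candidates:
                if bounds[i + 1] - bounds[i] <= 1:
                    break  # assumption: every bucket keeps at least one operator
                trial = list(bounds)
                trial[idx] += delta
                cost = max(bucket_costs(weights, trial))
                if cost < best:  # strict improvement only, so no cycles
                    bounds, best, changed = trial, cost, True
    return bounds, best
```

For clarity, this sketch recomputes all bucket costs for every candidate transfer, so it does not reproduce the O(m) bound stated above; maintaining the bucket costs incrementally would be the natural refinement.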

Example 4.8 (Heuristic Computation with Fixed k). Assume a fixed number of buckets,
k = 3. Figure 4.14 uses the plan and statistics from Example 4.7 and illustrates the
heuristic approach for fixed k.

Given operator sequence o = (o1, o2, o3, o4, o5, o6) with
W(o1)=1, W(o2)=4, W(o3)=5, W(o4)=2, W(o5)=3, W(o6)=1

Uniform distribution for k=3: b1 = {o1, o2}, b2 = {o3, o4}, b3 = {o5, o6}; max W(bi)=7

After the exchange: b1 = {o1, o2}, b2 = {o3}, b3 = {o4, o5, o6}; max W(bi)=6

Figure 4.14: Heuristic Operator Distribution with Fixed k

We distribute the six operators uniformly across the three execution buckets. As already
mentioned, the performance of the plan depends on the most time-consuming bucket. In
our example, this is bucket 2, with W(b2) = 7 ms. Now, we exchange operators. First, at
bucket 1, no operator is exchanged because transferring o2 from b1 to b2 would increase the
maximum bucket costs. The same is true for a transfer of o3 from b2 to b1. However, we
can transfer o4 from b2 to b3 and reduce the maximum costs to W(b3) = 6 ms. Finally, we
require a final run over all buckets to check the termination condition.
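The three candidate transfers of this example can be checked with a few lines of Python. The snippet is only a verification aid under the assumption that a bucket's cost is the sum of its operators' costs; the names `weights`, `max_cost`, and `candidates` are hypothetical.

```python
# Operator costs W(o) in ms, taken from Figure 4.14.
weights = {"o1": 1, "o2": 4, "o3": 5, "o4": 2, "o5": 3, "o6": 1}


def max_cost(buckets):
    """Maximum bucket cost, with bucket cost assumed to be the sum of W(o)."""
    return max(sum(weights[o] for o in bucket) for bucket in buckets)


# Uniform distribution for k = 3.
initial = [["o1", "o2"], ["o3", "o4"], ["o5", "o6"]]

# Distributions that result from each single transfer discussed in the text.
candidates = {
    "o2: b1 -> b2": [["o1"], ["o2", "o3", "o4"], ["o5", "o6"]],
    "o3: b2 -> b1": [["o1", "o2", "o3"], ["o4"], ["o5", "o6"]],
    "o4: b2 -> b3": [["o1", "o2"], ["o3"], ["o4", "o5", "o6"]],
}

results = {name: max_cost(buckets) for name, buckets in candidates.items()}

print("initial:", max_cost(initial))
for name, cost in results.items():
    print(name, "->", cost)
```

Only the third transfer yields a maximum below the initial 7 ms, which is why it is the only exchange performed.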

As a result, one can solve both the P-CPV and the constrained P-CPV under
the restriction of a fixed number of execution buckets and thus also with a fixed number
of threads. This approach therefore overcomes the problem of a possibly
large number of required threads.
