25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4.3 Cost-Based Vectorization

scheme has a linear time complexity of $O(m)$. As a result, the overall complexity of the exhaustive computation is still dominated by the enumeration of candidate distribution schemes, and hence, it has an exponential time complexity of $O(2^m)$.
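To make the exponential growth concrete, the following is a minimal sketch, assuming that a candidate distribution scheme is an order-preserving split of the $m$ operators into contiguous buckets. The function name `enumerate_schemes` and the 1-based operator numbering are illustrative choices, not part of the original text. Under this assumption there are $2^{m-1}$ schemes, one per subset of the $m-1$ possible cut positions, which lies in $O(2^m)$.

```python
from itertools import product


def enumerate_schemes(m):
    """Yield every split of operators 1..m into contiguous, ordered buckets.

    Assumption (not from the source): a scheme is fully described by
    choosing, for each of the m - 1 gaps between neighbours, whether
    a new bucket starts there.
    """
    if m == 0:
        return
    for cuts in product((False, True), repeat=m - 1):
        scheme, bucket = [], [1]
        for pos, cut in enumerate(cuts, start=2):
            if cut:
                # Close the current bucket and open a new one at operator pos.
                scheme.append(bucket)
                bucket = [pos]
            else:
                bucket.append(pos)
        scheme.append(bucket)
        yield scheme


schemes = list(enumerate_schemes(4))
print(len(schemes))  # 8, i.e. 2 ** (4 - 1)
```

Evaluating each scheme in linear time does not change the picture: the number of schemes doubles with every additional operator.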

Heuristic Computation Approach

Due to this exponential complexity of the P-CPV, a search space reduction approach is required to determine a (near) cost-optimal solution. Therefore, we present a heuristic algorithm that solves the P-CPV and the Constrained P-CPV with linear complexity of $O(m)$. The core idea is to use a first-fit (next-fit) approach of merging operators into execution buckets until the maximum constraint is reached.

Algorithm 4.3 Cost-Based Plan Vectorization (A-CPV)
Require: operator sequence $o$
 1: $A \leftarrow \emptyset$, $B \leftarrow \emptyset$, $k \leftarrow 0$
 2: $\mathit{max} \leftarrow \max_{i=1}^{m} W(P', o_i) + \lambda$
 3: for $i \leftarrow 1$ to $|o|$ do    // for each operator $o_i$
 4:   if $o_i \in A$ then
 5:     continue 3
 6:   $k \leftarrow k + 1$
 7:   $b_k(o_i) \leftarrow$ create bucket over $o_i$
 8:   for $j \leftarrow i + 1$ to $|o|$ do    // for each following operator $o_j$
 9:     if $\left(\sum_{c=1}^{|b_k|} W(o_c)\right) + W(o_j) \leq \mathit{max}$ then
10:       $b_k \leftarrow$ add $o_j$ to $b_k$
11:       $A \leftarrow A \cup o_j$
12:     else
13:       break 9
14:   $B \leftarrow B \cup b_k$
15: return $B$

Algorithm 4.3 illustrates the concept of the cost-based plan vectorization algorithm. The operator sequence $o$ is required. First, we initialize two sets $A$ and $B$ as empty sets. Thereafter, we compute the maximal costs of a bucket $\mathit{max}$ with $\mathit{max} = \max_{i=1}^{m} W(o_i) + \lambda$, followed by the main loop over all operators. If the operator $o_i$ belongs to $A$ (operators already assigned to buckets), we can proceed with the next operator. Otherwise, we create a new bucket $b_k$ and increment the number of buckets $k$ accordingly. After that, we execute the inner loop in order to assign operators to this bucket such that the constraint $\sum_{c=1}^{|b_k|} W(o_c) \leq \mathit{max}$ holds. This is done by adding $o_j$ to $b_k$ and to $A$. Here, we can ensure that each created bucket has at least one operator assigned. Finally, each new bucket $b_k$ is added to the set of buckets $B$.
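The steps described above can be sketched in Python as follows. This is a minimal illustration and not the author's implementation: the function name `a_cpv`, the representation of operator costs as a plain list `weights` (standing in for $W(o_i)$), the parameter name `lam` for $\lambda$, and the 0-based operator indices are all assumptions made for the example.

```python
def a_cpv(weights, lam=0):
    """Greedy bucket assignment in the spirit of Algorithm 4.3.

    weights[i] stands in for the cost W(o_i) of the i-th operator
    (0-based here); lam stands in for lambda. Returns a list of
    buckets, each a list of operator indices in their original order.
    """
    if not weights:
        return []
    max_cost = max(weights) + lam   # maximal cost allowed per bucket
    assigned = set()                # A: operators already placed by an inner loop
    buckets = []                    # B: resulting buckets
    for i in range(len(weights)):
        if i in assigned:
            continue                # operator already belongs to a bucket
        bucket = [i]                # new bucket b_k over o_i
        bucket_cost = weights[i]    # running sum of costs in b_k
        for j in range(i + 1, len(weights)):
            if bucket_cost + weights[j] <= max_cost:
                bucket.append(j)
                bucket_cost += weights[j]
                assigned.add(j)
            else:
                break               # stop at the first operator that does not fit
        buckets.append(bucket)
    return buckets


print(a_cpv([3, 1, 2, 4, 1]))          # [[0, 1], [2], [3], [4]]
print(a_cpv([3, 1, 2, 4, 1], lam=1))   # [[0, 1], [2], [3, 4]]
```

The sketch keeps a running bucket cost instead of recomputing the sum on every comparison, and stopping at the first operator that does not fit keeps every bucket contiguous. A larger `lam` raises the cost bound and therefore tends to produce fewer, larger buckets.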

The heuristic character stems from merging only subsequent operators. This is similar to the first-fit (next-fit) algorithm [Joh74] of the bin packing problem, with the difference that the order of operators must be preserved. Thus, there are cases where we do not find the optimal scheme. However, this algorithm often leads to good or near-optimal results. In conclusion, we use this heuristic as the default computation approach, which motivates the more detailed complexity and cost analysis that we discuss in the following.
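The effect of the order-preservation requirement can be illustrated by comparing two textbook bin packing heuristics. The sketch below is an illustration only: the helper names `first_fit` and `next_fit`, the example weights, and the capacity are assumptions, and both functions assume that no single item exceeds the capacity. Classic first fit may place an item into any earlier bin, which can interleave operators, whereas next fit only considers the most recently opened bin and therefore keeps each bin contiguous.

```python
def first_fit(weights, capacity):
    """Classic first fit: put each item into the first bin with enough room."""
    bins, loads = [], []
    for idx, w in enumerate(weights):
        for b in range(len(bins)):
            if loads[b] + w <= capacity:
                bins[b].append(idx)
                loads[b] += w
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([idx])
            loads.append(w)
    return bins


def next_fit(weights, capacity):
    """Next fit: only the most recently opened bin may receive the item."""
    bins, load = [], 0
    for idx, w in enumerate(weights):
        if bins and load + w <= capacity:
            bins[-1].append(idx)
            load += w
        else:
            bins.append([idx])
            load = w
    return bins


weights = [3, 2, 1]
print(first_fit(weights, 4))  # [[0, 2], [1]]  item 2 jumps over item 1
print(next_fit(weights, 4))   # [[0], [1, 2]]  bins stay contiguous
```

In this example, first fit groups the first and third item and thereby breaks the sequence, while next fit yields only contiguous groups, which matches the constraint that the order of operators must be preserved.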
