25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4.3 Cost-Based Vectorization

Proof. The problem of finding all possible plans of $m$ operators that are not connected by any data dependencies is reducible to the known problem of finding all possible partitions of a set with $m$ members, where the Bell numbers $B_m$ [Bel34a, Bel34b] represent the total number of partitions. Note that, similar to this known problem, we exclude the plan with zero operators ($B_0 = B_1 = 1$). Thus, the number of possible plans can be recursively computed by

$$|P''| = B_m = \sum_{j=0}^{m-1} \binom{m-1}{j} \cdot B_j. \tag{4.13}$$
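The recurrence in (4.13) can be evaluated directly. The following is a minimal sketch, not the implementation used in this work; the function name `bell_numbers` is a hypothetical helper introduced only for illustration.

```python
from math import comb


def bell_numbers(m):
    """Return [B_0, ..., B_m] via B_n = sum_{j=0}^{n-1} C(n-1, j) * B_j.

    Hypothetical helper for illustration; it mirrors recurrence (4.13).
    """
    bell = [1]  # base case B_0 = 1
    for n in range(1, m + 1):
        # Each new Bell number only depends on the previously computed ones.
        bell.append(sum(comb(n - 1, j) * bell[j] for j in range(n)))
    return bell


print(bell_numbers(6))  # [1, 1, 2, 5, 15, 52, 203]
```

For example, $m = 4$ independent operators already yield $B_4 = 15$ possible plans, and $m = 6$ yield $B_6 = 203$.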

Furthermore, each Bell number is the sum of Stirling numbers of the second kind [Jr.68]. As a result, we are able to determine the number of plans $|P''|_k$ for a given $k$ by

$$\begin{aligned}
B_m &= \sum_{k=0}^{m} S(m, k) \quad \text{with} \quad S(m, k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^m \\
|P''|_k &= \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^m
\end{aligned} \tag{4.14}$$
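As a sanity check of (4.14), the explicit formula for $S(m, k)$ can be evaluated and summed over $k$. This is only an illustrative sketch; `stirling2` and `plans_per_k` are hypothetical names, and reading $k$ as the number of non-empty groups follows the standard definition of Stirling numbers of the second kind.

```python
from math import comb, factorial


def stirling2(m, k):
    """Stirling number of the second kind S(m, k) via the explicit sum in (4.14)."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** m for j in range(k + 1))
    # The alternating sum is always an exact multiple of k!.
    return total // factorial(k)


def plans_per_k(m):
    """Return [|P''|_0, ..., |P''|_m], i.e. S(m, k) for k = 0..m (hypothetical helper)."""
    return [stirling2(m, k) for k in range(m + 1)]


counts = plans_per_k(4)
print(counts)       # [0, 1, 7, 6, 1]
print(sum(counts))  # 15
```

For $m = 4$, the per-$k$ counts sum to $15 = B_4$, which agrees with the first line of (4.14).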

In addition, many asymptotic limits for Bell numbers are known [Lov93]. However, in general, we can state that the Bell numbers grow in $O(2^{cm})$, where $c$ is a constant factor. Due to the linear complexity of $O(m)$ for determining the costs of a plan, the cost-based plan vectorization problem exhibits an exponential worst-case overall time complexity of $O(2^m) = O(m \cdot 2^{cm})$. Hence, Lemma 4.2 holds.

Now, we can combine the results for the best and the worst case into the general result.

Theorem 4.2. The cost-based plan vectorization problem exhibits an exponential time complexity of $O(2^m)$.

Proof. The cost-based plan vectorization problem exhibits an exponential time complexity of $O(2^m)$ for both the best-case plan (Lemma 4.1) and the worst-case plan (Lemma 4.2). Hence, Theorem 4.2 holds.

4.3.2 Computation Approach<br />

So far, we have analyzed the search space of the P-CPV. We now explain how the optimal plan is computed with regard to the current execution statistics. In detail, we present an exhaustive computation approach (thus, with exponential time complexity) as well as a heuristic with linear time complexity that is used within our general cost-based optimization framework. For simplicity of presentation, we use a sequence of operators. However, the general versions of the exhaustive and heuristic computation approaches use recursive algorithms that contain many specific cases for arbitrary combinations of subplans (sequences and sets) as well as cases for complex operators.

Exhaustive Computation Approach<br />

The exhaustive computation approach has the following three steps:<br />

1. Scheme Enumeration: Enumerate all $2^{m-1}$ possible plan distribution schemes for the sequence of operators.
