25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4 Vectorizing Integration Flows

In the following, we formally analyze the time complexity of solving this problem exhaustively. For this purpose, we analyze the complexity of the best case and the worst case and combine both into the general result.

Lemma 4.1. The cost-based plan vectorization problem exhibits an exponential time complexity of $O(2^m)$ for the best-case plan of an operator sequence.

Proof. The distribution function $D$ of the number of possible plans over $k$ is a symmetric function according to Pascal’s Triangle$^{9}$, where the condition $l_{b_i} = l_{b_{k-i+1}}$ with $i \le m/2$ holds. Based on Definition 2.1, a plan contains $m$ operators. Due to Definition 4.2, we search for $k$ execution buckets $b_i$ with $l_{b_i} \ge 1 \wedge l_{b_i} \le m$ and $\sum_{i=1}^{|b|} l_{b_i} = m$. Hence, different numbers of buckets $k \in [1, m]$ have to be evaluated. From now on, we fix $m'$ as $m' = m - 1$ and $k'$ as $k' = k - 1$. In fact, there is only one possible plan for $k = 1$ (all operators in one bucket) and for $k = m$ (each operator in a different bucket), respectively:

\[
|P'|_{k'=0} = \binom{m'}{0} = 1 \quad\text{and}\quad |P'|_{k'=m'} = \binom{m'}{m'} = 1. \tag{4.9}
\]

Now, without loss of generality, we fix a specific $m$. The number of possible plans for a given $k$ is then computed with

\[
|P''|_{k} = \binom{m'}{k'} = \binom{m'-1}{k'} + \binom{m'-1}{k'-1} = \prod_{i=1}^{k'} \frac{m' + 1 - i}{i}. \tag{4.10}
\]
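As a quick numeric check of Eq. (4.10), the following Python sketch evaluates the product form and can be compared against the binomial coefficient from the standard library. The function name `plans_for_k` is a hypothetical helper introduced here for illustration; it is not part of the original text.

```python
from math import comb


def plans_for_k(m: int, k: int) -> int:
    """Number of plans with exactly k buckets for m operators.

    Evaluates the product in Eq. (4.10) with m' = m - 1 and k' = k - 1.
    """
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= m")
    m_prime, k_prime = m - 1, k - 1
    result = 1
    for i in range(1, k_prime + 1):
        # After step i the running value equals C(m', i), so the
        # integer division below is always exact.
        result = result * (m_prime + 1 - i) // i
    return result


if __name__ == "__main__":
    m = 6
    for k in range(1, m + 1):
        # The product form should agree with the binomial coefficient C(m-1, k-1).
        print(k, plans_for_k(m, k), comb(m - 1, k - 1))
```

Multiplying before dividing keeps the computation in exact integer arithmetic, which avoids rounding issues for larger $m$.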

In order to compute the total number of possible plans, we have to sum up the possible plans for each $k$, with $1 \le k \le m$:

\[
|P''| = \sum_{k'=0}^{m'} \binom{m'}{k'} \quad\text{with } k' = k - 1 \text{ and } m' = m - 1. \tag{4.11}
\]

Finally, $\sum_{k=0}^{n} \binom{n}{k}$ is known to be equal to $2^n$. Hence, by changing the index from $k' = 0$ to $k = 1$, we can write:

\[
|P''| = \sum_{k'=0}^{m'} \binom{m'}{k'} = \sum_{k=1}^{m} \binom{m-1}{k-1} = 2^{m-1}. \tag{4.12}
\]

In conclusion, there are $2^{m-1}$ possible plans that must be evaluated. Due to the linear complexity of $O(m)$ for determining the costs of a plan, the cost-based plan vectorization problem exhibits an exponential best-case overall time complexity of $O(2^m) = O(m \cdot 2^{m-1})$. Hence, Lemma 4.1 holds.
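The counting argument of the proof can be checked by brute force for small $m$. The following Python sketch enumerates every ordered sequence of bucket lengths $l_{b_i} \ge 1$ that sums to $m$ and groups the results by the number of buckets $k$. The function names are hypothetical and chosen only for this illustration; the sketch counts plans and does not model any cost function.

```python
from collections import Counter
from math import comb
from typing import Iterator, Tuple


def bucket_lengths(m: int) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered tuple of bucket lengths (each >= 1) summing to m."""
    if m == 0:
        yield ()
        return
    for first in range(1, m + 1):
        # Fix the length of the first bucket, then split the remaining operators.
        for rest in bucket_lengths(m - first):
            yield (first,) + rest


def plans_per_bucket_count(m: int) -> Counter:
    """Map each number of buckets k to the number of plans with k buckets."""
    return Counter(len(lengths) for lengths in bucket_lengths(m))


if __name__ == "__main__":
    m = 5
    counts = plans_per_bucket_count(m)
    for k in range(1, m + 1):
        # Per-k counts should match C(m-1, k-1), as in Eq. (4.10).
        print(k, counts[k], comb(m - 1, k - 1))
    # The total should match 2^(m-1), as in Eq. (4.12).
    print(sum(counts.values()), 2 ** (m - 1))
```

Because the enumeration itself grows as $2^{m-1}$, this check is only practical for small operator sequences.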

Lemma 4.2. The cost-based plan vectorization problem exhibits an exponential time complexity of $O(2^m)$ for the worst-case plan of a set of operators.

$^{9}$ As an alternative to Pascal’s Triangle, we could also consider the $m - 1$ virtual delimiters between operators. Due to the binary decision for each delimiter to be set or not, we get $2^{m-1}$ different plans. However, we used Pascal’s Triangle in order to be able to determine the number of plans for a given number of execution buckets $k$.
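The delimiter view described in this footnote can be sketched in Python as follows. Each of the $m - 1$ gaps between neighboring operators is represented by one bit of an integer mask, which is an encoding assumed here for illustration; the function name and the operator labels are likewise hypothetical.

```python
from typing import Iterator, List


def plans_from_delimiters(operators: List[str]) -> Iterator[List[List[str]]]:
    """Yield every split of an operator sequence into consecutive buckets.

    Bit (pos - 1) of the mask decides whether the virtual delimiter in
    front of operators[pos] is set.
    """
    m = len(operators)
    if m == 0:
        return
    for mask in range(1 << (m - 1)):
        plan = [[operators[0]]]
        for pos in range(1, m):
            if mask & (1 << (pos - 1)):
                # Delimiter set: the operator starts a new bucket.
                plan.append([operators[pos]])
            else:
                # Delimiter not set: the operator joins the current bucket.
                plan[-1].append(operators[pos])
        yield plan


if __name__ == "__main__":
    for plan in plans_from_delimiters(["o1", "o2", "o3"]):
        print(plan)
```

For three operators this yields four plans, in line with $2^{m-1}$ for $m = 3$.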
