Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows<br />
In the following, we formally analyze the time complexity for exhaustively solving this<br />
problem. For this purpose, we analyze the complexity of the best case and the worst case and<br />
combine both into the general result.<br />
Lemma 4.1. The cost-based plan vectorization problem exhibits an exponential time complexity<br />
of $O(2^m)$ for the best-case plan of an operator sequence.<br />
Proof. The distribution function $D$ of the number of possible plans over $k$ is a symmetric<br />
function according to Pascal’s Triangle (see footnote 9), where the condition $l_{b_i} = l_{b_{k-i+1}}$ with $i \le m/2$<br />
holds. Based on Definition 2.1, a plan contains $m$ operators. Due to Definition 4.2, we<br />
search for $k$ execution buckets $b_i$ with $l_{b_i} \ge 1 \wedge l_{b_i} \le m$ and $\sum_{i=1}^{|b|} l_{b_i} = m$. Hence, different<br />
numbers of buckets $k \in [1, m]$ have to be evaluated. From now on, we fix $m'$ as $m' = m - 1$<br />
and $k'$ as $k' = k - 1$. In fact, there is only one possible plan for $k = 1$ (all operators in<br />
one bucket) and $k = m$ (each operator in a different bucket), respectively:<br />
\[ |P'|_{k'=0} = \binom{m'}{0} = 1 \quad\text{and}\quad |P'|_{k'=m'} = \binom{m'}{m'} = 1 . \tag{4.9} \]<br />
Now, without loss of generality, we fix a specific $m$. The number of possible plans for a<br />
given $k$ is then computed with<br />
\[ |P''|_{k} = \binom{m'}{k'} = \binom{m'-1}{k'} + \binom{m'-1}{k'-1} = \prod_{i=1}^{k'} \frac{m'+1-i}{i} . \tag{4.10} \]<br />
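The per-bucket count in (4.10) can be checked by brute force for small m. The following is a minimal Python sketch, not part of the original proof: the function names `plans_with_k_buckets` and `product_formula` are illustrative assumptions, and a plan is represented simply as a tuple of bucket lengths that sums to m.

```python
from math import comb


def plans_with_k_buckets(m, k):
    """Yield every split of m sequential operators into k non-empty buckets.

    A plan is a tuple of bucket lengths (l_b1, ..., l_bk) with sum m.
    Illustrative helper; the name is an assumption, not from the source.
    """
    if k == 1:
        yield (m,)
        return
    # The first bucket must leave at least one operator per remaining bucket.
    for first in range(1, m - k + 2):
        for rest in plans_with_k_buckets(m - first, k - 1):
            yield (first, *rest)


def product_formula(m, k):
    """Evaluate the product form of (4.10) with m' = m - 1 and k' = k - 1."""
    m_prime, k_prime = m - 1, k - 1
    result = 1
    for i in range(1, k_prime + 1):
        # The running product equals C(m', i), so the division is exact.
        result = result * (m_prime + 1 - i) // i
    return result


m = 5
counts = {k: sum(1 for _ in plans_with_k_buckets(m, k)) for k in range(1, m + 1)}
for k, n_plans in counts.items():
    # Enumeration, binomial coefficient, and product formula must agree.
    assert n_plans == comb(m - 1, k - 1) == product_formula(m, k)
```

The enumeration is done recursively over the length of the first bucket, so that it does not rely on the binomial coefficient it is meant to confirm.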
In order to compute the total number of possible plans, we have to sum up the possible<br />
plans for each k, with 1 ≤ k ≤ m:<br />
\[ |P''| = \sum_{k'=0}^{m'} \binom{m'}{k'} \quad\text{with } k' = k - 1 \text{ and } m' = m - 1 . \tag{4.11} \]<br />
Finally, $\sum_{k=0}^{n} \binom{n}{k}$ is known to be equal to $2^n$. Hence, by changing the index $k$ from<br />
$k' = 0$ to $k = 1$, we can write:<br />
\[ |P''| = \sum_{k'=0}^{m'} \binom{m'}{k'} = \sum_{k=1}^{m} \binom{m-1}{k-1} = 2^{m-1} . \tag{4.12} \]<br />
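As a worked instance of (4.12), consider an operator sequence with m = 4 (an arbitrarily chosen size for illustration). Summing the number of plans for k = 1, ..., 4 buckets gives:

```latex
\[
|P''| = \sum_{k=1}^{4} \binom{3}{k-1}
      = \binom{3}{0} + \binom{3}{1} + \binom{3}{2} + \binom{3}{3}
      = 1 + 3 + 3 + 1 = 8 = 2^{3} .
\]
```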
In conclusion, there are $2^{m-1}$ possible plans that must be evaluated. Due to the linear complexity<br />
of $O(m)$ for determining the costs of a plan, the cost-based plan vectorization problem<br />
exhibits an exponential best-case overall time complexity of $O(2^m) = O(m \cdot 2^{m-1})$.<br />
Hence, Lemma 4.1 holds.<br />
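The exhaustive search analyzed in this proof can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's algorithm: plans are enumerated by setting or clearing each of the m − 1 delimiters between neighbouring operators, and `plan_cost` is a hypothetical O(m) cost model (slowest bucket plus a fixed per-bucket overhead) that stands in for the actual cost function, which is not given in this section.

```python
def enumerate_plans(m):
    """Yield all 2**(m-1) plans of an m-operator sequence as bucket-length tuples."""
    for mask in range(1 << (m - 1)):
        lengths, current = [], 1
        for gap in range(m - 1):
            if (mask >> gap) & 1:
                # Delimiter set: close the current bucket, start a new one.
                lengths.append(current)
                current = 1
            else:
                # Delimiter not set: the next operator joins the current bucket.
                current += 1
        lengths.append(current)
        yield tuple(lengths)


def plan_cost(lengths, op_costs, bucket_overhead=1.0):
    """Hypothetical cost model (an assumption): slowest bucket + overhead per bucket.

    Runs in O(m) because every operator cost is visited exactly once.
    """
    worst, pos = 0.0, 0
    for length in lengths:
        worst = max(worst, sum(op_costs[pos:pos + length]))
        pos += length
    return worst + bucket_overhead * len(lengths)


def best_plan(op_costs, bucket_overhead=1.0):
    """Evaluate every plan and return one with minimal cost: O(m * 2**(m-1))."""
    return min(
        enumerate_plans(len(op_costs)),
        key=lambda plan: plan_cost(plan, op_costs, bucket_overhead),
    )


op_costs = [4, 1, 1, 4]  # made-up operator costs for illustration
plan = best_plan(op_costs)
```

Each additional operator doubles the number of masks, which is why this brute-force approach is only feasible for short operator sequences.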
Lemma 4.2. The cost-based plan vectorization problem exhibits an exponential time complexity<br />
of $O(2^m)$ for the worst-case plan of a set of operators.<br />
9 As an alternative to Pascal’s Triangle, we could also consider the $m - 1$ virtual delimiters between<br />
operators. Due to the binary decision for each delimiter to be set or not, we get $2^{m-1}$ different plans.<br />
However, we used Pascal’s Triangle in order to be able to determine the number of plans for a given<br />
number of execution buckets $k$.<br />