Cost-Based Optimization of Integration Flows - Datenbanken ...
4.3 Cost-Based Vectorization
Proof. The problem of finding all possible plans of $m$ operators that are not connected by
any data dependencies is reducible to the known problem of finding all possible partitions
of a set with $m$ members, where the Bell numbers $B_m$ [Bel34a, Bel34b] represent the
total number of partitions. Note that, similar to this known problem, we exclude the plan
with zero operators ($B_0 = B_1 = 1$). Thus, the number of possible plans can be computed
recursively by
\[
|P''| = B_m = \sum_{j=0}^{m-1} \binom{m-1}{j} \cdot B_j . \tag{4.13}
\]
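As an illustration only, the recurrence (4.13) can be evaluated with a few lines of Python. The function name `bell_numbers` is a hypothetical helper introduced here, not part of the original work; the sketch simply tabulates $B_0, \dots, B_m$ bottom-up.

```python
from math import comb

def bell_numbers(m_max):
    """Return [B_0, ..., B_m_max] via B_m = sum_{j=0}^{m-1} C(m-1, j) * B_j.

    Hypothetical helper for illustration; not taken from the source.
    """
    bell = [1]  # base case B_0 = 1
    for m in range(1, m_max + 1):
        # Each new value only needs the previously computed B_0 .. B_{m-1}.
        bell.append(sum(comb(m - 1, j) * bell[j] for j in range(m)))
    return bell

print(bell_numbers(6))  # [1, 1, 2, 5, 15, 52, 203]
```

The values grow quickly even for small $m$, which is the effect the remainder of the proof relies on.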
Furthermore, each Bell number is the sum of Stirling numbers of the second kind [Jr.68].
As a result, we are able to determine the number of plans $|P''|_k$ for a given $k$ by
\[
\begin{aligned}
B_m &= \sum_{k=0}^{m} S(m,k) \quad \text{with} \quad S(m,k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^m , \\
|P''|_k &= \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^m .
\end{aligned}
\tag{4.14}
\]
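A minimal sketch of (4.14) follows, assuming only the explicit formula above. The names `stirling2` and `plans_per_k` are hypothetical helpers introduced for illustration. Here $S(m,k)$ counts the partitions of an $m$-element set into exactly $k$ non-empty subsets, and summing over $k$ recovers $B_m$.

```python
from math import comb, factorial

def stirling2(m, k):
    """Stirling number of the second kind S(m, k) via the explicit sum in (4.14).

    Hypothetical helper for illustration; not taken from the source.
    """
    total = sum((-1) ** (k - j) * comb(k, j) * j ** m for j in range(k + 1))
    # The alternating sum is always an exact multiple of k!, so integer division is safe.
    return total // factorial(k)

def plans_per_k(m):
    """Return [S(m, 0), ..., S(m, m)], i.e. the count for every k."""
    return [stirling2(m, k) for k in range(m + 1)]

print(plans_per_k(4))       # [0, 1, 7, 6, 1]
print(sum(plans_per_k(4)))  # 15, which equals B_4
```

For $m = 4$, the per-$k$ counts sum to 15, matching the Bell number $B_4$.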
In addition, many asymptotic limits for Bell numbers are known [Lov93]. However, in
general, we can state that the Bell numbers grow in $O(2^{cm})$, where $c$ is a constant factor.
Due to the linear complexity of $O(m)$ for determining the costs of a plan, the cost-based
plan vectorization problem exhibits an exponential worst-case overall time complexity of
$O(2^m) = O(m \cdot 2^{cm})$. Hence, Lemma 4.2 holds.
Now, we can combine the results for the best and the worst case into the general result.
Theorem 4.2. The cost-based plan vectorization problem exhibits an exponential time
complexity of $O(2^m)$.
Proof. The cost-based plan vectorization problem exhibits an exponential time complexity
of $O(2^m)$ for both the best-case plan (Lemma 4.1) and the worst-case plan (Lemma 4.2).
Hence, Theorem 4.2 holds.
4.3.2 Computation Approach
So far, we have analyzed the search space of the P-CPV. We now explain how the optimal
plan is computed with regard to the current execution statistics. In detail, we present
an exhaustive computation approach (thus, with exponential time complexity) as well
as a heuristic with linear time complexity that is used within our general cost-based
optimization framework. For simplicity of presentation, we use a sequence of operators.
However, the general versions of the exhaustive and heuristic computation approaches
use recursive algorithms that contain many specific cases for arbitrary combinations of
subplans (sequences and sets) as well as cases for complex operators.
Exhaustive Computation Approach
The exhaustive computation approach has the following three steps:
1. Scheme Enumeration: Enumerate all $2^{m-1}$ possible plan distribution schemes for
the sequence of operators.