Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows
order to prove the theorem, we need to prove the two single claims $W(P'') \le W(P)$
and $W(P'') \le W(P')$.
For the proof of $W(P'') \le W(P)$, assume the worst case, where $\forall o_i : R_o(o_i) = 1$.
If we vectorize this to $P''$, we need to compute the costs by
$W(b''_i) = \frac{R_o(b''_i)}{R_e(b''_i)} \cdot W(o_i)$ with $R_e(b''_i) = 1/|b|$.
Due to the vectorized execution, $W(P'') = \max_{i=1}^{m} W(b''_i)$, while
$W(P) = \sum_{i=1}^{m} W(o_i)$. Hence, we can write $W(P'') = W(P)$ if the condition
$\forall o_i : R_o(o_i) = 1$ holds. This is the worst case. For each $R_o(o_i) < 1$,
we get $W(P'') < W(P)$.
In order to prove $W(P'') \le W(P')$, we fix $\lambda = 0$. If we merge two buckets $b_i$
and $b_{i+1}$, we see that $R_e(b''_i)$ is increased from $1/|b|$ to $1/(|b|-1)$. Thus, we
re-compute the costs $W(b''_i)$ as mentioned before. In the worst case, $W(b''_i) = W(b'_i)$,
which is true iff $R_e(b'_i) = R_o(b'_i)$ because then we also have $R_e(b''_i) = R_e(b'_i)$.
Due to $W(P'') = \max_{i=1}^{m} W(b''_i)$, we can state $W(P'') \le W(P)$. Hence, Theorem 4.4 holds.
In conclusion, we cannot guarantee that the result of the A-CPV is the global optimum
because we cannot efficiently evaluate the effective resource consumption. However, we can
guarantee that each merging of execution buckets when solving the P-CPV with $\lambda = 0$
(where the costs of each bucket are lower than or equal to the highest operator costs)
improves the performance of the plan $P$.
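
The merging condition for $\lambda = 0$ can be illustrated with a small sketch. It is not the algorithm of this chapter: `merge_buckets` is a hypothetical first-fit helper for an operator sequence, the cost values are made up, and treating $\lambda$ as an additive slack on the bucket cost limit is an assumption. The sketch only shows that adjacent operators can be merged into fewer buckets (and thus fewer threads) while no bucket exceeds the highest operator cost.

```python
from typing import List


def merge_buckets(costs: List[float], lam: float = 0.0) -> List[List[float]]:
    """Greedily merge adjacent operators of a sequence into buckets.

    A bucket keeps growing while its summed cost stays within
    max(costs) + lam. Using lam as additive slack is an assumption
    made for this illustration only.
    """
    limit = max(costs) + lam
    buckets: List[List[float]] = []
    for cost in costs:
        if buckets and sum(buckets[-1]) + cost <= limit:
            buckets[-1].append(cost)   # merge into the current bucket
        else:
            buckets.append([cost])     # open a new bucket
    return buckets


# Hypothetical operator costs W(o_i) of a plan with m = 6 operators.
costs = [1.0, 2.0, 6.0, 3.0, 2.0, 1.0]
buckets = merge_buckets(costs)            # lambda = 0
bottleneck = max(sum(b) for b in buckets)
print(buckets, bottleneck)
```

For these costs, the six operators end up in three buckets, and the most expensive bucket costs exactly as much as the most expensive single operator.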
4.3.3 Cost-Based Vectorization with Restricted Number of Buckets
Due to dynamically changing workload characteristics, we recommend using the cost-based
vectorization approach. However, there might exist scenarios where an explicit restriction
of $k$, and thus of the number of threads, is advantageous. Hence, in this subsection, we
discuss the necessary changes of the exhaustive and heuristic computation approaches
when using this constraint.
Exhaustive Computation Approach
With regard to the exhaustive cost-based computation approach (see Subsection 4.3.2),
only minor changes are required when restricting $k$. Due to the restricted number of
execution buckets, $k = |b|$, the search space is smaller than for the previously described
P-CPV. As already stated, for an operator sequence (best case), there are

$$|P''|_k = \prod_{i=1}^{k-1} \frac{m-i}{i} \tag{4.16}$$

different possibilities, while for sets of operators, there are

$$|P''|_k = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^m \tag{4.17}$$

possibilities to distribute the $m$ operators of plan $P$ across $k$ buckets. Hence, the enumeration
of candidate distribution schemes can be reused by simply invoking the recursive
Algorithm 4.2 only once for the given $k$. In addition, we change the optimality condition
for evaluating those candidates to

$$\phi = \min\left( \max_{i=1}^{|b|=k} \left( \sum_{j=1}^{l_{b_i}} W(o_j) \right) \right), \tag{4.18}$$
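
As a rough illustration of Equations 4.16 to 4.18, the following sketch counts the candidate distribution schemes for a fixed $k$ and then evaluates the min-max condition by brute force. It is a stand-in for, not a reproduction of, Algorithm 4.2: all function names are hypothetical, the cost values are made up, the optimum search is restricted to operator sequences (contiguous buckets), and the cost of a bucket is assumed to be the sum of the costs of its operators.

```python
from itertools import combinations
from math import comb, factorial
from typing import List, Sequence, Tuple


def count_sequence_splits(m: int, k: int) -> int:
    """Equation 4.16: prod_{i=1}^{k-1} (m - i) / i, which equals C(m-1, k-1)."""
    return comb(m - 1, k - 1)


def count_set_partitions(m: int, k: int) -> int:
    """Equation 4.17: Stirling number of the second kind."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** m for j in range(k + 1))
    return total // factorial(k)


def best_sequence_split(costs: Sequence[float], k: int) -> Tuple[float, List[List[float]]]:
    """Brute-force reading of Equation 4.18 for an operator sequence.

    Enumerates every split into exactly k contiguous, non-empty buckets
    and returns the one whose most expensive bucket is cheapest.
    """
    m = len(costs)
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= m")
    best = None
    for cuts in combinations(range(1, m), k - 1):   # k-1 cut positions
        bounds = (0, *cuts, m)
        buckets = [list(costs[a:b]) for a, b in zip(bounds, bounds[1:])]
        phi = max(sum(b) for b in buckets)          # inner max of bucket costs
        if best is None or phi < best[0]:           # outer min over candidates
            best = (phi, buckets)
    return best


costs = [1, 2, 6, 3, 2, 1]                 # hypothetical W(o_j), m = 6
print(count_sequence_splits(6, 3))         # candidates for a sequence
print(count_set_partitions(6, 3))          # candidates for a set of operators
print(best_sequence_split(costs, 3))
```

For $m = 6$ and $k = 3$, the sequence case has 10 candidates and the set case has 90, which shows how much smaller the search space is when the operator order is fixed.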