Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows<br />
4.3.4 Operator-Aware Cost-Based Vectorization<br />
Although the cost-based vectorization described so far significantly improves performance,<br />
it has one drawback. When rewriting an instance-based plan into a cost-based vectorized<br />
plan, we take only the costs of the originally existing operators into account. Thus, the optimization<br />
objective is to minimize the number of execution buckets while keeping the maximum<br />
bucket execution costs as low as possible. So far, however, we have neglected the overhead of additional operators<br />
such as Copy, And, or Xor that are used only within vectorized plans. Consequently, an<br />
operator-aware rewriting approach is required to improve on the standard cost-based<br />
vectorization approach.<br />
In detail, we use explicit cost comparisons for operators that are used only in vectorized<br />
plans. In this way, we can ensure that performance is not degraded by the costs of those<br />
additional operators. We explain this using the Copy operator as an example.<br />
Example 4.9 (Operator-Aware Cost Comparison). Assume the instance-based subplan<br />
illustrated in Figure 4.15(a) and the given operator costs.<br />
[Figure 4.15: Example Operator Awareness<br />
(a) Instance-Based Subplan: operators o1, o2, o3, o4 with W(o1)=2, W(o2)=3, W(o3)=5, W(o4)=4; W(P) = n · 14<br />
(b) Cost-Based Vectorized Subplan: Copy operator with costs W(cp), and operators o1, o2, o3, o4 with unchanged costs; if W(cp) ≤ W(o3), W(P) = (n + 2) · 5; if W(cp) > W(o3), W(P) = (n + 2) · W(cp)]<br />
The costs of processing n messages are then determined by W(P) = n · 14 ms. In contrast,<br />
the cost-based vectorized subplan shown in Figure 4.15(b) uses k = 3 + 1 execution<br />
buckets and hence usually increases the throughput. For the exact cost analysis, we need<br />
to distinguish two cases. First, if the costs of the Copy operator W(cp) do not exceed the<br />
maximum operator costs W(o3), we compute the total costs by W(P) = (n + 2) · 5 ms.<br />
Second, if W(cp) > W(o3), we need to compute the costs by W(P) = (n + 2) · W(cp).<br />
While we always benefit from vectorization in the first case, there is a break-even point<br />
for the vectorization benefit in the second case. In detail, if n → ∞ and W(cp) > 10 ms<br />
(the costs of executing the subplan (o1, o2, o3) in an instance-based manner), it is advantageous to<br />
execute the subplan (o1, o2, o3) as a single execution bucket because the costs of the Copy<br />
operator are not amortized by the vectorization benefit.<br />
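The two cases of Example 4.9 can be reproduced numerically. The following is only a minimal sketch under stated assumptions: the function names and the `fill` parameter are hypothetical, and the constant 2 in the term (n + 2) is taken directly from this example rather than from a general formula.

```python
# Operator costs in ms, as given in Figure 4.15.
OP_COSTS = {"o1": 2.0, "o2": 3.0, "o3": 5.0, "o4": 4.0}


def instance_based_cost(n, op_costs):
    """W(P) = n * sum of all operator costs (n * 14 ms in Example 4.9)."""
    return n * sum(op_costs.values())


def vectorized_cost(n, op_costs, copy_cost, fill=2):
    """W(P) = (n + fill) * max(W(cp), maximum operator cost).

    Covers both cases of the example: for W(cp) <= W(o3) the most
    expensive original operator determines the costs, otherwise the
    Copy operator does. `fill` = 2 is the constant used in Example 4.9
    (assumption: it is not generalized to other plans here).
    """
    bottleneck = max(copy_cost, max(op_costs.values()))
    return (n + fill) * bottleneck


if __name__ == "__main__":
    n = 100
    print(instance_based_cost(n, OP_COSTS))    # n * 14
    print(vectorized_cost(n, OP_COSTS, 1.0))   # case 1: (n + 2) * 5
    print(vectorized_cost(n, OP_COSTS, 12.0))  # case 2: (n + 2) * W(cp)
```

Writing the vectorized costs as a single maximum keeps both cases in one expression; it coincides with the case distinction of the example because W(o3) is the largest original operator cost.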
We use those explicit cost comparisons (optimality conditions) between the costs of additional<br />
operators and the instance-based execution of such a subplan whenever we determine<br />
parallel pipelines. For this comparison, we use only the subplan that ranges from the beginning of those parallel<br />
pipelines to the temporal join at the end.<br />
In conclusion, the operator-aware cost-based vectorization is used within both the exact<br />
and the heuristic computation approach. We use explicit cost comparisons for<br />
subplans consisting of parallel pipelines because those require additional operators, which<br />
we can now take into account as well. If statistics are available, this is a binary decision<br />
for each subplan and hence can be computed efficiently. As a result, this approach<br />
adds awareness of specific cases in which vectorization should not be applied to a subplan.<br />
This ensures robustness of the cost-based vectorization of a single plan, in the sense that<br />
it will never be slower than the instance-based execution.<br />
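The binary decision per subplan can be sketched as follows. This is an illustration only and not the implementation of the exact or heuristic algorithm: the function name is hypothetical, a single additional operator (Copy) is assumed, and keeping the vectorized form when both costs are equal is an assumption, since the text only states the strict case W(cp) > 10 ms.

```python
def should_vectorize_subplan(subplan_op_costs, additional_op_cost):
    """Operator-aware check for one subplan of parallel pipelines.

    subplan_op_costs: costs of the original operators from the start of
        the parallel pipelines to the temporal join, e.g. [2, 3, 5].
    additional_op_cost: costs of the operator that exists only in the
        vectorized plan, e.g. W(cp) of the Copy operator.

    Returns False if the additional operator costs more than executing
    the whole subplan in an instance-based manner; in that case the
    subplan should be kept as a single execution bucket.
    """
    instance_based_cost = sum(subplan_op_costs)
    return additional_op_cost <= instance_based_cost


# Subplan (o1, o2, o3) from Example 4.9 with instance-based costs of 10 ms.
subplan = [2.0, 3.0, 5.0]
keep_vectorized = should_vectorize_subplan(subplan, 4.0)
merge_to_single_bucket = not should_vectorize_subplan(subplan, 12.0)
```

Because the check needs only one sum and one comparison per subplan, it adds little effort to the plan rewriting, provided that cost statistics for the involved operators are available.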