25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4 Vectorizing Integration Flows

4.3.4 Operator-Aware Cost-Based Vectorization

Although the cost-based vectorization described so far significantly improves performance, it has one drawback. When rewriting an instance-based plan into a cost-based vectorized plan, we take only the costs of the originally existing operators into account. Thus, the optimization objective is to minimize the number of execution buckets with the lowest possible maximum bucket execution costs. However, this neglects the overhead of additional operators such as Copy, And, or Xor that are only used within vectorized plans. Consequently, an operator-aware rewriting approach is required in order to improve the standard cost-based vectorization approach.

In detail, we use explicit cost comparisons for operators that are only used in vectorized plans. With this concept, we can ensure that performance is not hurt by the costs of those additional operators. We explain this using the Copy operator as an example.

Example 4.9 (Operator-Aware Cost Comparison). Assume the instance-based subplan illustrated in Figure 4.15(a) and the given operator costs.

[Figure 4.15: Example Operator Awareness. (a) Instance-Based Subplan: operators o1, o2, o3, o4 with W(o1)=2, W(o2)=3, W(o3)=5, W(o4)=4, and W(P) = n · 14. (b) Cost-Based Vectorized Subplan: Copy operator with costs W(cp), followed by o1 and o2 (W(o1)=2, W(o2)=3), o3 (W(o3)=5), and o4 (W(o4)=4); if W(cp) ≤ W(o3), then W(P) = (n + 2) · 5; if W(cp) > W(o3), then W(P) = (n + 2) · W(cp).]

The costs of processing n messages are then determined by W(P) = n · 14 ms. In contrast, the cost-based vectorized subplan shown in Figure 4.15(b) uses k = 3 + 1 execution buckets and hence usually increases the throughput. For the exact cost analysis, we need to distinguish two cases. First, if the costs of the Copy operator W(cp) do not exceed the maximum operator costs W(o3), we compute the total costs by W(P) = (n + 2) · 5 ms. Second, if W(cp) > W(o3), we need to compute the costs by W(P) = (n + 2) · W(cp). While we always benefit from vectorization in the first case, there is a break-even point for the vectorization benefit in the second case. In detail, if n → ∞ and W(cp) > 10 ms (the costs of executing the subplan (o1, o2, o3) in an instance-based manner), it is advantageous to execute the subplan (o1, o2, o3) as a single execution bucket because the costs of the Copy operator are not amortized by the vectorization benefit.
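
The arithmetic of Example 4.9 can be reproduced with a minimal Python sketch. The function names and the `fill` parameter are hypothetical and not part of the original work; the offset of 2 is simply taken from the formulas of the example, and all costs are in milliseconds.

```python
def instance_based_cost(n, op_costs):
    """W(P) = n * (sum of operator costs), as in Figure 4.15(a)."""
    return n * sum(op_costs)


def vectorized_cost(n, op_costs, copy_cost, fill=2):
    """W(P) = (n + fill) * max(most expensive operator, W(cp)).

    fill=2 mirrors the (n + 2) factor of Example 4.9 (assumption:
    the offset is used as given there, not derived here).
    """
    return (n + fill) * max(max(op_costs), copy_cost)


costs = [2, 3, 5, 4]  # W(o1), W(o2), W(o3), W(o4)
n = 1000

print(instance_based_cost(n, costs))            # 14000
print(vectorized_cost(n, costs, copy_cost=1))   # 5010, case W(cp) <= W(o3)
print(vectorized_cost(n, costs, copy_cost=12))  # 12024, case W(cp) > W(o3)

# Instance-based costs of the subplan (o1, o2, o3) per message: 10 ms.
print(sum(costs[:3]))                           # 10
```

With W(cp) = 12 ms, the Copy operator alone costs more per message than the 10 ms needed to run (o1, o2, o3) in an instance-based manner, which is the situation in which the example prefers a single execution bucket.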

We use those explicit cost comparisons (optimality conditions) between the costs of additional operators and the instance-based execution of such a subplan whenever we determine parallel pipelines. For this comparison, only the subplan from the beginning of those parallel pipelines to the temporal join at the end is used.

In conclusion, the operator-aware cost-based vectorization is used within both the exact and the heuristic computation approach. We use explicit cost comparisons for subplans consisting of parallel pipelines because those require additional operators, which we can now take into account as well. If statistics are available, this is a binary decision for each subplan and hence can be computed efficiently. As a result, this approach adds awareness of specific cases where vectorization should not be applied to a subplan. This ensures robustness of the cost-based vectorization of a single plan, in the sense that it will never be slower than the instance-based execution.
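
The binary decision per subplan might look like the following Python sketch. It assumes the asymptotic condition from Example 4.9 (n → ∞), and the function name, the return labels, and the treatment of the equality case are illustrative assumptions rather than the author's implementation.

```python
def choose_execution(subplan_op_costs, extra_op_costs):
    """Decide how to execute one parallel-pipeline subplan.

    subplan_op_costs: monitored costs of the original operators (ms).
    extra_op_costs:   monitored costs of operators that exist only in
                      the vectorized plan, e.g. Copy (ms).
    Assumption: the subplan is vectorized unless an additional
    operator costs more than instance-based execution of the subplan.
    """
    instance_cost = sum(subplan_op_costs)
    worst_extra = max(extra_op_costs, default=0.0)
    return "vectorized" if worst_extra <= instance_cost else "single_bucket"


# Subplan (o1, o2, o3) from Example 4.9 with two different Copy costs.
print(choose_execution([2, 3, 5], [4]))   # vectorized
print(choose_execution([2, 3, 5], [12]))  # single_bucket
```

Because the check needs only one sum and one maximum per subplan, it adds little overhead to either the exact or the heuristic computation approach.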
