Cost-Based Optimization of Integration Flows - Datenbanken ...
4.6 Experimental Evaluation<br />
Figure 4.23: Vectorization Deployment Overhead<br />
results, where we varied the number of operators m because all other scale factors do not<br />
influence the deployment and maintenance costs. In general, vectorization yields a high performance<br />
improvement of up to a factor of seven. This is caused by the different<br />
deployment approaches. The WFPE uses a compilation approach, where Java classes are<br />
generated from the integration flow specification. In contrast, the VWFPE as well as<br />
the CBVWFPE use interpretation approaches, where plans are built dynamically with the<br />
A-PV. The VWFPE always outperforms the CBVWFPE because both use the A-PV, but the CBVWFPE<br />
additionally uses the A-CPV in order to find the optimal number of execution buckets<br />
k. Note that the additional costs for the A-CPV (which cause a break-even point with<br />
the standard WFPE) occur periodically during runtime. Here, we excluded the costs for<br />
flushing the pipelines because they depend mainly on the maximum constraint of the queues<br />
and on the costs of the most time-consuming operator. In summary, the vectorization<br />
of integration flows shows better runtime performance and often also better deployment time performance<br />
with regard to plan generation. Even in cases where a deployment overhead<br />
exists, it is negligible compared to the runtime improvement gained by vectorization.<br />
In conclusion, the deployment and maintenance overhead is moderate compared to the<br />
resulting performance improvement. Recall the evaluation results from Figure 4.21. It is<br />
important to note that the presented performance of the cost-based vectorized execution<br />
model already includes the costs for periodic re-optimization and statistics maintenance.<br />
Parameters of Periodic Optimization<br />
The resulting performance improvement of vectorization in the presence of changing workload<br />
characteristics depends on the periodic re-optimization. This re-optimization can be<br />
influenced by several parameters including the workload aggregation method, the sliding<br />
time window size ∆w, the optimization period ∆t, and the maximum cost increase λ. In<br />
this subsection, we evaluate the influence of λ with regard to the cost-based vectorization,<br />
while the other parameters have already been evaluated in Chapter 3. Therefore, we conducted<br />
an experiment in which we measured how an increasing maximum cost increase λ influences the<br />
number of execution buckets and thus indirectly the elapsed time as well.<br />
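The effect of λ on the number of execution buckets k can be illustrated with a minimal sketch. It is not the A-CPV itself. It assumes a linear sequence of operators, reads the per-bucket limit as "cost of the most expensive operator plus λ", and uses a simple greedy grouping of consecutive operators. The function name `count_buckets` and the example costs are hypothetical.

```python
from typing import List


def count_buckets(costs: List[float], lam: float) -> int:
    """Greedily group consecutive operators into execution buckets.

    Assumption (not taken from the A-CPV): a bucket may hold consecutive
    operators as long as their cost sum does not exceed max(costs) + lam.
    """
    if not costs:
        return 0
    limit = max(costs) + lam  # assumed reading of the per-bucket limit
    buckets = 1
    current = 0.0
    for cost in costs:
        if current + cost > limit:
            # The operator does not fit: open a new bucket for it.
            buckets += 1
            current = cost
        else:
            current += cost
    return buckets


if __name__ == "__main__":
    # Hypothetical operator costs for a plan with m = 7 operators.
    example_costs = [4.0, 1.0, 2.0, 3.0, 1.0, 4.0, 2.0]
    for lam in (0.0, 2.0, 5.0, 20.0):
        print(f"lambda={lam}: k={count_buckets(example_costs, lam)}")
```

Under these assumptions, a larger λ only relaxes the limit, so the resulting k never grows as λ increases, and a sufficiently large λ collapses the plan into a single bucket.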
In a first sub-experiment, we fixed d = 1, t = 0, q = 50 and executed n = 250<br />
messages with different λ and for different plans (with different numbers of operators m).<br />
Figure 4.24(a) shows the influence of λ on the number of execution buckets k as well as on<br />
the execution time. It is obvious that the number of execution buckets (annotated at the<br />
top of each point) decreases with increasing λ because for each bucket, the sum of operator<br />
costs must not exceed max + λ and hence, more operators can be executed by a single<br />
bucket. Clearly, when increasing λ, the number of execution buckets cannot increase. In<br />