Cost-Based Optimization of Integration Flows - Datenbanken ...
total execution time), while not reducing the total latency time. However, for a data
size of d = 1 as an example, we achieved significant overall relative improvements of 82%
(P1), 72% (P2), 74% (P5), and 55% (P7). Second, we observe that the highest
optimization benefits are typically achieved by vectorization and multi-flow optimization
for plans (P1, P2, and P5) that do not have a high CPU utilization in the unoptimized case.
Hence, the optimization benefits partially overlap. In contrast, for plans such as
P7, where the local processing steps dominate the execution time (high CPU utilization),
the standard cost-based optimization techniques have a higher influence. In this case, the
different optimization benefits do not overlap, and hence the joint application achieves
significant improvements. Furthermore, the different optimization techniques scale
differently with increasing data size, depending on the plan used. Applying
all optimization techniques balances these effects, so that we ultimately observe good
scalability with increasing data size for all plans, with almost constant improvement. We
can conclude that, especially with regard to scalability and the maximum benefit, it
is advantageous to use all optimization techniques in combination.
Finally, we can state that MFO achieves significant throughput improvement by accepting
moderate additional latency time for single messages. Furthermore, the serialized
external behavior can be guaranteed as well. However, how much we benefit from MFO
depends on the plans used and on the concrete workload. The benefit of MFO stems from
two main facts. First, even for one-message partitions, there is only a moderate runtime
overhead (Figures 5.18(b) and 5.22). Second, only a small number of messages is required
within one partition to yield a significant speedup (Figure 5.18(b)).
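The two facts above can be illustrated with a toy amortization model. This is only a sketch under stated assumptions and not the cost model of this work: it assumes that each plan instance has a fixed cost (for example, one external query) plus a cost per message, and that batching adds a constant overhead per partition. The function name `speedup` and all parameter values are hypothetical.

```python
def speedup(k, fixed=10.0, per_message=1.0, overhead=1.0):
    """Speedup of one batched plan instance over k single-message instances.

    Assumed toy model (not taken from the thesis):
      unbatched cost = k * (fixed + per_message)
      batched cost   = fixed + overhead + k * per_message
    """
    unbatched = k * (fixed + per_message)
    batched = fixed + overhead + k * per_message
    return unbatched / batched


# Print the modeled speedup for a few partition sizes k.
for k in (1, 2, 5, 10):
    print(f"k={k:2d}  speedup={speedup(k):.2f}")
```

Under these assumed values, a one-message partition is slightly slower than unbatched execution (only the overhead is paid), while a handful of messages per partition already amortizes the fixed cost several times over.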
5.6 Summary and Discussion<br />
To summarize, in this chapter, we introduced the data-flow-oriented multi-flow optimization
(MFO) technique for throughput maximization of integration flows. Both MFO and
the control-flow-oriented vectorization technique achieve throughput improvements. In
contrast to vectorization, which relies on parallelization, MFO reduces executed work by
employing horizontal data partitioning of inbound message queues and executing plans
for batches of messages. First, we discussed the plan execution of message partitions, which
includes the definition of the partition tree as a queue data structure for message partitions
as well as the automatic derivation of partitioning attributes, the derivation of partitioning
schemes, and the rewriting of plans. Second, we explained the required cost model extensions,
the computation of the optimal waiting time with regard to message throughput
improvement, and the integration into our overall cost-based optimization framework.
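The core idea of horizontal partitioning and batch execution can be sketched as follows. This is a minimal illustration under assumptions and not the partition tree or plan rewriting defined in this chapter: it uses a single partitioning attribute, a flat ordered dictionary instead of a tree, and a hypothetical `lookup` callback that stands in for an external query whose result is shared by all messages of a partition. It also does not restore the arrival order on output, which the serialized external behavior would require.

```python
from collections import OrderedDict


def partition_queue(messages, attribute):
    """Group queued messages by the value of one partitioning attribute.

    Partitions are kept in the arrival order of their first message.
    """
    partitions = OrderedDict()
    for msg in messages:
        partitions.setdefault(msg[attribute], []).append(msg)
    return partitions


def execute_batches(partitions, lookup):
    """Run one (hypothetical) plan instance per partition.

    The external lookup is issued once per partition, not once per message.
    """
    results = []
    calls = 0
    for key, batch in partitions.items():
        enrichment = lookup(key)  # one external query per partition
        calls += 1
        results.extend({**msg, "name": enrichment} for msg in batch)
    return results, calls


# Hypothetical inbound queue with "customer" as the partitioning attribute.
queue = [
    {"id": 1, "customer": "c1"},
    {"id": 2, "customer": "c2"},
    {"id": 3, "customer": "c1"},
    {"id": 4, "customer": "c1"},
]
parts = partition_queue(queue, "customer")
results, calls = execute_batches(parts, lambda key: key.upper())
print(calls, [r["id"] for r in results])
```

In this example, four messages fall into two partitions, so the stand-in external query runs twice instead of four times; that saved work is where the throughput gain comes from.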
Our formal analysis and experimental evaluation show that the multi-flow optimization
technique achieves significant throughput improvement by accepting moderate
additional latency time for single messages. Furthermore, we guarantee constraints of
maximum latency for single messages and serialized external behavior. Thus, MFO is
applicable to arbitrary asynchronous, data-driven integration flows in many different
application areas. Finally, it is important to note that MFO and vectorization can already
be applied together to achieve the highest throughput, because both techniques are
integrated within the surrounding cost-based optimization framework.
In addition, MFO opens several opportunities for further optimizations. Future work might
consider, for example, (1) the execution of partitions independent of their temporal order,