
Cost-Based Optimization of Integration Flows - Datenbanken ...



total execution time), while not reducing the total latency time. However, for a data size of d = 1, for example, we achieved significant overall relative improvements of 82% (P1), 72% (P2), 74% (P5), and 55% (P7). Second, we observe that the highest optimization benefits are typically achieved by vectorization and multi-flow optimization for plans such as P1, P2, and P5, which do not have a high CPU utilization in the unoptimized case. Hence, the optimization benefits partially overlap. In contrast, for plans such as P7, where the local processing steps dominate the execution time (high CPU utilization), the standard cost-based optimization techniques have a higher influence. In this case, the different optimization benefits do not overlap, and hence their joint application achieves significant improvements. Furthermore, the different optimization techniques scale differently with increasing data size, depending on the plan. Applying all optimization techniques balances these effects, so that we finally observe good scalability with increasing data size for all plans, with an almost constant improvement. We conclude that, especially with regard to scalability and maximum benefit, it is advantageous to use all optimization techniques in combination.
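Relative improvements can be easier to compare when restated as speedups. The following worked equation assumes that the relative improvement r is defined on total execution time as one minus the ratio of optimized to unoptimized time. This definition is our assumption and is not stated in this excerpt.

```latex
r = 1 - \frac{T_{\text{opt}}}{T_{\text{unopt}}}
\quad\Longrightarrow\quad
S = \frac{T_{\text{unopt}}}{T_{\text{opt}}} = \frac{1}{1 - r}

% P1: r = 0.82 \Rightarrow S = 1/0.18 \approx 5.6
% P2: r = 0.72 \Rightarrow S = 1/0.28 \approx 3.6
% P5: r = 0.74 \Rightarrow S = 1/0.26 \approx 3.8
% P7: r = 0.55 \Rightarrow S = 1/0.45 \approx 2.2
```

Under this reading, the 82% improvement for P1 means the optimized plan needs only about 18% of the unoptimized execution time.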

Finally, MFO achieves a significant throughput improvement at the cost of moderate additional latency for single messages, while still guaranteeing serialized external behavior. However, how much we benefit from MFO depends on the plans used and on the concrete workload. The benefit of MFO stems from two main facts. First, even for one-message partitions, the runtime overhead is only moderate (Figures 5.18(b) and 5.22). Second, only a small number of messages per partition is required to yield a significant speedup (Figure 5.18(b)).
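The two facts above can be illustrated with a deliberately simple amortization model. This model is our own assumption and is not the cost model of this chapter. It makes three assumptions:

- A plan instance costs a fixed `c_fixed` plus `c_var` per message.
- MFO adds a hypothetical per-message partitioning overhead `c_part`.
- Executing k messages one by one costs k(c_fixed + c_var), whereas one batched execution costs c_fixed + k(c_var + c_part).

```python
def unbatched_cost(k: int, c_fixed: float, c_var: float) -> float:
    """Cost of executing the plan separately for each of k messages."""
    return k * (c_fixed + c_var)


def batched_cost(k: int, c_fixed: float, c_var: float, c_part: float) -> float:
    """Cost of one plan execution over a k-message partition.

    The fixed cost is paid once; each message adds its processing cost
    plus a (hypothetical) partitioning overhead c_part.
    """
    return c_fixed + k * (c_var + c_part)


def speedup(k: int, c_fixed: float, c_var: float, c_part: float) -> float:
    """Ratio of unbatched to batched cost for a k-message partition."""
    return unbatched_cost(k, c_fixed, c_var) / batched_cost(k, c_fixed, c_var, c_part)


def asymptotic_speedup(c_fixed: float, c_var: float, c_part: float) -> float:
    """Limit of speedup(k, ...) as the partition size k grows without bound."""
    return (c_fixed + c_var) / (c_var + c_part)


if __name__ == "__main__":
    # Hypothetical cost parameters, chosen only for illustration.
    for k in (1, 2, 5, 10, 100):
        print(f"k={k:>3}  speedup={speedup(k, 10.0, 1.0, 0.5):.2f}")
    print(f"limit  speedup={asymptotic_speedup(10.0, 1.0, 0.5):.2f}")
```

Take c_fixed = 10, c_var = 1 and c_part = 0.5:

- A one-message partition runs slightly slower than unoptimized execution (speedup 11/11.5, about 0.96).
- Ten messages already give a speedup of 4.4.
- The speedup approaches (c_fixed + c_var)/(c_var + c_part), about 7.3, as partitions grow.

This mirrors, qualitatively, the two facts reported above.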

5.6 Summary and Discussion

To summarize, in this chapter we introduced the data-flow-oriented multi-flow optimization (MFO) technique for maximizing the throughput of integration flows. Both MFO and the control-flow-oriented vectorization technique improve throughput. However, in contrast to vectorization, which relies on parallelization, MFO reduces the executed work by horizontally partitioning inbound message queues and executing plans for batches of messages. First, we discussed the plan execution of message partitions. This includes the definition of the partition tree as a queue data structure for message partitions, the automatic derivation of partitioning attributes and partitioning schemes, and the rewriting of plans. Second, we explained the required cost model extensions, the computation of the optimal waiting time with regard to message throughput improvement, and the integration into our overall cost-based optimization framework.
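The core mechanism of MFO can be illustrated with a minimal Python sketch: horizontally partition the inbound queue, then execute one plan instance per partition rather than per message.

The sketch rests on several assumptions:

- There is a single, already-derived partitioning attribute (hypothetically `customer`).
- The chapter's partition tree is replaced by a flat, insertion-ordered dictionary.
- `Message`, `partition_queue`, and `execute_batched` are hypothetical names.
- Waiting times, latency constraints, and the serialization guarantees discussed in this chapter are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Message:
    """Hypothetical inbound message: an id plus a flat attribute map."""
    msg_id: int
    attrs: Dict[str, Any]


def partition_queue(queue: List[Message], attr: str) -> Dict[Any, List[Message]]:
    """Horizontally partition a message queue on one partitioning attribute.

    Partitions appear in the order in which their first message arrived
    (dicts keep insertion order in Python 3.7+), and each partition keeps
    the arrival order of its own messages.
    """
    partitions: Dict[Any, List[Message]] = {}
    for msg in queue:
        partitions.setdefault(msg.attrs[attr], []).append(msg)
    return partitions


def execute_batched(
    queue: List[Message],
    attr: str,
    plan: Callable[[Any, List[Message]], Any],
) -> List[Any]:
    """Run `plan` once per partition (batch) instead of once per message."""
    return [plan(key, batch) for key, batch in partition_queue(queue, attr).items()]


if __name__ == "__main__":
    queue = [
        Message(1, {"customer": "A"}),
        Message(2, {"customer": "B"}),
        Message(3, {"customer": "A"}),
        Message(4, {"customer": "C"}),
        Message(5, {"customer": "B"}),
    ]
    # Hypothetical plan: report which messages were processed together.
    out = execute_batched(queue, "customer", lambda k, b: (k, [m.msg_id for m in b]))
    print(out)  # [('A', [1, 3]), ('B', [2, 5]), ('C', [4])]
```

With five messages spread over three customers, the plan runs three times instead of five. The saving grows with the number of queued messages that share a partitioning-attribute value.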

Our formal analysis and experimental evaluation show that the multi-flow optimization technique achieves a significant throughput improvement at the cost of moderate additional latency for single messages. Furthermore, we guarantee maximum-latency constraints for single messages as well as serialized external behavior. Thus, MFO is applicable to arbitrary asynchronous, data-driven integration flows in many different application areas. Finally, it is important to note that MFO and vectorization can already be applied together to achieve the highest throughput, because both techniques are integrated into the surrounding cost-based optimization framework.

Furthermore, MFO opens up several opportunities for additional optimizations. Future work might consider, for example, (1) the execution of partitions independent of their temporal order,

