25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

5 Multi-Flow Optimization

Similar to the vectorization of integration flows, in this chapter we introduce multi-flow optimization [BHL10, BHL11] as a data-flow-oriented optimization technique that is tailor-made for integration flows. This technique tackles the problem of expensive external system access and exploits the optimization potential that arises when equivalent work (e.g., the same queries to external systems) is done multiple times. The core idea is to horizontally partition inbound message queues and to execute plan instances for message batches rather than for individual messages. Therefore, this technique is applicable to asynchronous data-driven integration flows, where message queues are used at the inbound side of the integration platform. As a result, the message throughput is increased by reducing the amount of work (external system access and local processing steps) done by the integration platform. We call this technique multi-flow optimization because sequences of messages that would otherwise initiate multiple plan instances are processed together.
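The core idea can be illustrated with a minimal sketch. It is not the implementation from this chapter: the function names `partition_queue` and `run_batched`, the dictionary-based message layout, and the `query_external` callback are all hypothetical and chosen only for illustration. The sketch groups queued messages by a partitioning attribute and issues one external lookup per batch instead of one per message.

```python
def partition_queue(messages, key):
    """Horizontally partition queued messages by a partitioning attribute.

    Arrival order is preserved within each partition (hypothetical helper).
    """
    partitions = {}
    for msg in messages:
        partitions.setdefault(key(msg), []).append(msg)
    return partitions


def run_batched(messages, key, query_external):
    """Process each partition as one batch with a single external access.

    `query_external` stands in for an expensive external system call
    (an assumption for this sketch, not an API from the source).
    Returns the (message, lookup result) pairs and the number of external calls.
    """
    results = []
    calls = 0
    for part_key, batch in partition_queue(messages, key).items():
        enrichment = query_external(part_key)  # shared by all messages in the batch
        calls += 1
        results.extend((msg, enrichment) for msg in batch)
    return results, calls


if __name__ == "__main__":
    queue = [
        {"id": 1, "customer": "A"},
        {"id": 2, "customer": "B"},
        {"id": 3, "customer": "A"},
    ]
    out, n_calls = run_batched(queue, lambda m: m["customer"], lambda k: "info-" + k)
    print(n_calls, [m["id"] for m, _ in out])
```

In this sketch, three messages with two distinct partitioning values cause two external calls rather than three. Note that messages of the same partition are emitted together, so the output order can differ from the arrival order.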

In order to enable multi-flow optimization, in Section 5.1, we introduce the batch creation via horizontal message queue partitioning. Essentially, two major challenges arise in the context of multi-flow optimization. In Section 5.2, we discuss the first challenge of plan execution on batches of messages. Furthermore, in Section 5.3, we describe how this optimization technique is embedded within the periodical re-optimization framework, and we address the second challenge of computing the optimal waiting time with regard to message throughput maximization. In addition, we provide formal analysis results such as optimality and latency guarantees in Section 5.4. Finally, the experimental evaluation, which is presented in Section 5.5, shows that multi-flow optimization achieves significant performance improvements in the sense of an increased message throughput.

5.1 Motivation and Problem Description

In the context of integration platforms, especially in scenarios with huge numbers of plan instances, the major optimization objective is throughput maximization [LZL07] rather than the execution time minimization of single plan instances. The goal is (1) to maximize the number of messages processed per time period, or synonymously in our context, (2) to minimize the total execution time of a sequence of plan instances. Here, depending on the application area, moderate latency times of single messages, on the order of seconds to minutes, are acceptable [UGA+09]. When addressing this general optimization objective, the following concrete problems have to be considered:

Problem 5.1 (Expensive External System Access). External system access can be very time-consuming due to network latency (minimal roundtrip time), external query processing, network traffic, and message transformations from external formats into internal structures. Depending on the involved external systems and on the present infrastructure, the fraction of these influences with regard to the required total access time may vary significantly. However, in particular when accessing custom applications and services, data
