
5 Multi-Flow Optimization

Thus, $T_L(m_i) \le \hat{T}_L(M') \le lc$ holds due to the subsumption of $W^*(P', k')$ by $\Delta tw$, because in the valid case of $\Delta tw \ge W(P', k')$ the waiting time is longer than the execution time. This is also true for arbitrary serialization concepts. Hence, Theorem 5.5 holds.

Based on this formal analysis, we can state that the introduced waiting time computation approach (1) optimizes the message throughput by minimizing the total latency time of a message subsequence, while guaranteeing the additional restrictions of (2) a maximum latency time constraint for single messages and (3) the serialized external behavior.
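To make the interplay of waiting time and latency constraint concrete, the following sketch (our own illustration, not the thesis's A-WTC algorithm) picks the largest waiting time $\Delta tw$ whose worst-case single-message latency, i.e., waiting time plus estimated batch execution time, still satisfies the constraint $lc$. It assumes a constant message rate and a simple linear batch cost model; all names and constants are hypothetical.

```java
// Hypothetical sketch, not the thesis's A-WTC algorithm: search for the
// largest waiting time dtw (ms) whose worst-case single-message latency
// (full waiting time plus batch execution time) respects the constraint lc.
public class WaitingTimeSketch {

    /** Estimated execution time of a batch of k messages (assumed linear cost model). */
    static double batchExecTime(int k, double fixedCost, double perMsgCost) {
        return fixedCost + k * perMsgCost;
    }

    /** Largest waiting time (ms) whose worst-case latency stays within lc. */
    static double computeWaitingTime(double msgRate, double lc,
                                     double fixedCost, double perMsgCost) {
        double best = 0.0;
        for (double dtw = 0.0; dtw <= lc; dtw += 1.0) {
            // with constant rate, waiting dtw collects roughly msgRate * dtw messages
            int k = Math.max(1, (int) Math.floor(msgRate * dtw));
            // worst case: the first message of a batch waits the full dtw
            double worstLatency = dtw + batchExecTime(k, fixedCost, perMsgCost);
            if (worstLatency <= lc) {
                best = dtw;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // 0.1 msg/ms, latency constraint 500 ms, 20 ms fixed + 2 ms per message
        System.out.println(computeWaitingTime(0.1, 500.0, 20.0, 2.0));
    }
}
```

Larger waiting times yield larger batches and hence better throughput, which is why the sketch maximizes $\Delta tw$ subject to the latency constraint.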

5.5 Experimental Evaluation

In this section, we present experimental evaluation results with regard to the three evaluation aspects of (1) optimization benefits and scalability, (2) optimization overhead, and (3) latency guarantees under certain constraints. In general, the evaluation shows that:

• Message throughput improvements are achieved by minimizing the total latency time of message sequences. Compared to the unoptimized case, the relative improvement decreases with increasing data sizes for some plans, while it stays constant for others. In contrast, the relative improvement increases with increasing batch sizes.

• The runtime optimization overhead for deriving partitioning schemes, rewriting plans, and computing the optimal waiting time is moderate. In addition, the overhead of horizontal partitioning of inbound message queues is, for moderate numbers of distinct items, fairly low compared to commonly used transient message queues.

• Finally, the theoretical maximum latency guarantees for arbitrary distribution functions also hold under experimental investigation. This remains true under the constraint of serialized external behavior (additional serialization at the outbound side) as well.

The detailed description of our experimental findings is structured as follows. First, we evaluate the end-to-end throughput improvement as well as the overhead of periodic re-optimization. Second, we present scalability results with regard to increasing data sizes as well as increasing batch sizes. Third, we analyze in detail the execution time with regard to influencing factors such as message rate, selectivities, waiting time, and batch sizes. Fourth, we present the influences on the latency time of single messages with and without serialized external behavior and with arbitrary message rate distribution functions. Fifth and finally, we discuss the runtime overhead of horizontal message queue partitioning.

Experimental Setting<br />

We implemented the MFO approach via horizontal partitioning within our Java-based WFPE (workflow process engine) and integrated it into our general cost-based optimization framework. This includes the partitioned message queue (partition tree and hash partition tree), slightly changed operators (partition awareness), as well as the algorithms for deriving partitioning attributes (A-DPA), rewriting plans (A-MPR), and automatic waiting time computation (A-WTC).
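The partitioned inbound message queue mentioned above can be sketched roughly as follows; the class and method names are illustrative assumptions, not the WFPE's actual partition tree implementation. The idea is that messages are routed into partitions by the hash of their partitioning attribute, so that all messages of a partition can be dequeued together as one batch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of a hash-partitioned inbound message queue (names are
// illustrative, not the thesis's partition tree implementation): messages are
// assigned to partitions by hashing a derived partitioning attribute.
public class HashPartitionedQueue<T> {
    private final Map<Integer, Queue<T>> partitions = new HashMap<>();
    private final int numPartitions;

    public HashPartitionedQueue(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    /** Enqueue a message under its partitioning attribute value. */
    public void enqueue(Object partitionAttr, T msg) {
        int p = Math.floorMod(partitionAttr.hashCode(), numPartitions);
        partitions.computeIfAbsent(p, k -> new ArrayDeque<>()).add(msg);
    }

    /**
     * Dequeue all messages of the partition the attribute hashes to, as one
     * batch. Note: with hash collisions, messages of other attribute values
     * mapped to the same partition are included as well.
     */
    public List<T> dequeueBatch(Object partitionAttr) {
        int p = Math.floorMod(partitionAttr.hashCode(), numPartitions);
        List<T> batch = new ArrayList<>();
        Queue<T> q = partitions.get(p);
        if (q != null) {
            batch.addAll(q);
            q.clear();
        }
        return batch;
    }
}
```

This illustrates why the partitioning overhead grows with the number of distinct items: each distinct attribute value potentially opens a new partition bucket that must be maintained alongside the plain transient queue.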

We ran our experiments on the same platform as described in Section 3.5, and we used synthetically generated data sets. As integration flows under test, we use all asynchronous,
