Cost-Based Optimization of Integration Flows
5 Multi-Flow Optimization
Thus, T_L(m_i) ≤ T̂_L(M′) ≤ lc holds due to the subsumption of W*(P′, k′) by Δtw, because the waiting time is longer than the execution time in the valid case of Δtw ≥ W(P′, k′). This is also true for arbitrary serialization concepts. Hence, Theorem 5.5 holds.
Based on this formal analysis, we can state that the introduced waiting time computation approach (1) optimizes the message throughput by minimizing the total latency time of a message subsequence, and ensures the additional restrictions of (2) a maximum latency time constraint for single messages, and (3) the serialized external behavior.
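The interplay of restrictions (1) and (2) can be illustrated with a minimal sketch. The method name and the message-count-style policy below are hypothetical, not the actual waiting time computation algorithm of this thesis; the sketch only shows the bounding idea: the waiting time Δtw is chosen as large as the throughput objective desires, but capped so that waiting plus plan execution time W(P′, k′) never exceeds the latency constraint lc.

```java
// Illustrative sketch (not the thesis' actual algorithm): bound the waiting
// time dtw such that dtw + execTime <= lc for every single message.
class WaitingTimeSketch {

    /**
     * @param execTime   estimated execution time W(P', k') of the plan
     * @param lc         maximum latency constraint for a single message
     * @param desiredDtw throughput-optimal waiting time before capping
     * @return the waiting time dtw, capped by the remaining latency budget
     */
    static double computeWaitingTime(double execTime, double lc, double desiredDtw) {
        double upperBound = lc - execTime;        // latency budget left for waiting
        if (upperBound <= 0) {
            return 0;                             // constraint leaves no room to wait
        }
        return Math.min(desiredDtw, upperBound);  // cap the desired waiting time
    }
}
```

For example, with lc = 100 ms and an execution time of 30 ms, any desired waiting time above 70 ms is capped to 70 ms, which preserves the single-message latency guarantee.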
5.5 Experimental Evaluation
In this section, we present experimental evaluation results with regard to the three evaluation aspects of (1) optimization benefits/scalability, (2) optimization overheads, and (3) latency guarantees under certain constraints. In general, the evaluation shows that:
• Message throughput improvements are yielded by minimizing the total latency time of message sequences. Compared to the unoptimized case, the achieved relative improvements decrease with increasing data sizes for some plans, while they stay constant for other plans. In contrast, the relative improvement increases with increasing batch sizes.
• The runtime optimization overhead for deriving partitioning schemes, rewriting plans, and computing the optimal waiting time is moderate. In addition, the overhead for horizontal partitioning of inbound message queues is, for moderate numbers of distinct items, fairly low compared to commonly used transient message queues.
• Finally, the theoretical maximum latency guarantees for arbitrary distribution functions also hold under experimental investigation. This remains true under the constraint of serialized external behavior (additional serialization at the outbound side) as well.
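The horizontal partitioning of inbound message queues mentioned above can be sketched as follows. This is a minimal illustration under assumed, simplified semantics (class and method names are hypothetical): messages are grouped by a partitioning attribute so that all messages of one partition can later be dequeued and executed together as a single batch. The actual partitioned message queue of this thesis (partition tree and hash partition tree) additionally maintains message order and supports multiple partitioning attributes.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Minimal sketch of a horizontally partitioned inbound message queue:
// messages sharing a partitioning attribute value form one partition,
// which can be dequeued as a whole batch.
class PartitionedQueue<K, M> {
    // insertion-ordered map from partitioning attribute value to its messages
    private final Map<K, Queue<M>> partitions = new LinkedHashMap<>();

    /** Appends a message to the partition of its partitioning attribute value. */
    void enqueue(K partitionKey, M message) {
        partitions.computeIfAbsent(partitionKey, k -> new ArrayDeque<>()).add(message);
    }

    /** Removes and returns the whole message batch of one partition. */
    Queue<M> dequeuePartition(K partitionKey) {
        return partitions.remove(partitionKey);
    }

    int numPartitions() {
        return partitions.size();
    }
}
```

Because the number of partitions grows with the number of distinct attribute values, the maintenance overhead stays low for moderate numbers of distinct items, which matches the evaluation finding above.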
The detailed description of our experimental findings is structured as follows. First, we evaluate the end-to-end throughput improvement as well as the overheads of periodic re-optimization. Second, we present scalability results with regard to increasing data sizes as well as increasing batch sizes. Third, we analyze in detail the execution time with regard to influencing factors such as message rate, selectivities, waiting time, and batch sizes. Fourth, we present the influences on the latency time of single messages with and without serialized external behavior and with arbitrary message rate distribution functions. Fifth and finally, we discuss the runtime overhead of horizontal message queue partitioning.
Experimental Setting<br />
We implemented the approach of MFO via horizontal partitioning within our Java-based WFPE (workflow process engine) and integrated it into our general cost-based optimization framework. This includes the partitioned message queue (partition tree and hash partition tree), slightly changed operators (partition-awareness), as well as the algorithms for deriving partitioning attributes (A-DPA), rewriting of plans (A-MPR), and automatic waiting time computation (A-WTC).
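The periodic invocation of these optimization steps can be pictured with a small sketch. The class, the message-count trigger policy, and the counter are all hypothetical simplifications, not the framework's actual interfaces; the sketch only conveys that re-optimization (deriving partitioning attributes, rewriting plans, recomputing the waiting time) recurs at a configured interval rather than per message.

```java
// Hedged sketch of a periodic re-optimization trigger. For simplicity the
// interval is counted in executed messages; a time-based trigger would be
// analogous. The re-optimization body is a placeholder comment.
class PeriodicReoptimizer {
    private final int interval;   // re-optimize every 'interval' messages
    private int executedMessages = 0;
    private int reoptimizations = 0;

    PeriodicReoptimizer(int interval) {
        this.interval = interval;
    }

    void onMessageExecuted() {
        executedMessages++;
        if (executedMessages % interval == 0) {
            // Placeholder for the actual steps: derive partitioning attributes,
            // rewrite plans, and recompute the optimal waiting time.
            reoptimizations++;
        }
    }

    int getReoptimizations() {
        return reoptimizations;
    }
}
```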
We ran our experiments on the same platform as described in Section 3.5 and we used synthetically generated data sets. As integration flows under test, we use all asynchronous,