Cost-Based Optimization of Integration Flows - Datenbanken ...
5 Multi-Flow Optimization
• P 5 : The plan P 5 shows the lowest benefit. There, the Switch operator o 5 is executed
only once for a batch because the switch expression attribute
/resultsets/resultset/row/A1 Orderdate is used as partitioning attribute. Additional benefit
comes from the Assign and Invoke operators o 8 and o 9 . However, we achieved an
improvement of only 25% because the Selection operators in front of the operators
that benefit from partitioning consume most of the time and significantly reduce
the cardinality of intermediate results.
• P 7 : In contrast to the other plans, plan P 7 does not contain any partitioning attribute
candidate. Therefore, a system partitioning with sel = 1.0 is used (this
case is similar to the time-based batch creation strategy but without the drawback
of distinct messages in a batch). Many operators benefit from partitioning. First,
the queries to external systems are prepared only once (Assign operators o 3 , o 6 , o 9 ,
and o 11 ). Second, the external queries and the subsequent schema mappings are also
executed only once (Invoke operators o 4 , o 7 , o 10 , and o 12 as well as Translation
operators o 5 , o 8 , and o 13 ). Third, additional benefit is achieved by the final Assign and
Invoke operators o 18 and o 19 . In total, this led to an improvement of 30%.
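The partitioning behavior described in these plans can be sketched as follows. This is our own illustration, not the engine's actual API (the function names and the message representation are hypothetical): messages of a batch are grouped by the partitioning attribute so that an operator such as Switch evaluates its expression once per partition instead of once per message, and without a partitioning attribute candidate (as in plan P 7 ) the system partitioning with sel = 1.0 treats the whole batch as a single partition.

```python
# Hypothetical sketch of partitioned batch execution; names are our own,
# not the thesis implementation.
from itertools import groupby

def evaluate_switch(key):
    # Stand-in for evaluating the Switch expression for one partition.
    return "pathA" if key != "__all__" and key < "2020" else "pathB"

def execute_batch(messages, partition_attr=None):
    """Execute a batch, evaluating the Switch once per partition.

    Returns (routed messages, number of Switch evaluations).
    """
    if partition_attr is None:
        # System partitioning with sel = 1.0: one partition per batch.
        partitions = [("__all__", messages)]
    else:
        ordered = sorted(messages, key=lambda m: m[partition_attr])
        partitions = [(k, list(g))
                      for k, g in groupby(ordered, key=lambda m: m[partition_attr])]
    results = []
    for key, part in partitions:
        route = evaluate_switch(key)      # executed once per partition
        for msg in part:
            results.append((route, msg))
    return results, len(partitions)

batch = [{"Orderdate": "2019"}, {"Orderdate": "2019"}, {"Orderdate": "2021"}]
_, n_evals = execute_batch(batch, "Orderdate")
print(n_evals)  # 2: the Switch is evaluated twice instead of three times
```

With the partitioning attribute, the per-partition work scales with the number of distinct attribute values rather than the number of messages, which is the source of the benefit reported for the Switch, Assign, and Invoke operators above.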
Figure 5.15(b) illustrates the optimization overhead imposed by the cost-based multi-flow
optimization. This includes the derivation of partitioning attributes, the creation of
a partitioning scheme, the plan rewriting, as well as the continuous waiting time computation.
Essentially, we observe that the overhead is moderate, where the differences are
mainly explained by the different numbers of operators.
Finally, based on the observed results, we can conclude that the multi-flow optimization
technique can be used by default (if a small additional latency for single messages is acceptable)
because the throughput improvements clearly amortize the optimization overhead.
This holds for arbitrary asynchronous, data-driven integration flows because each flow
has at least one combination of writing Assign and Invoke operators.
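The amortization argument can be made concrete with a back-of-the-envelope check. The numbers and the helper below are our own illustration (only the 30% improvement figure comes from the text): optimization pays off when the savings over one optimization interval exceed the one-time optimization overhead incurred in that interval.

```python
# Illustrative amortization check; parameters are assumptions, not
# measurements from the thesis.
def amortized(msgs_per_interval, t_per_msg, improvement, overhead):
    """True if per-interval savings cover the optimization overhead.

    msgs_per_interval: messages processed per optimization interval (assumed)
    t_per_msg:         unoptimized execution time per message in seconds (assumed)
    improvement:       relative throughput improvement (e.g., 0.30 for plan P 7)
    overhead:          optimization overhead per interval in seconds (assumed)
    """
    savings = msgs_per_interval * t_per_msg * improvement
    return savings > overhead

# Even a moderate workload amortizes a small per-interval overhead:
print(amortized(msgs_per_interval=1000, t_per_msg=0.05,
                improvement=0.30, overhead=1.0))  # True (15 s saved vs 1 s spent)
```

The condition fails only for very low message rates, which matches the conclusion that the technique is a reasonable default whenever the added per-message latency is acceptable.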
Scalability<br />
We now investigate the scalability of plan execution, which includes (1) the scalability
with increasing input data sizes and (2) the scalability with increasing batch sizes.
First, we used our example plans to investigate the scalability of optimization
benefits with increasing input data size. We reused the scalability experiment with increasing
data size from Section 3.5. In contrast to the already presented scalability results, we
now disabled all optimization techniques except multi-flow optimization. In detail, we executed
20,000 plan instances for the plans P 1 , P 2 , P 5 , and P 7 and compared the optimized
plans with their unoptimized counterparts, varying the input data size d ∈ {1, 2, 3, 4, 5, 6, 7}
(in 100 kB). Again, we varied only the input data size of these plans (the size of the received
message) but did not change the size of externally loaded data. Further, we fixed
a batch size of k′ = 10, an optimization interval of ∆t = 5 min, a sliding window size
of ∆w = 5 min, and EMA as the workload aggregation method. The results are shown in
Figure 5.16. In general, the plans scale with increasing data size but with a decreasing
relative improvement. With regard to the different plans, we observe different scalability
behavior. The plan P 1 scales best with increasing data size and shows almost constant
relative improvement because this plan mainly benefits from reduced costs for writing
interactions, which depend linearly on the data size. Plan P 2 also shows good scalability
with increasing data size. However, the relative improvement decreases because