
5 Multi-Flow Optimization

• P5: Plan P5 shows the lowest benefit. There, the Switch operator o5 is executed only once per batch because the switch expression attribute /resultsets/resultset/row/A1 Orderdate is used as the partitioning attribute. Additional benefit comes from the Assign and Invoke operators o8 and o9. However, we achieved an improvement of only 25% because the Selection operators in front of the operators that benefit from partitioning consume most of the time and significantly reduce the cardinality of intermediate results.

• P7: In contrast to the other plans, plan P7 does not contain any partitioning attribute candidate. Therefore, a system partitioning with sel = 1.0 is used (this case is similar to the time-based batch creation strategy but without the drawback of distinct messages in a batch). Many operators benefit from partitioning. First, the queries to external systems are prepared only once (Assign operators o3, o6, o9, and o11). Second, the external queries and the subsequent schema mappings are also executed only once (Invoke operators o4, o7, o10, and o12 as well as Translation operators o5, o8, and o13). Third, additional benefit is achieved by the final Assign and Invoke operators o18 and o19. In total, this led to an improvement of 30%.
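The partitioning behavior described in the bullets above can be sketched as follows. This is a minimal illustration, not the system's actual API: the function names and message representation are assumptions. A batch is grouped by the values of a partitioning attribute so that attribute-dependent operators (such as the Switch evaluation or query preparation) run once per partition rather than once per message; if no candidate attribute exists, system partitioning treats the whole batch as a single partition (sel = 1.0).

```python
from collections import defaultdict

def partition_batch(messages, partition_attr=None):
    """Group a batch of messages by the value of a partitioning attribute.
    Without a candidate attribute (system partitioning, sel = 1.0),
    the whole batch forms a single partition."""
    if partition_attr is None:
        return {None: list(messages)}
    partitions = defaultdict(list)
    for msg in messages:
        partitions[msg[partition_attr]].append(msg)
    return dict(partitions)

def execute_batch(messages, partition_attr=None):
    """Count how often attribute-dependent work (e.g., a Switch
    evaluation) is performed: once per partition, not per message."""
    evaluations = 0
    for _value, part in partition_batch(messages, partition_attr).items():
        evaluations += 1      # one Switch/Assign evaluation per partition
        for _msg in part:
            pass              # per-message processing (Selection etc.)
    return evaluations

batch = [{"Orderdate": "2010-01-01"},
         {"Orderdate": "2010-01-01"},
         {"Orderdate": "2010-01-02"}]
print(execute_batch(batch, "Orderdate"))  # 2 partitions -> 2 evaluations
print(execute_batch(batch))               # system partitioning -> 1
```

The sketch shows why plans benefit in proportion to how much of their work is attribute-dependent: operators executed inside the inner loop, such as Selections, see no reduction.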

Figure 5.15(b) illustrates the optimization overhead imposed by the cost-based multi-flow optimization. This includes the derivation of partitioning attributes, the creation of a partitioning scheme, the plan rewriting, as well as the continuous waiting time computation. Essentially, we observe that the overhead is moderate, where the differences are mainly caused by the different numbers of operators.

Finally, based on the observed results, we can conclude that the multi-flow optimization technique can be used by default (if a small additional latency for single messages is acceptable) because the throughput improvements clearly amortize the optimization overhead. This holds for arbitrary asynchronous, data-driven integration flows because each flow contains at least one combination of writing Assign and Invoke operators.

Scalability<br />

We now investigate the scalability of plan execution, which includes (1) the scalability with increasing input data sizes and (2) the scalability with increasing batch sizes.

First, we used our example plans to investigate the scalability of the optimization benefits with increasing input data size. We reused the scalability experiment with increasing data size from Section 3.5. In contrast to the already presented scalability results, we now disabled all optimization techniques except multi-flow optimization. In detail, we executed 20,000 plan instances for the plans P1, P2, P5, and P7 and compared the optimized plans with their unoptimized counterparts, varying the input data size d ∈ {1, 2, 3, 4, 5, 6, 7} (in 100 kB). Again, we varied only the input data size of these plans (the size of the received message) but did not change the size of externally loaded data. Further, we fixed a batch size of k′ = 10, an optimization interval of ∆t = 5 min, a sliding window size of ∆w = 5 min, and EMA as the workload aggregation method. The results are shown in Figure 5.16. In general, the plans scale with increasing data size but with a decreasing relative improvement. With regard to the different plans, we observe different scalability behavior. Plan P1 scales best with increasing data size and shows an almost constant relative improvement because this plan mainly benefits from reduced costs for writing interactions, which depend linearly on the data size. Plan P2 also shows good scalability with increasing data size. However, the relative improvement is decreasing because
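The setup above fixes EMA as the workload aggregation method. As a minimal sketch of how workload statistics within the sliding window ∆w could be aggregated (assuming a standard exponential moving average; the smoothing factor α and the sample values are illustrative, not taken from the system):

```python
def ema(observations, alpha=0.5):
    """Exponential moving average: est_t = alpha * x_t + (1 - alpha) * est_{t-1}.
    Recent observations (e.g., message arrival-rate samples within the
    sliding window) are weighted more heavily than older ones."""
    if not observations:
        raise ValueError("need at least one observation")
    est = observations[0]              # seed with the first observation
    for x in observations[1:]:
        est = alpha * x + (1 - alpha) * est
    return est

# Arrival-rate samples (messages/s) collected within the window:
print(ema([10.0, 12.0, 20.0], alpha=0.5))  # -> 15.5
```

Compared with a plain average over the window, the exponential weighting lets the aggregated workload estimate react faster to recent changes, which is why such a method suits continuous re-optimization at interval ∆t.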

