5.5 Experimental Evaluation

For these experiments, we used the asynchronous, data-driven integration flow use cases (plans P1, P2, P5, and P7), which have been described in Section 2.4. Furthermore, we used the following scale factors: the number of messages |M|, the message rate R, the selectivity according to the partitioning attribute sel, the batch size k', the message rate distribution function D, the maximum latency constraint lc, and the data size d of input messages (in 100 kB).
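As an illustration only, the following sketch groups these scale factors into a single configuration object; the record and field names are hypothetical and not taken from the actual test driver.

```java
/**
 * Hypothetical configuration record for the workload scale factors used in the
 * experiments; names are illustrative, not part of the evaluated prototype.
 */
public record WorkloadConfig(
        int numMessages,          // |M| : number of input messages
        double messageRate,       // R   : message rate (messages per second)
        double selectivity,       // sel : selectivity of the partitioning attribute
        int batchSize,            // k'  : maximum batch size
        String rateDistribution,  // D   : message rate distribution function
        long maxLatencyMs,        // lc  : maximum latency constraint
        int dataSize              // d   : input message size in multiples of 100 kB
) {}
```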

End-to-End Comparison and Optimization Benefits

First of all, we investigate the end-to-end optimization benefit achieved by multi-flow optimization and the related optimization overhead. We compared multi-flow optimization with no optimization, while all other optimization techniques were disabled. Similar to the use case comparisons in Sections 3.5 and 4.6, we executed 20,000 plan instances for each asynchronous, data-driven example plan (P1, P2, P5, and P7) and for both execution models. We reused the same workload configuration as already presented (without correlations and without workload changes). Furthermore, we fixed the cardinality of input data sets to d = 1 (100 kB messages) and used an optimization interval of ∆t = 5 min, a sliding window size of ∆w = 5 min, and EMA as the workload aggregation method. With regard to multi-flow optimization, we did not use the computed waiting time but directly restricted the batch size to k' = 10 in order to achieve comparable results across the different plans.
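An EMA weights recent samples more strongly than older ones, which suits workload statistics collected over a sliding window. The following minimal sketch illustrates such an exponential moving average over periodic statistic samples; the class name, the smoothing factor alpha, and the sampling granularity are assumptions, not values taken from the evaluated system.

```java
/**
 * Minimal sketch of EMA-based workload aggregation over periodic samples
 * (e.g., observed execution times within the sliding window).
 */
public final class EmaAggregator {
    private final double alpha;  // smoothing factor in (0,1]; assumed, not from the thesis
    private Double ema;          // current aggregate; null until the first sample

    public EmaAggregator(double alpha) {
        this.alpha = alpha;
    }

    /** Folds a new sample into the aggregate: ema = alpha*x + (1-alpha)*ema. */
    public double add(double sample) {
        ema = (ema == null) ? sample : alpha * sample + (1 - alpha) * ema;
        return ema;
    }

    public double value() {
        return (ema == null) ? 0.0 : ema;
    }
}
```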

Figure 5.15: Use Case Comparison of Multi-Flow Optimization: (a) cumulative execution time, (b) cumulative optimization time.

Figure 5.15(a) shows the resulting total execution times. To summarize, we consistently observe significant execution time reductions, which have been achieved as follows:

• P1: The plan P1 benefits from MFO in several ways. First, the Switch operator o2 is executed only once for a message batch because the switch expression attribute /material/type is used as the only partitioning attribute. Furthermore, the Assign operators o4, o6, and o8 are also executed only once because their results are exclusively used by the partition-aware Invoke operators o7 and o9. These writing Invoke operators show additional benefit because a single operator instance is used to process all messages of a batch (see the sketch of partition-aware batch execution after this list). Overall, this achieves a throughput improvement of 62%.

• P2: The plan P2 mainly benefits from executing the Invoke operator o3 and the predecessor Assign operator o2 only once for a whole partition. There, the predicate part /resultsets/resultset/row/A1 Custkey is used as the partitioning attribute. Additional benefit is achieved by the final Assign and Invoke operators o5 and o6. In total, this yields an improvement of 53%.

