25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


7 Conclusions

I can’t change the direction of the wind,
but I can adjust my sails to always reach my destination.
— Jimmy Dean

From the reactive perspective of an integration platform, we cannot change the workload characteristics (the direction of the wind), that is, the incoming messages that initiate plan instances of integration flows. However, we can incrementally maintain execution statistics and use them for the cost-based optimization of integration flows (the adjustment of the sails) in order to improve execution time and latency and thus message throughput (reaching the destination). This allows the workload-based adjustment of the current plan and thus continuous adaptation to changing workload characteristics.
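
To make the idea of incrementally maintained execution statistics concrete, the following is a minimal sketch and not the optimizer described in this work. It assumes, purely for illustration, that one statistic (a mean execution time) is kept per alternative plan and updated with an exponential moving average; the names `RunningCost`, `cheapest_plan`, `plan_a`, `plan_b`, and the smoothing factor `alpha` are hypothetical. A real cost model would estimate plan costs from operator-level statistics rather than from whole-plan timings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunningCost:
    """Incrementally maintained mean execution time (hypothetical statistic)."""
    alpha: float = 0.2             # assumed smoothing factor, not from the text
    mean: Optional[float] = None   # no estimate until the first observation

    def update(self, observed: float) -> float:
        # Exponential moving average: recent instances weigh more, so the
        # estimate follows a changing workload without storing any history.
        if self.mean is None:
            self.mean = observed
        else:
            self.mean = self.alpha * observed + (1 - self.alpha) * self.mean
        return self.mean


def cheapest_plan(stats: Dict[str, RunningCost]) -> str:
    """Return the name of the plan with the lowest current cost estimate."""
    known = {name: s.mean for name, s in stats.items() if s.mean is not None}
    return min(known, key=known.get)


# Hypothetical observations (same time unit for both plans).
stats = {"plan_a": RunningCost(), "plan_b": RunningCost()}
for t in (10.0, 10.0, 10.0):
    stats["plan_a"].update(t)
for t in (12.0, 6.0, 6.0):   # plan_b becomes cheaper as the workload shifts
    stats["plan_b"].update(t)

best = cheapest_plan(stats)
```

With these made-up observations, the estimate for `plan_b` drops below that of `plan_a` after the workload shifts, so the selection follows the workload instead of a fixed, deployment-time decision.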

Complex integration tasks increasingly (1) stretch beyond simple read-only applications, (2) involve many types of heterogeneous systems and applications, and (3) require fairly complex procedural aspects. Because of these emerging requirements, imperative integration flows are typically used for the specification and execution of these tasks. In this context, we observe that many independent instances of such integration flows, each with a rather small amount of data, are executed over time in order to achieve (1) high consistency between the data of operational systems or (2) highly up-to-date analytical query results in data warehouse infrastructures. In addition to this high load of flow instances, the performance of source systems depends on the execution time and availability of synchronous data-driven integration flows. For these reasons, there are high performance demands on integration platforms that execute imperative integration flows.

To address these high performance requirements on the execution of integration flows, we introduced the cost-based optimization of imperative integration flows. In detail, we described the fundamentals of cost-based optimization, including novel techniques such as the first entirely defined cost model for imperative integration flows, a transformation-based rewriting algorithm with several search space reduction approaches, asynchronous periodical re-optimization, and techniques for workload adaptation and the handling of correlated data. Essentially, this cost-based optimizer exploits the major characteristic of integration flows: they are deployed once and executed many times with rather small amounts of data per instance. In addition, we introduced several control-flow- and data-flow-oriented optimization techniques. On the one hand, we adapted techniques from data management systems and programming language compilers; on the other hand, we defined techniques tailor-made for integration flows. Additionally, we described in detail two novel techniques for the throughput optimization of integration flows, namely cost-based vectorization and multi-flow optimization. Finally, we introduced the novel concept of on-demand re-optimization of integration flows, which overcomes the major drawbacks of periodical re-optimization while still exploiting the major characteristics of integration flows. Our experiments showed that significant execution time and throughput improvements are possible, while only moderate additional optimization overhead is required.
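
As a back-of-the-envelope illustration of why a moderate optimization overhead is acceptable for flows that are deployed once and executed many times, the sketch below computes the number of plan instances at which the accumulated savings cover the overhead. The function name `break_even_instances` and all numbers are hypothetical and are not taken from the experiments.

```python
import math


def break_even_instances(optimization_overhead: float,
                         time_before: float,
                         time_after: float) -> float:
    """Number of plan instances after which re-optimization has paid off.

    All arguments use the same time unit. Returns math.inf when the rewritten
    plan is not faster, because the overhead is then never recovered.
    """
    saving_per_instance = time_before - time_after
    if saving_per_instance <= 0:
        return math.inf
    return math.ceil(optimization_overhead / saving_per_instance)


# Hypothetical numbers: 50 ms optimizer overhead, 10 ms -> 8 ms per instance.
n = break_even_instances(50.0, 10.0, 8.0)
```

Under these assumed numbers the overhead is recovered after 25 instances; every further instance of the same deployed flow is pure gain, which is why the execute-many-times characteristic matters for the cost-benefit balance.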

