Cost-Based Optimization of Integration Flows - Datenbanken ...
7 Conclusions

I can’t change the direction of the wind,
but I can adjust my sails to always reach my destination.
- Jimmy Dean

From the reactive perspective of an integration platform, we cannot change the workload
characteristics (the direction of the wind), that is, the incoming messages that initiate
plan instances of integration flows. However, we can incrementally maintain execution
statistics and use them for the cost-based optimization of integration flows (the adjustment
of sails) in order to improve their execution time and latency, and thus the message throughput
(in order to reach the destination). This allows the workload-based adjustment of the
current plan and thus continuous adaptation to changing workload characteristics.

Emerging complex integration tasks (1) stretch beyond simple read-only applications,
(2) involve many types of heterogeneous systems and applications, and (3) require fairly
complex procedural aspects. For these reasons, imperative integration flows are typically
used for the specification and execution of these tasks. In this context, we observe
that many independent instances of such integration flows with rather small amounts of
data per instance are executed over time in order to achieve (1) high consistency between
data of operational systems or (2) highly up-to-date analytical query results in data
warehouse infrastructures. In addition to this high load of flow instances, the performance
of source systems depends on the execution time and availability of synchronous
data-driven integration flows. For these reasons, there are high performance demands on
integration platforms that execute imperative integration flows.

To tackle this problem of high performance requirements on the execution of integration
flows, we introduced the cost-based optimization of these imperative integration flows. In
detail, we described the fundamentals of cost-based optimization, including novel techniques
such as the first entirely defined cost model for imperative integration flows, a
transformation-based rewriting algorithm with several search space reduction approaches,
the asynchronous periodical re-optimization, as well as techniques for workload adaptation
and handling of correlated data. Essentially, this cost-based optimizer exploits the major
integration-flow-specific characteristics of being deployed once and executed many times
with rather small amounts of data per instance. In addition, we introduced several
control-flow- and data-flow-oriented optimization techniques. On the one hand, we adapted
techniques from data management systems and programming language compilers; on the other
hand, we defined techniques tailor-made for integration flows. Additionally,
we described in detail two novel optimization techniques for throughput optimization
of integration flows, namely the cost-based vectorization and the multi-flow optimization.
Finally, we introduced the novel concept of on-demand re-optimization of integration flows
that overcomes the major drawbacks of periodical re-optimization, while still exploiting
the major characteristics of integration flows. Our experiments showed that significant
execution time and throughput improvements are possible, while only moderate additional
optimization overhead is required.