Cost-Based Optimization of Integration Flows - Datenbanken ...
7 Conclusions
Approaches prior to this thesis used either the rule-based optimize-once model,
where the integration flow is optimized only once during the initial deployment, or the
optimize-always model, where optimization is triggered for each plan instance. In contrast,
we presented the first complete cost-based optimizer for imperative integration flows, using the
cost-based optimization models of periodical and on-demand re-optimization. The major
advantage of these new optimization models is robustness in terms of (1) high optimization
opportunities that allow (2) the adaptation to changing workload characteristics with (3)
low risk of optimization overheads.
Besides the investigation of additional optimization techniques, we see four major research
challenges for future work on the cost-based optimization of integration flows:
• Mid-Instance Optimization: Our asynchronous re-optimization model is based on the
assumption of many plan instances with rather small amounts of data per instance.
In order to make it also suitable for (1) long-running plan instances and (2) ad-hoc
integration flows (as required for situational BI and mashup integration flows),
an extension to synchronous mid-instance re-optimization would be necessary. The
challenge is to define a hybrid model for both use cases of integration flows.
• Multi-Objective Optimization: We focused on the optimization objectives of execution
time and throughput. However, in the face of new requirements, this might be extended
to multiple objectives including, for example, monetary measures, execution time and
latency, throughput, energy consumption, resiliency, or transactional guarantees.
• Optimization of Multiple Deployed Plans: The cost-based optimization of integration
flows, with few exceptions, aims to optimize a single deployed plan. However,
typically, multiple different integration flows are deployed and concurrently executed
by the integration platform. The major question is whether we could exploit the knowledge
about workload characteristics of all plans and their mutual influences for more efficient
scheduling of plan instances or influence-aware plan rewriting.
• Optimization of Distributed Integration Flows: With regard to load balancing and
the emerging trend towards virtualization, distributed integration flows might be
used as well. The resulting research challenge is to optimize the entire distributed
integration flow, rather than just the subplans of local server nodes. One aspect of
this might be the extension of cost-based vectorization to the distributed case.
Thus, we can conclude that there are many directions for future work in this new field
of the cost-based optimization of integration flows. As with cost-based query optimization,
this might be a starting point for the continuous development of new execution and
optimization techniques:
“In my view, the query optimizer was the first attempt at what we call autonomic
computing or self-managing, self-tuning technology. Query optimizers
have been 25 years in development, with enhancements of the cost-based query
model and the optimization that goes with it, and a richer and richer variety
of execution techniques that the optimizer chooses from. We just have to keep
working on this. It’s a never-ending quest for an increasingly better model and
repertoire of optimization and execution techniques. So the more the model can
predict what’s really happening in the data and how the data is really organized,
the closer and closer we will come [to the ideal system].”
— Patricia G. Selinger, IBM Research – Almaden, 2003 [Win03]<br />