25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...



7 Conclusions

Before this thesis, existing approaches used either the rule-based optimize-once model, where the integration flow is optimized only once during its initial deployment, or the optimize-always model, where optimization is triggered for each plan instance. In contrast, we presented the first complete cost-based optimizer for imperative integration flows, using the cost-based optimization models of periodic and on-demand re-optimization. The major advantage of these new optimization models is robustness in terms of (1) high optimization opportunities that allow (2) adaptation to changing workload characteristics with (3) a low risk of optimization overheads.
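The models differ mainly in *when* optimization is triggered. The following Python code is a minimal sketch that assumes each plan instance reports a single monitored workload statistic, such as an observed selectivity. It replays a stream of plan instances and counts how often each model would optimize. The class `Trigger`, the function `count_optimizations`, the model labels, the period of 10 time units, and the 20% drift threshold are hypothetical choices for illustration; they are not the optimizer presented in this thesis. For simplicity, the sketch makes two further simplifications. First, it evaluates the triggers synchronously at each plan instance, whereas our re-optimization model runs asynchronously to plan execution. Second, it compares the current statistic only against the value observed at the last optimization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    """Hypothetical re-optimization trigger; names and defaults are illustrative."""
    model: str              # "once", "always", "periodic" or "on-demand"
    period: float = 10.0    # assumed interval for periodic re-optimization
    threshold: float = 0.2  # assumed relative drift for on-demand re-optimization
    last_time: Optional[float] = None  # time of the last optimization run
    last_stat: Optional[float] = None  # statistic the current plan was optimized for

    def fire(self, now: float, stat: float) -> bool:
        """Return True if the plan should be (re-)optimized before this instance."""
        if self.last_time is None:
            return True                # every model optimizes at deployment
        if self.model == "once":
            return False               # never again after deployment
        if self.model == "always":
            return True                # once per plan instance
        if self.model == "periodic":
            return now - self.last_time >= self.period
        if self.model == "on-demand":
            base = max(abs(self.last_stat), 1e-9)  # avoid division by zero
            return abs(stat - self.last_stat) / base > self.threshold
        raise ValueError(f"unknown model: {self.model}")


def count_optimizations(model: str, instances: list[tuple[float, float]]) -> int:
    """Replay (timestamp, observed statistic) pairs and count optimization runs."""
    trigger = Trigger(model)
    runs = 0
    for now, stat in instances:
        if trigger.fire(now, stat):
            runs += 1
            trigger.last_time, trigger.last_stat = now, stat
    return runs


if __name__ == "__main__":
    # Hypothetical workload: the selectivity shifts from 0.5 to 0.9 at t = 15.
    workload = [(float(t), 0.5 if t < 15 else 0.9) for t in range(30)]
    for model in ("once", "always", "periodic", "on-demand"):
        print(model, count_optimizations(model, workload))
```

The toy workload has 30 instances, one per time unit, with a selectivity shift at t = 15. On this workload, `count_optimizations` returns the following counts:

- 1 for "once", which never adapts to the shift.
- 30 for "always", which pays for one optimization run per instance.
- 3 for "periodic", which optimizes at t = 0, 10, and 20.
- 2 for "on-demand", which optimizes at deployment and immediately after the shift.

Only the periodic and on-demand models adapt to the changed workload while keeping the number of optimization runs small.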

Besides investigating additional optimization techniques, we see four major research challenges for future work on the cost-based optimization of integration flows:

• Mid-Instance Optimization: Our asynchronous re-optimization model assumes many plan instances, each processing a rather small amount of data. To make it suitable also for (1) long-running plan instances and (2) ad-hoc integration flows (as required for situational BI and mashup integration flows), an extension to synchronous mid-instance re-optimization would be necessary. The challenge is to define a hybrid model that covers both use cases of integration flows.

• Multi-Objective Optimization: We focused on the optimization objectives of execution time and throughput. However, new requirements might call for extending this to multiple objectives, including, for example, monetary measures, execution time and latency, throughput, energy consumption, resiliency, or transactional guarantees.

• Optimization of Multiple Deployed Plans: With few exceptions, the cost-based optimization of integration flows aims to optimize a single deployed plan. Typically, however, multiple different integration flows are deployed and executed concurrently by the integration platform. The major question is whether we can exploit knowledge about the workload characteristics of all plans and their mutual influences for more efficient scheduling of plan instances or for influence-aware plan rewriting.

• Optimization of Distributed Integration Flows: With regard to load balancing and the emerging trend towards virtualization, distributed integration flows might be used as well. The resulting research challenge is to optimize the entire distributed integration flow, rather than just the subplans of local server nodes. One aspect of this might be the extension of cost-based vectorization to the distributed case.
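For the multi-objective direction listed above, a single cost value no longer totally orders the candidate plans. Pareto dominance is one standard way to compare plans under several objectives; it is named here as an illustrative technique, not something proposed in this thesis. A plan dominates another if it is no worse in every objective and strictly better in at least one. Only non-dominated plans remain candidates. The following Python sketch assumes three of the objectives listed above, with hypothetical units and cost values. The names `PlanCost`, `dominates`, and `pareto_front` are illustrative.

```python
from typing import NamedTuple


class PlanCost(NamedTuple):
    """Hypothetical estimated costs of one candidate plan; lower is better everywhere."""
    name: str
    exec_time: float  # assumed unit: ms per plan instance
    monetary: float   # assumed unit: currency per 1,000 plan instances
    energy: float     # assumed unit: joules per plan instance


def dominates(a: PlanCost, b: PlanCost) -> bool:
    """True if plan a is no worse than b in every objective and better in at least one."""
    pairs = list(zip(a[1:], b[1:]))  # skip the name field
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(plans: list[PlanCost]) -> list[PlanCost]:
    """Keep only plans that no other candidate dominates (a plan never dominates itself)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


if __name__ == "__main__":
    candidates = [
        PlanCost("P1", 10.0, 5.0, 3.0),
        PlanCost("P2", 8.0, 7.0, 3.0),
        PlanCost("P3", 12.0, 6.0, 4.0),
        PlanCost("P4", 9.0, 5.0, 2.0),
    ]
    print([p.name for p in pareto_front(candidates)])  # ['P2', 'P4']
```

In this example, P4 dominates P1 and P1 dominates P3. That leaves P2, which has the lowest execution time, and P4, which has the lowest monetary and energy costs. These two are genuine trade-offs, so an optimizer would still need an additional criterion to choose between them, such as weighting the objectives.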

Thus, we conclude that there are many directions for future work in this new field of the cost-based optimization of integration flows. Similar to cost-based query optimization, this might be a starting point for the continuous development of new execution and optimization techniques:

“In my view, the query optimizer was the first attempt at what we call autonomic computing or self-managing, self-tuning technology. Query optimizers have been 25 years in development, with enhancements of the cost-based query model and the optimization that goes with it, and a richer and richer variety of execution techniques that the optimizer chooses from. We just have to keep working on this. It’s a never-ending quest for an increasingly better model and repertoire of optimization and execution techniques. So the more the model can predict what’s really happening in the data and how the data is really organized, the closer and closer we will come [to the ideal system].”

— Patricia G. Selinger, IBM Research – Almaden, 2003 [Win03]
