
Cost-Based Optimization of Integration Flows - Datenbanken ...

6 On-Demand Re-Optimization

6.6 Summary and Discussion

Periodical re-optimization exhibits four drawbacks: (1) full monitoring of all operator statistics, regardless of whether the optimizer uses them; (2) many unnecessary periodical re-optimization steps, each of which executes a full optimization; (3) missed optimization opportunities due to potentially slow workload adaptation; and (4) the optimization period ∆t as a high-influence parameter, whose configuration requires context knowledge of the integration flows as well as knowledge of the frequency of workload changes. These drawbacks are mainly caused by the strict separation between optimizer, statistics monitoring, and plan execution.

In this chapter, we presented the concept of on-demand re-optimization to overcome these drawbacks. We introduced the PlanOptTree, which models the optimality of the current plan by exploiting context knowledge from the optimizer. With this approach, statistics are monitored according to optimality conditions, and directed re-optimization is triggered if and only if such a condition is violated. This approach always reduces the total execution time because (1) if the workload does not change, we avoid unnecessary re-optimization steps, and (2) if the workload changes, we do not miss any optimization opportunities because re-optimization is triggered directly, while the overhead for evaluating optimality conditions is negligible. In addition, it allows for predictable performance without the need for elaborate parameterization. In summary, on-demand re-optimization has the same advantages as periodical re-optimization but overcomes its disadvantages.
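The trigger rule described above (monitor only the statistics that optimality conditions refer to, and re-optimize only when a condition is violated) can be illustrated with a minimal Python sketch. This is not the PlanOptTree data structure itself; the names `OptimalityCondition` and `OnDemandMonitor`, the flat list of conditions, and the example condition on two selectivities are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OptimalityCondition:
    """Hypothetical condition that holds while the current plan is optimal."""
    name: str
    stats_used: List[str]                      # statistics the condition reads
    holds: Callable[[Dict[str, float]], bool]  # True while the plan stays optimal


class OnDemandMonitor:
    """Illustrative monitor: tracks only statistics used by some condition."""

    def __init__(self, conditions: List[OptimalityCondition]) -> None:
        self.conditions = conditions
        # Statistics that no condition refers to are never maintained.
        self.tracked = {s for c in conditions for s in c.stats_used}
        self.stats: Dict[str, float] = {}

    def update(self, stat: str, value: float) -> List[str]:
        """Record one statistic and return the names of violated conditions."""
        if stat not in self.tracked:
            return []  # irrelevant to optimality: no monitoring cost, no trigger
        self.stats[stat] = value
        violated = []
        for cond in self.conditions:
            # Re-check only conditions that read the changed statistic and
            # whose inputs have all been observed at least once.
            if stat in cond.stats_used and all(s in self.stats for s in cond.stats_used):
                if not cond.holds(self.stats):
                    violated.append(cond.name)
        return violated


# Assumed example: the plan runs operator o1 before o2, which is taken to be
# optimal as long as o1 is at least as selective as o2.
order_ok = OptimalityCondition(
    name="sel_o1 <= sel_o2",
    stats_used=["sel_o1", "sel_o2"],
    holds=lambda s: s["sel_o1"] <= s["sel_o2"],
)
monitor = OnDemandMonitor([order_ok])
```

In this sketch, a non-empty result from `update` stands for the point at which directed re-optimization would be triggered; an unchanged workload never produces one, so no optimization step is spent on it.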

However, on-demand re-optimization also has some limitations. Most importantly, it is more sensitive to workload changes than periodical re-optimization. On the one hand, this is advantageous because we adapt directly to these changes and therefore reduce the execution time. On the other hand, more care with regard to robustness (stability) is required. For example, correlation-awareness is required because otherwise we might end up with frequent plan changes, which hurt performance. For this reason, we introduced the concepts of correlation tables, minimal existence time, and lazy condition violation, which ensure the robustness of on-demand re-optimization.
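To make the stability concern concrete, the following Python sketch shows one way two of these safeguards could dampen frequent plan changes. The exact definitions are not restated in this summary, so the sketch rests on two assumptions: minimal existence time is read as "a plan must have existed for a minimum duration before it may be replaced", and lazy condition violation is read as "a condition must be observed as violated several times in a row before it counts". The class `RobustTrigger` and its parameters are hypothetical, and correlation tables are not covered.

```python
class RobustTrigger:
    """Illustrative guard that delays re-optimization to avoid plan thrashing."""

    def __init__(self, min_existence_time: float, required_violations: int,
                 created_at: float = 0.0) -> None:
        self.min_existence_time = min_existence_time    # assumed: same unit as `now`
        self.required_violations = required_violations  # assumed: consecutive count
        self.created_at = created_at                    # when the current plan was installed
        self.consecutive = 0                            # current run of violated observations

    def observe(self, violated: bool, now: float) -> bool:
        """Return True if re-optimization should be triggered at time `now`."""
        # A single non-violated observation resets the run (lazy violation).
        self.consecutive = self.consecutive + 1 if violated else 0
        old_enough = (now - self.created_at) >= self.min_existence_time
        if old_enough and self.consecutive >= self.required_violations:
            # A new plan is assumed to be installed now; both guards restart.
            self.created_at = now
            self.consecutive = 0
            return True
        return False
```

With such a guard, a short-lived spike in a monitored statistic does not cause a plan change, at the cost of a slightly delayed reaction to a persistent workload shift.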

Our experimental evaluation shows that on-demand re-optimization, in comparison to periodical re-optimization, achieves additional improvements in cumulative execution time, while it requires far fewer re-optimization steps and therefore significantly reduces the total optimization overhead. Furthermore, there are several issues for future work. These include, for example, the investigation of (1) the extension of inter-instance re-optimization to intra-instance re-optimization (mid-query re-optimization) in order to support ad-hoc integration flows or long-running plan instances, (2) specific approaches for directed re-optimization with regard to complex optimization techniques (e.g., join enumeration, eager group-by), and (3) the combination of progressive parametric query optimization with on-demand re-optimization in order to reuse generated physical plans. Although the on-demand re-optimization approach is tailor-made for integration flows that are deployed once and executed many times, it is also applicable in other areas. Examples of these areas are continuous queries in DSMS, recurring queries in DBMS, and incremental maintenance of data mining results.
