Cost-Based Optimization of Integration Flows - Datenbanken ...
6 On-Demand Re-Optimization
6.6 Summary and Discussion
Periodical re-optimization exhibits the drawbacks of (1) full monitoring of all operator
statistics regardless of whether the optimizer uses them, (2) many unnecessary
periodical re-optimization steps, where each step executes a full optimization, (3)
missed optimization opportunities due to potentially slow workload adaptation, and (4)
the optimization period ∆t as a high-influence parameter, whose configuration requires
context knowledge of the integration flows as well as knowledge of the frequency of workload
changes. These drawbacks mainly result from the strict separation between
optimizer, statistics monitoring, and plan execution.
In this chapter, we presented the concept of on-demand re-optimization to overcome
these drawbacks. We introduced the PlanOptTree, which models the optimality of the current
plan by exploiting context knowledge from the optimizer. With this approach, statistics
are monitored according to optimality conditions, and directed re-optimization is triggered
if and only if such a condition is violated. As a result, this approach always reduces the
total execution time because (1) if the workload does not change, we avoid unnecessary
re-optimization steps, and (2) if there are workload changes, we do not miss any optimization
opportunities due to direct re-optimization, while the overhead for evaluating optimality
conditions is negligible. In addition, it allows for predictable performance without the
need for elaborate parameterization. In conclusion, on-demand re-optimization has the
same advantages as periodical re-optimization but overcomes its disadvantages.
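
The triggering principle summarized above can be illustrated with a minimal sketch. It does not reproduce the PlanOptTree; it assumes a toy plan of two selection operators ordered by selectivity and a flat list of optimality conditions. All names (`OptimalityCondition`, `order_by_selectivity`, `run_on_demand`, `run_periodic`) and the workload values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OptimalityCondition:
    """Hypothetical condition under which the current plan stays optimal."""
    name: str
    holds: Callable[[Dict[str, float]], bool]


def order_by_selectivity(stats: Dict[str, float]) -> List[str]:
    """Toy optimizer (assumption): run the most selective operator first."""
    return sorted(stats, key=stats.get)


def conditions_for(plan: List[str]) -> List[OptimalityCondition]:
    """Each operator must be at least as selective as its successor."""
    return [
        OptimalityCondition(
            name=f"sel({a}) <= sel({b})",
            holds=lambda s, a=a, b=b: s[a] <= s[b],
        )
        for a, b in zip(plan, plan[1:])
    ]


def run_on_demand(workload: List[Dict[str, float]]):
    """Re-optimize only when a monitored optimality condition is violated."""
    plan = order_by_selectivity(workload[0])
    conds = conditions_for(plan)
    reopts = 0
    for stats in workload:
        if not all(c.holds(stats) for c in conds):
            plan = order_by_selectivity(stats)
            conds = conditions_for(plan)
            reopts += 1
    return plan, reopts


def run_periodic(workload: List[Dict[str, float]], period: int):
    """Re-optimize after every `period` instances, needed or not."""
    plan = order_by_selectivity(workload[0])
    reopts = 0
    for i, stats in enumerate(workload, start=1):
        if i % period == 0:
            plan = order_by_selectivity(stats)
            reopts += 1
    return plan, reopts


# Invented workload: the selectivities swap once, after 50 plan instances.
workload = [{"A": 0.2, "B": 0.6}] * 50 + [{"A": 0.7, "B": 0.3}] * 50
on_demand_plan, on_demand_reopts = run_on_demand(workload)
periodic_plan, periodic_reopts = run_periodic(workload, period=10)
print(on_demand_reopts, periodic_reopts)
```

On this invented workload, both variants end with the same plan, but the periodical variant runs ten full optimizations while the on-demand variant runs one, at the moment the condition is violated.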
However, on-demand re-optimization also has some limitations. Most importantly,
on-demand re-optimization is more sensitive to workload changes than periodical
re-optimization is. On the one hand, this is advantageous because we directly adapt to
these changes and therefore reduce the execution time. On the other hand, more care with
regard to robustness (stability) is required. For example, correlation-awareness is required
because otherwise we might end up with frequent plan changes, which hurt performance. For
this reason, we introduced the concepts of correlation tables, minimal existence time, and
lazy condition violation, which ensure the robustness of on-demand re-optimization.
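
As a rough illustration of how such damping could work, the following sketch combines two of the three concepts under assumed semantics that this summary does not spell out: minimal existence time is read as "a plan must have been used for a minimum number of instances before it may be replaced", and lazy condition violation as "a condition must be violated several times in a row before re-optimization is triggered". The class `RobustTrigger` and its parameters are hypothetical, and correlation tables are not modeled.

```python
class RobustTrigger:
    """Hypothetical guard that damps re-optimization triggers."""

    def __init__(self, min_existence: int, lazy_count: int):
        self.min_existence = min_existence  # instances a plan must live (assumed semantics)
        self.lazy_count = lazy_count        # consecutive violations required (assumed semantics)
        self.age = 0
        self.violations = 0

    def observe(self, condition_holds: bool) -> bool:
        """Record one plan instance; return True if re-optimization should run."""
        self.age += 1
        self.violations = 0 if condition_holds else self.violations + 1
        if self.violations >= self.lazy_count and self.age >= self.min_existence:
            # A new plan is installed, so both counters start over.
            self.age = 0
            self.violations = 0
            return True
        return False


# Invented trace: an oscillating condition (noise), then a sustained violation.
observations = [True, False] * 5 + [False] * 2
trigger = RobustTrigger(min_existence=3, lazy_count=3)
fired_at = [i for i, ok in enumerate(observations) if trigger.observe(ok)]
naive_triggers = sum(not ok for ok in observations)
print(fired_at, naive_triggers)
```

On this trace, triggering on every single violation would re-optimize seven times, whereas the damped trigger fires once, after the violation has persisted for three consecutive instances.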
Our experimental evaluation shows that on-demand re-optimization, in comparison
to periodical re-optimization, achieves additional cumulative execution time improvements
while requiring far fewer re-optimization steps, and therefore significantly
reduces the total optimization overhead. Furthermore, there are several open issues for future
work. These include, for example, the investigation of (1) the extension of inter-instance
re-optimization to intra-instance re-optimization (mid-query re-optimization) in order to
support ad-hoc integration flows or long-running plan instances, (2) specific approaches
for directed re-optimization with regard to complex optimization techniques (e.g., join
enumeration, eager group-by), and (3) the combination of progressive parametric query
optimization with on-demand re-optimization in order to reuse generated physical plans.
Although the on-demand re-optimization approach is tailor-made for integration flows that
are deployed once and executed many times, it is also applicable in other areas. Examples
of such areas are continuous queries in DSMS, recurring queries in DBMS, and
incremental maintenance of data mining results.