Cost-Based Optimization of Integration Flows - Datenbanken ...
6 On-Demand Re-Optimization
evaluating the MEMO structure. However, due to the requirement of robustness, we use the
POT3 configurations for all end-to-end comparison scenarios. Furthermore, we show only
the total statistic maintenance time because these times scale linearly with an increasing
number of statistic tuples due to the use of the incremental EMA aggregation method. Finally,
compared to the performance benefit of on-demand re-optimization, the overhead
for statistic maintenance is negligible.
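The linear scaling follows from the incremental nature of an exponential moving average (EMA): each new statistic tuple is folded into the aggregate in constant time, without revisiting earlier tuples. The following is a minimal sketch of such an update, not the implementation evaluated here. The class name `EmaStatistic` and the smoothing constant `alpha` are illustrative assumptions that do not appear in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmaStatistic:
    """Incrementally maintained exponential moving average (illustrative)."""

    alpha: float = 0.2              # hypothetical smoothing constant
    value: Optional[float] = None   # current aggregate, None until first tuple

    def update(self, observation: float) -> float:
        if self.value is None:
            # The first statistic tuple initializes the aggregate.
            self.value = observation
        else:
            # O(1) update: only the previous aggregate is needed, no history.
            self.value = self.alpha * observation + (1.0 - self.alpha) * self.value
        return self.value


stat = EmaStatistic(alpha=0.5)
for x in [10.0, 20.0, 30.0]:
    stat.update(x)
print(stat.value)  # 22.5
```

Because every call to `update` performs a fixed amount of work, maintaining n statistic tuples costs time proportional to n.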
Additionally, we evaluated the algorithm overhead when creating, using, and updating
PlanOptTrees. There, we used the already described POT3 configuration. Figure 6.18(b)
illustrates the results, varying the number of operators m of a plan and hence indirectly
varying the number of optimality conditions. The PlanOptTree is created over all
m operators (where we simulated the actual optimization techniques and directly provided
the partial PlanOptTrees), while triggering re-optimization and the subsequent
update of the PlanOptTree addressed only the violated optimality conditions, which depend
on the randomly changed statistics. We observe a moderate execution time, where the
creation and the update of PlanOptTrees were dominated by the merging of partial
PlanOptTrees. Because the update addresses fewer optimality conditions, the update
of PlanOptTrees is more efficient than the creation of an initial PlanOptTree. It
is important to note the almost linear scaling of creating PlanOptTrees, triggering re-optimization,
and updating PlanOptTrees. In conclusion, the overhead of the PlanOptTree
algorithms is fairly low and can be neglected as well because it is only incurred during initial
deployment or during on-demand re-optimization.
Robustness<br />
Compared to periodical re-optimization, on-demand re-optimization is more sensitive with
regard to workload changes because it directly reacts to detected violations of optimality conditions.
With regard to the robustness of optimization benefits, there are two major effects
that are worth mentioning. First, on-demand re-optimization does not require the
specification of an optimization interval. Therefore, it is more robust to arbitrary workload
changes compared to periodical re-optimization, as shown in Figure 6.16. Second,
for on-demand re-optimization there is a higher risk of frequently changing plans due to
correlated data or almost equal selectivities that alternate around equality (see Example
6.5). Therefore, we introduced the strategies of the correlation table, minimum existence
time, and lazy condition evaluation. The minimum existence time simply ensures
a lower bound on the time between two subsequent re-optimization steps and thus linearly
reduces the number of re-optimization steps. Furthermore, the strategy of lazy condition
evaluation overcomes the problem of almost equal selectivities. While the effects of these
two strategies are fairly obvious, the use of the correlation table requires a more detailed
discussion using a series of experiments.
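The interplay of minimum existence time and lazy condition evaluation can be sketched as a small guard placed in front of the re-optimization trigger. This is a sketch under stated assumptions rather than the implementation used in the experiments. In particular, it assumes that lazy condition evaluation means "a condition must be observed as violated a given number of consecutive times before it counts". The names `ReoptimizationGuard` and `should_reoptimize` are hypothetical, as are the default values.

```python
from dataclasses import dataclass


@dataclass
class ReoptimizationGuard:
    """Damps plan flapping before re-optimization is triggered (illustrative)."""

    min_existence_time: float = 5.0   # seconds a plan must exist (assumed default)
    lazy_count: int = 10              # consecutive violations required (assumed semantics)
    last_reopt: float = float("-inf")
    consecutive_violations: int = 0

    def should_reoptimize(self, violated: bool, now: float) -> bool:
        if not violated:
            # A satisfied condition resets the lazy counter, so selectivities
            # that alternate around equality never accumulate enough violations.
            self.consecutive_violations = 0
            return False
        self.consecutive_violations += 1
        if self.consecutive_violations < self.lazy_count:
            return False
        if now - self.last_reopt < self.min_existence_time:
            # The current plan has not existed long enough yet.
            return False
        self.last_reopt = now
        self.consecutive_violations = 0
        return True


# Ten consecutive violations, one per second: only the tenth one triggers.
guard = ReoptimizationGuard()
decisions = [guard.should_reoptimize(True, now=float(t)) for t in range(10)]
print(decisions.count(True), decisions[-1])

# A condition that alternates between violated and satisfied never triggers.
flapping = ReoptimizationGuard()
alternating = [flapping.should_reoptimize(t % 2 == 0, now=float(t)) for t in range(100)]
print(any(alternating))
```

In a running system, `now` would typically come from a monotonic clock such as `time.monotonic()`. Passing it explicitly keeps the sketch deterministic.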
Therefore, we reused the correlation experiment of Chapter 3 in order to evaluate the
on-demand re-optimization on correlated data with and without the use of our lightweight
correlation table. We executed 100,000 instances of our example plan $P_5$ and compared
the resulting execution time. We used a minimum existence time of $\Delta t = 5$ s, a lazy
condition evaluation count of ten, and a re-optimization threshold of $\tau = 0.001$. Figure
6.19(a) recaps the conditional selectivities $P(o_2)$, $P(o_3 \mid o_2)$, and $P(o_4 \mid o_2 \land o_3)$ of the
three Selection operators (with $P(o_3 \mid \neg o_2) = 1$ and $P(o_4 \mid \neg o_2 \lor \neg o_3) = 1$), which lead to
a strong dependence (correlation) of $o_3$ on $o_2$ as well as of $o_4$ on $o_2$ and $o_3$.
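To make the dependence explicit, the following short derivation applies only the chain rule and the law of total probability to the quantities named above. It is our own illustration; no numeric values from Figure 6.19(a) are assumed.

```latex
% Joint selectivity of the three Selection operators (chain rule):
P(o_2 \land o_3 \land o_4) = P(o_2)\, P(o_3 \mid o_2)\, P(o_4 \mid o_2 \land o_3)

% Unconditional selectivities, using P(o_3 \mid \neg o_2) = 1 and
% P(o_4 \mid \neg o_2 \lor \neg o_3) = 1:
P(o_3) = P(o_2)\, P(o_3 \mid o_2) + \bigl(1 - P(o_2)\bigr)
P(o_4) = P(o_2 \land o_3)\, P(o_4 \mid o_2 \land o_3) + \bigl(1 - P(o_2 \land o_3)\bigr),
\quad \text{with } P(o_2 \land o_3) = P(o_2)\, P(o_3 \mid o_2)

% Consequence:
P(o_3) \ge P(o_3 \mid o_2), \qquad P(o_4) \ge P(o_4 \mid o_2 \land o_3)
```

Hence the selectivity that would be observed for $o_3$ or $o_4$ depends on whether the operator runs after or before the operators it depends on. An estimate that multiplies unconditional selectivities, $P(o_2)\,P(o_3)\,P(o_4)$, can only overestimate the joint selectivity.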
Figures 6.19(c) and 6.19(b) illustrate the resulting optimization times and execution