25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6 On-Demand Re-Optimization

evaluating the MEMO structure. However, due to the requirement of robustness, we use the POT3 configurations for all end-to-end comparison scenarios. Furthermore, we show only the total statistic maintenance time because these times scale linearly with an increasing number of statistic tuples due to the use of the incremental EMA aggregation method. Finally, compared to the performance benefit of on-demand re-optimization, the overhead for statistic maintenance is negligible.
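The linear scaling follows from the incremental nature of the aggregate: each new statistic tuple is folded into the running value in constant time, so no history has to be rescanned. A minimal sketch, assuming EMA denotes the exponential moving average; the class name `EmaStatistic` and the smoothing factor `alpha` are illustrative assumptions, not values taken from the evaluation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmaStatistic:
    """Incrementally maintained exponential moving average (illustrative only)."""

    alpha: float = 0.2  # hypothetical smoothing factor, not from the source
    value: Optional[float] = None

    def update(self, observation: float) -> float:
        """Fold one statistic tuple into the aggregate in O(1) time."""
        if self.value is None:
            # The first observation initializes the aggregate.
            self.value = observation
        else:
            # Weighted blend of the new observation and the previous aggregate.
            self.value = self.alpha * observation + (1.0 - self.alpha) * self.value
        return self.value


if __name__ == "__main__":
    stat = EmaStatistic(alpha=0.5)
    for execution_time in (10.0, 20.0, 5.0):
        print(stat.update(execution_time))
```

Because every update touches only the previous aggregate, processing n statistic tuples costs n constant-time steps, which is consistent with the linear behavior described above.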

Additionally, we evaluated the algorithm overhead when creating, using, and updating PlanOptTrees. There, we used the already described POT3 configuration. Figure 6.18(b) illustrates the results, varying the number of operators m of a plan and hence indirectly varying the number of optimality conditions. The PlanOptTree is created over all m operators (where we simulated the actual optimization techniques and directly provided the partial PlanOptTrees), while triggering re-optimization and the subsequent update of the PlanOptTree addressed violated optimality conditions only, which depend on the randomly changed statistics. We observe a moderate execution time, where the creation and the update of PlanOptTrees were dominated by the merging of partial PlanOptTrees. Due to the different numbers of addressed optimality conditions, the update of PlanOptTrees is more efficient than the creation of an initial PlanOptTree. It is important to note the almost linear scaling of creating PlanOptTrees, triggering re-optimization, and updating PlanOptTrees. In conclusion, the overhead of the PlanOptTree algorithms is fairly low and can be neglected as well because it is only incurred at initial deployment or during on-demand re-optimization.

Robustness

Compared to periodical re-optimization, on-demand re-optimization is more sensitive with regard to workload changes because it directly reacts to detected violations of optimality conditions. With regard to the robustness of optimization benefits, there are two major effects that are worth mentioning. First, on-demand re-optimization does not require the specification of an optimization interval. Therefore, it is more robust to arbitrary workload changes than periodical re-optimization, as shown in Figure 6.16. Second, for on-demand re-optimization there is a higher risk of frequently changing plans due to correlated data or almost equal selectivities that alternate around equality (see Example 6.5). Therefore, we introduced the strategies of the correlation table, minimum existence time, and lazy condition evaluation. The minimum existence time simply ensures a lower bound on the time between two subsequent re-optimization steps and thus linearly reduces the number of re-optimization steps. Furthermore, the strategy of lazy condition evaluation overcomes the problem of almost equal selectivities. While the effects of these two strategies are fairly obvious, the use of the correlation table requires a more detailed discussion using a series of experiments.
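The interplay of minimum existence time and lazy condition evaluation can be illustrated with a small guard that decides whether a detected violation may actually trigger re-optimization. This is only a sketch under stated assumptions: the class `ReoptimizationGuard` and its fields are hypothetical names, and lazy condition evaluation is interpreted here as requiring a number of consecutive violations before reacting, which may differ from the actual definition used in this work.

```python
from dataclasses import dataclass


@dataclass
class ReoptimizationGuard:
    """Hypothetical guard combining two damping strategies (illustrative only)."""

    min_existence_time: float = 5.0  # lower bound in seconds between two re-optimizations
    lazy_count: int = 10             # assumed: consecutive violations required before reacting
    last_reopt_time: float = float("-inf")
    consecutive_violations: int = 0

    def should_reoptimize(self, now: float, condition_violated: bool) -> bool:
        """Return True only if both damping strategies allow re-optimization."""
        if not condition_violated:
            # A satisfied condition resets the counter, so selectivities that
            # alternate around equality never accumulate enough violations.
            self.consecutive_violations = 0
            return False
        self.consecutive_violations += 1
        if self.consecutive_violations < self.lazy_count:
            return False
        if now - self.last_reopt_time < self.min_existence_time:
            # The current plan has not yet existed for the minimum time.
            return False
        self.last_reopt_time = now
        self.consecutive_violations = 0
        return True


if __name__ == "__main__":
    guard = ReoptimizationGuard(min_existence_time=5.0, lazy_count=3)
    for t in range(8):
        print(t, guard.should_reoptimize(float(t), True))
```

In this sketch the minimum existence time caps the re-optimization rate at one step per interval, while the counter suppresses reactions to conditions that flip back and forth.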

Therefore, we reused the correlation experiment of Chapter 3 in order to evaluate on-demand re-optimization on correlated data with and without the use of our lightweight correlation table. We executed 100,000 instances of our example plan $P_5$ and compared the resulting execution time. We used a minimum existence time of $\Delta t = 5$ s, a lazy condition evaluation count of ten, and a re-optimization threshold of $\tau = 0.001$. Figure 6.19(a) recaps the conditional selectivities $P(o_2)$, $P(o_3 \mid o_2)$, and $P(o_4 \mid o_2 \wedge o_3)$ of the three Selection operators (with $P(o_3 \mid \neg o_2) = 1$ and $P(o_4 \mid \neg o_2 \vee \neg o_3) = 1$), which lead to a strong dependence (correlation) of $o_3$ on $o_2$ as well as of $o_4$ on $o_2$ and $o_3$.
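To see why such conditional selectivities mislead an optimizer that assumes independence, the following sketch compares the true joint selectivity of the three Selection operators with the estimate obtained by multiplying marginal selectivities. The function name and the numeric inputs are illustrative assumptions and are not the values of Figure 6.19(a); only the constraints $P(o_3 \mid \neg o_2) = 1$ and $P(o_4 \mid \neg o_2 \vee \neg o_3) = 1$ are taken from the text.

```python
def joint_vs_independent(p2: float, p3_given_2: float, p4_given_23: float):
    """Compare the true joint selectivity with an independence-based estimate.

    Assumes P(o3 | not o2) = 1 and P(o4 | not o2 or not o3) = 1, as stated above.
    """
    # Chain rule: P(o2 and o3) and P(o2 and o3 and o4).
    joint_23 = p2 * p3_given_2
    joint_234 = joint_23 * p4_given_23

    # Marginal selectivities via the law of total probability.
    p3 = p3_given_2 * p2 + 1.0 * (1.0 - p2)
    p4 = p4_given_23 * joint_23 + 1.0 * (1.0 - joint_23)

    # Estimate under the (here invalid) independence assumption.
    independent_234 = p2 * p3 * p4
    return joint_234, independent_234


if __name__ == "__main__":
    # Hypothetical selectivities chosen for illustration only.
    true_sel, estimated_sel = joint_vs_independent(0.5, 0.5, 0.5)
    print(f"true joint selectivity:  {true_sel}")
    print(f"independence estimate:   {estimated_sel}")
```

With these hypothetical inputs the independence estimate clearly overestimates the joint selectivity, which is the kind of error a correlation table is meant to compensate for.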

Figures 6.19(c) and 6.19(b) illustrate the resulting optimization times and execution

