Cost-Based Optimization of Integration Flows - Datenbanken ...
6 On-Demand Re-Optimization
For on-demand re-optimization, we maintain only the statistics that are required to evaluate
the optimality conditions. All other statistics are rejected by the PlanOptTree, so
they are neither stored nor aggregated. When an atomic statistic is updated, we also
maintain its aggregate, update the hierarchy of complex statistic measures, and evaluate
the optimality conditions that are reachable children of this statistic node. Arbitrary workload
aggregation methods, as described in Subsection 3.3.2, can be used to aggregate
atomic statistics. Given this incremental maintenance and immediate condition evaluation,
the Exponential Moving Average (EMA) is the most suitable choice because (1) it is
incrementally maintained and (2) no negative statistics maintenance (as required by a sliding
window) is necessary due to the exponentially decaying weights.
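
The difference between the two aggregation styles can be illustrated with a minimal sketch. The class names `EmaStat` and `WindowStat` and the smoothing factor `alpha` are assumptions made for illustration only; they are not taken from the PlanOptTree implementation. The EMA needs a single stored value per statistic, whereas a sliding-window average has to keep the window contents and subtract expired observations (the "negative maintenance" mentioned above).

```python
from collections import deque


class EmaStat:
    """Exponential moving average of one atomic statistic (hypothetical helper)."""

    def __init__(self, alpha):
        self.alpha = alpha   # assumed smoothing factor in (0, 1]
        self.value = None    # the only state that has to be kept

    def update(self, x):
        # Older observations decay geometrically; nothing is ever removed.
        if self.value is None:
            self.value = x
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value


class WindowStat:
    """Sliding-window average, shown only for contrast (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.window = deque()  # all observations in the window must be stored
        self.total = 0.0

    def update(self, x):
        self.window.append(x)
        self.total += x
        if len(self.window) > self.size:
            # Negative maintenance: the expired observation is subtracted.
            self.total -= self.window.popleft()
        return self.total / len(self.window)
```

In this sketch, every call to `update` returns the current aggregate, so an optimality condition that depends on the statistic could be evaluated immediately after each update.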
The naïve approach of triggering re-optimization directly from the incrementally monitored
statistics can lead to the problem of frequently changing plans.

Example 6.5 (Problem of Frequent Plan Changes). Assume the optimality condition
$sel(\sigma_A) \le sel(\sigma_B)$. Two problems can cause frequent plan changes (instability).
First, due to unknown statistics, we are only able to monitor the conditional selectivities
$sel(\sigma_A)$ and $sel(\sigma_B \mid \sigma_A)$. If A and B are correlated, the optimality condition might be
violated even after re-optimization. Second, if the selectivities are roughly constant but alternate
around equality with $sel(\sigma_A) \approx sel(\sigma_B)$, we would also change the plan back and forth. In
both cases, we would get frequent re-optimization steps that are not amortized.
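
The second case of Example 6.5 can be reproduced with a small, hedged sketch. The function `count_plan_changes` and the observation values are hypothetical and serve only to illustrate the effect; the sketch assumes that the plan is reordered immediately whenever the operator that currently runs first shows the higher monitored selectivity.

```python
def count_plan_changes(sel_a_obs, sel_b_obs):
    """Count reorderings under naive, immediate triggering (illustrative only)."""
    order = ("A", "B")  # the current plan applies the selection on A first
    changes = 0
    for sel_a, sel_b in zip(sel_a_obs, sel_b_obs):
        sel = {"A": sel_a, "B": sel_b}
        first, second = order
        # Optimality condition: the operator that runs first must not have
        # the higher selectivity. A violation triggers a reordering at once.
        if sel[first] > sel[second]:
            order = (second, first)
            changes += 1
    return changes


# Monitored selectivity of A alternates closely around that of B.
unstable = count_plan_changes([0.49, 0.51] * 3, [0.50] * 6)

# Clearly separated selectivities never violate the condition.
stable = count_plan_changes([0.20] * 6, [0.50] * 6)
```

With the assumed observations, the first call reorders the plan five times within six observations, while the second call never changes it. Each of the five reorderings would correspond to a re-optimization step whose cost is unlikely to be amortized.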
We explicitly address this problem of missing robustness (instability) when triggering
re-optimization with the following strategies:
• Correlation Tables: As described in Subsection 3.3.4, we explicitly compute conditional
selectivities using a lightweight correlation table. Essentially, we maintain
selectivities over multiple versions of a plan: for each pair of operators with a direct
data dependency within the current plan, we store and maintain a row of atomic and
conditional selectivities. Until we have seen the second operator ordering,
we assume statistical independence. However, because conditional selectivities are
maintained, we do not make the same wrong decision due to correlation twice. For
on-demand re-optimization, the use of this correlation table is even more important.
It is integrated into the PlanOptTree by a specific complex statistic node
(CSNode), Conditional Selectivity, that maintains and reads the correlation table.
• Minimal Existence Time: We use the time period ∆t from periodical re-optimization
as the minimal existence time of a plan. This means that no optimality conditions are
evaluated during ∆t after the last re-optimization. During this interval, we only collect
statistics but do not aggregate or evaluate them. As a result, the adaptation
sensitivity is reduced, which prevents re-optimization from being triggered multiple times
when (1) another asynchronous re-optimization step has not yet finished or
(2) the workload has changed abruptly and caused the violation of multiple optimality
conditions within a short delay. However, ∆t determines only the minimum period between
re-optimizations and hence can be set independently of the workload characteristics
(low-influence parameter). After ∆t has elapsed, we continuously check optimality conditions
and adapt to workload changes faster than periodical re-optimization does.
• Lazy Condition Violation: When evaluating an optimality condition, we might not
have seen all atomic statistics of a plan instance. Similar to known control strategies,
re-optimization is lazily triggered if the condition is violated $\sum_{1}^{m'} s'$ times, which is