25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6 On-Demand Re-Optimization

For on-demand re-optimization, we maintain only the statistics that are required to evaluate the optimality conditions. All other statistics are rejected by the PlanOptTree, so they are neither stored nor aggregated. When atomic statistics are updated, we also maintain the aggregate, update the hierarchy of complex statistics measures, and evaluate the optimality conditions that are reachable children of this statistic node. Arbitrary workload aggregation methods, as described in Subsection 3.3.2, can be used to aggregate atomic statistics. Given this incremental maintenance and immediate condition evaluation, the Exponential Moving Average (EMA) is most suitable because (1) it is incrementally maintained and (2) no negative statistics maintenance (as for a sliding window) is necessary due to the exponentially decaying weights.
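The two properties of EMA named above can be illustrated with a minimal sketch. The class name `EmaStatistic` and the smoothing factor `alpha` are assumptions made for illustration only; they are not taken from the text. Each update touches a single stored value, and old observations fade through the factor `1 - alpha`, so nothing ever has to be subtracted as it would be when a sliding window moves forward.

```python
class EmaStatistic:
    """Incrementally maintained exponential moving average of one atomic statistic.

    Hypothetical helper for illustration; `alpha` is an assumed smoothing factor.
    """

    def __init__(self, alpha: float = 0.2):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None  # no observation seen yet

    def update(self, observation: float) -> float:
        """Fold one new observation into the aggregate in O(1) time and space."""
        if self.value is None:
            # The first observation initializes the aggregate.
            self.value = observation
        else:
            # Older observations are implicitly down-weighted by (1 - alpha),
            # so no expired values need to be removed explicitly.
            self.value = self.alpha * observation + (1.0 - self.alpha) * self.value
        return self.value


ema = EmaStatistic(alpha=0.5)
for selectivity in [0.4, 0.2, 0.2]:
    ema.update(selectivity)
```

Because the aggregate is available immediately after every update, an optimality condition that depends on it could be evaluated right away.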

The naïve approach of triggering re-optimization directly from the incrementally monitored statistics could lead to frequently changing plans.

Example 6.5 (Problem of Frequent Plan Changes). Assume the optimality condition $sel(\sigma_A) \le sel(\sigma_B)$. Two problems can cause frequent plan changes (instability). First, due to unknown statistics, we are only able to monitor the conditional selectivities $sel(\sigma_A)$ and $sel(\sigma_B \mid \sigma_A)$. If A and B are correlated, the optimality condition might be violated even after re-optimization. Second, if the selectivities are constant overall but alternate around equality with $sel(\sigma_A) \approx sel(\sigma_B)$, we would also change the plan back and forth. In both cases, we would get frequent re-optimization steps that are not amortized.
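The second problem of the example can be reproduced with a small simulation. This is only a sketch: the function `count_plan_changes` and the observed selectivity values are hypothetical, and the trigger is deliberately naïve in that it reorders the two selections on every violated condition.

```python
def count_plan_changes(observations, initial_order=("A", "B")):
    """Count reorderings under a naive trigger that reacts to every violation.

    `observations` is a sequence of (sel_A, sel_B) pairs (illustrative values).
    """
    order = initial_order
    changes = 0
    for sel_a, sel_b in observations:
        sel = {"A": sel_a, "B": sel_b}
        first, second = order
        # Optimality condition: the operator executed first must have the
        # smaller (or equal) selectivity, i.e. sel(first) <= sel(second).
        if sel[first] > sel[second]:
            order = (second, first)  # re-optimize: swap the two selections
            changes += 1
    return changes


# Selectivities that alternate around equality (assumed sample data).
alternating = [(0.31, 0.30), (0.30, 0.31)] * 5
changes = count_plan_changes(alternating)

# A clearly separated workload never violates the condition.
stable_changes = count_plan_changes([(0.1, 0.5)] * 10)
```

With the alternating input, every observation violates the condition of the current plan, so the plan is changed on each of the ten observations, whereas the clearly separated input causes no change at all.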

We explicitly address this problem of missing robustness (instability) when triggering re-optimization with the following strategies:

• Correlation Tables: As described in Subsection 3.3.4, we explicitly compute conditional selectivities using a lightweight correlation table. Essentially, we maintain selectivities over multiple versions of a plan: we store and maintain a row of atomic and conditional selectivities for each pair of operators with a direct data dependency within the current plan. Until we have seen the second operator ordering, we assume statistical independence. However, because we maintain conditional selectivities, we do not make a wrong decision based on correlation twice. For on-demand re-optimization, the use of this correlation table is even more important. The integration into the PlanOptTree is realized by a specific complex statistic node (CSNode), Conditional Selectivity, that maintains and reads the correlation table.

• Minimal Existence Time: We use the time period ∆t from periodical re-optimization as the minimal existence time of a plan. This means that no optimality conditions are evaluated during ∆t after the last re-optimization. During this interval, we only collect statistics but do not aggregate or evaluate them. As a result, the adaptation sensitivity is reduced, which prevents re-optimization from being triggered multiple times when (1) another asynchronous re-optimization step has not yet finished or (2) the workload has changed abruptly and caused the violation of multiple optimality conditions within a short delay. However, ∆t determines only the minimum period of re-optimization and hence can be set independently of the workload characteristics (low-influence parameter). After that, we continuously check optimality conditions and adapt faster to workload changes than periodical re-optimization does.

• Lazy Condition Violation: When evaluating an optimality condition, we might not have seen all atomic statistics of a plan instance. Similar to known control strategies, re-optimization is lazily triggered if the condition is violated $\sum_{1}^{m'} s'$ times, which is
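The minimal existence time and the lazy condition violation from the list above can be combined in one trigger guard, sketched here under explicit assumptions. The class `ReoptimizationTrigger`, the fixed integer `required_violations` (a stand-in for the violation threshold, whose exact definition is not reproduced here), and the choice to count violations cumulatively since the last re-optimization are all hypothetical. Timestamps are passed in explicitly so that the behavior is deterministic.

```python
class ReoptimizationTrigger:
    """Guard that decides when a violated optimality condition triggers re-optimization.

    Hypothetical sketch: parameter names and the counting policy are assumptions.
    """

    def __init__(self, min_existence_time: float, required_violations: int):
        self.min_existence_time = min_existence_time    # plays the role of ∆t
        self.required_violations = required_violations  # assumed fixed threshold
        self.last_reoptimization = 0.0
        self.violations = 0

    def observe(self, now: float, condition_holds: bool) -> bool:
        """Return True if re-optimization should be triggered at time `now`."""
        # Minimal existence time: conditions are not evaluated during ∆t
        # after the last re-optimization.
        if now - self.last_reoptimization < self.min_existence_time:
            return False
        if condition_holds:
            return False
        # Lazy condition violation: count the violation, but trigger only
        # once the threshold has been reached.
        self.violations += 1
        if self.violations < self.required_violations:
            return False
        self.last_reoptimization = now
        self.violations = 0
        return True


trigger = ReoptimizationTrigger(min_existence_time=10.0, required_violations=3)
# (timestamp, condition_holds) pairs; the condition is violated every time.
events = [(2.0, False), (5.0, False), (11.0, False), (12.0, False),
          (13.0, False), (14.0, False), (25.0, False)]
fired = [t for t, holds in events if trigger.observe(t, holds)]
```

In this run, the violations at times 2.0 and 5.0 fall into the initial existence interval and are ignored, the third counted violation at 13.0 triggers re-optimization, and the violation at 14.0 is ignored again because the new plan has not yet existed for ∆t.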
