25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6.2 Plan Optimality Trees

of nodes within a PlanOptTree for a given plan, which indirectly implies the worst-case complexity of any operation that evaluates at most all nodes of such a PlanOptTree.

Theorem 6.1 (Worst-Case Complexity). The worst-case time and space complexity of a PlanOptTree for a plan of $m$ operators is $O(m^2)$.

Proof. Assume a plan $P$ with $m$ operators. A minimal PlanOptTree has at most $m$ ONodes, $m \cdot s$ SNodes, $2 \cdot |oc|$ CSNodes (two complex statistics nodes for each binary optimality condition), and $|oc|$ OCNodes. Each operator can be included in one optimality condition per dependency (in the case of a data dependency, this subsumes any temporal dependency) and in one additional optimality condition for binary operators.

Now, let us assume a sequence of operators $o$. Then, an arbitrary operator $o_i$ with $1 \le i \le m$ can, in the worst case, be the target of $i - 1$ dependencies $\delta_i^-$ and the source of $m - i$ dependencies $\delta_i^+$. Since every dependency has exactly one source and one target, $\delta^- = \delta^+$ and thus $|\delta^-| = |\delta^+|$, so it suffices to count each dependency once at its target. Hence, the maximum number of optimality conditions is given by

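$$
|oc| = \sum_{i=1}^{m} (i - 1) + m = \sum_{i=1}^{m-1} i + m = \frac{m \cdot (m + 1)}{2}, \tag{6.1}
$$

where the summand $m$ accounts for the one additional optimality condition per binary operator. The numbers of CSNodes ($2 \cdot |oc|$) and OCNodes ($|oc|$) are therefore in $O(m^2)$, while the numbers of ONodes ($m$) and SNodes ($m \cdot s$) are linear in $m$ for a constant $s$. Hence, Theorem 6.1 holds.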
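The counting argument in the proof can be checked mechanically. The following Python sketch assumes the worst case described above: every pair of operators $o_i, o_j$ with $i < j$ forms a dependency, each dependency contributes one optimality condition counted at its target, and every operator contributes one additional condition as a binary operator. The function names are hypothetical and only illustrate Equation 6.1 and the node count from the proof; they are not taken from an actual PlanOptTree implementation.

```python
from itertools import combinations


def worst_case_optimality_conditions(m: int) -> int:
    """Count |oc| for a sequence o_1..o_m under the worst-case assumptions.

    Assumption (illustrative): every pair (o_i, o_j) with i < j is a dependency,
    and every operator takes part in one additional condition as a binary operator.
    """
    # (source, target) pairs; the source precedes the target in the sequence.
    dependencies = list(combinations(range(1, m + 1), 2))
    # Count each dependency once, at its target: o_i is the target of i - 1 dependencies.
    incoming = [sum(1 for _, target in dependencies if target == i) for i in range(1, m + 1)]
    return sum(incoming) + m


def worst_case_node_count(m: int, s: int) -> int:
    """m ONodes + m*s SNodes + 2*|oc| CSNodes + |oc| OCNodes."""
    oc = worst_case_optimality_conditions(m)
    return m + m * s + 2 * oc + oc


# Equation 6.1: |oc| = m(m+1)/2 for every m.
for m in range(1, 10):
    assert worst_case_optimality_conditions(m) == m * (m + 1) // 2

print(worst_case_optimality_conditions(10), worst_case_node_count(10, 2))  # 55 195
```

The dominating term of the node count is $3 \cdot |oc|$, which grows quadratically in $m$, whereas the ONode and SNode terms grow only linearly.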

After we have created the initial PlanOptTree, we can apply the following two optimizations with a single additional pass over all nodes of the PlanOptTree. However, for simplicity of presentation and due to their small impact on our use cases, we did not apply them in the examples.

• Collapsing Statistic Hierarchies (CSNodes): The hierarchies of statistic nodes of partial PlanOptTrees are defined for each optimization technique with regard to reusability. Thus, there might be unnecessarily fine-grained CSNodes. We collapse these hierarchies by merging a CSNode with its child if it has only a single child. Similar to the merging of prefix nodes within a Patricia trie [Mor68], this can reduce the number of levels of the PlanOptTree.

• Reusing Atomic Statistic Measures (SNodes): For operators of the PlanOptTree with a data dependency between them, we reuse cardinalities across those operators in order to eliminate redundancy. Due to the data dependency, the output cardinality of operator $o_i$ is equal to the input cardinality of operator $o_{i+1}$ ($|ds_{out}(o_i)| = |ds_{in}(o_{i+1})|$). Hence, we remove the latter SNode ($|ds_{in}(o_{i+1})|$) and modify the references accordingly. This requires care when updating the PlanOptTree after a successful re-optimization, because the re-optimization might have changed the ordering of operators and thus also the data dependencies and statistics.
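Both optimizations can be sketched in a few lines of Python. The sketch assumes a hypothetical in-memory representation: a CSNode has a label and a list of children, which are either CSNodes or SNode identifiers such as `out(o1)`, and references from CSNodes to SNodes are given as a dictionary. `collapse` merges a CSNode with its child only if that single child is itself a CSNode (merging into leaf SNodes is not modeled), and `reuse_cardinalities` rewires every reference to $|ds_{in}(o_{j})|$ to the SNode $|ds_{out}(o_i)|$ for each data dependency $(o_i, o_j)$. All names and the label-concatenation scheme are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union


@dataclass
class CSNode:
    """Hypothetical complex-statistics node; children are CSNodes or SNode ids (str)."""
    label: str
    children: List[Union["CSNode", str]] = field(default_factory=list)


def collapse(node: CSNode) -> CSNode:
    """Merge a CSNode with its only child if that child is itself a CSNode.

    Like merging prefix nodes in a Patricia trie, a chain of single-child CSNodes
    becomes one node with a concatenated label. Note: rewrites children in place.
    """
    node.children = [collapse(c) if isinstance(c, CSNode) else c for c in node.children]
    if len(node.children) == 1 and isinstance(node.children[0], CSNode):
        child = node.children[0]
        return CSNode(f"{node.label}/{child.label}", child.children)
    return node


def depth(node: CSNode) -> int:
    """Number of CSNode levels in the subtree rooted at this node."""
    return 1 + max((depth(c) for c in node.children if isinstance(c, CSNode)), default=0)


def reuse_cardinalities(
    data_deps: List[Tuple[str, str]], refs: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], Set[str]]:
    """For each data dependency (o_i, o_j), replace SNode in(o_j) by out(o_i).

    refs maps a (hypothetical) CSNode label to the SNode ids it references.
    Returns the rewired references and the set of removed SNode ids.
    """
    alias = {f"in({dst})": f"out({src})" for src, dst in data_deps}
    rewired = {label: [alias.get(s, s) for s in snodes] for label, snodes in refs.items()}
    return rewired, set(alias)


tree = CSNode("sel(o1)", [CSNode("ratio", [CSNode("card", ["out(o1)", "in(o1)"])])])
print(depth(tree))  # 3
collapsed = collapse(tree)
print(depth(collapsed), collapsed.label)  # 1 sel(o1)/ratio/card

rewired, removed = reuse_cardinalities([("o1", "o2")], {"sel(o2)": ["out(o2)", "in(o2)"]})
print(rewired, removed)  # {'sel(o2)': ['out(o2)', 'out(o1)']} {'in(o2)'}
```

Because a re-optimization may reorder operators and thereby change the data dependencies, this sketch would recompute the aliases from the new dependencies and the original references instead of patching the previously rewired ones.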

As a result of creating the initial PlanOptTree, we obtain a structure that represents the optimality of a plan with the properties sketched in Subsection 6.2.1. It includes all cost conditions that must be satisfied for plan optimality with regard to the current statistics. Thus, only updates of statistics can trigger re-optimization.

6.2.3 Updating and Evaluating Statistics<br />

We use the PlanOptTree for statistics maintenance and for the immediate evaluation of optimality conditions. If an optimality condition is violated, this evaluation triggers re-optimization.
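A minimal Python sketch of this update-and-evaluate loop follows. It assumes that each SNode keeps references to the OCNodes that use it, so that an update re-evaluates only the affected optimality conditions, and it models an optimality condition as a Boolean predicate over the current statistics. The class `PlanOptTreeSketch`, the callback `on_violation`, and the selectivity-based filter-ordering condition are hypothetical illustrations, not the conditions or interfaces defined in this work; statistics are simply overwritten rather than aggregated.

```python
from typing import Callable, Dict, List

Stats = Dict[str, float]


class PlanOptTreeSketch:
    """Hypothetical, heavily simplified PlanOptTree: SNodes referencing OCNodes."""

    def __init__(self, on_violation: Callable[[str], None]) -> None:
        self.stats: Stats = {}                                    # current SNode values
        self.conditions: Dict[str, Callable[[Stats], bool]] = {}  # OCNode predicates
        self.dependents: Dict[str, List[str]] = {}                # SNode id -> OCNode names
        self.on_violation = on_violation                          # e.g. start re-optimization

    def add_condition(
        self, name: str, snodes: List[str], predicate: Callable[[Stats], bool]
    ) -> None:
        self.conditions[name] = predicate
        for snode in snodes:
            self.dependents.setdefault(snode, []).append(name)

    def update(self, snode: str, value: float) -> List[str]:
        """Maintain one statistic and immediately evaluate only the conditions using it."""
        self.stats[snode] = value
        violated = []
        for name in self.dependents.get(snode, []):
            try:
                holds = self.conditions[name](self.stats)
            except KeyError:  # some statistic of this condition has not been observed yet
                continue
            if not holds:
                violated.append(name)
        for name in violated:
            self.on_violation(name)
        return violated


def selectivity(stats: Stats, op: str) -> float:
    return stats[f"out({op})"] / max(stats[f"in({op})"], 1.0)


# Illustrative condition: filter s1 before s2 stays optimal while s1 is at least as selective.
triggered: List[str] = []
tree = PlanOptTreeSketch(on_violation=triggered.append)
tree.add_condition(
    "s1 before s2",
    ["in(s1)", "out(s1)", "in(s2)", "out(s2)"],
    lambda st: selectivity(st, "s1") <= selectivity(st, "s2"),
)
for snode, value in [("in(s1)", 100), ("out(s1)", 50), ("in(s2)", 50), ("out(s2)", 40)]:
    tree.update(snode, value)
print(triggered)  # []
tree.update("out(s2)", 10)  # selectivity(s2) drops to 0.2 < selectivity(s1) = 0.5
print(triggered)  # ['s1 before s2']
```

In this sketch the callback only records the violated condition; an actual re-optimization would additionally reorder the operators and update the PlanOptTree accordingly.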

