25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3.3 Periodical Re-Optimization

Algorithm 3.2 Pattern Matching Optimization (A-PMO)
Require: operator op, dependency graph DG
 1: if type(op) is Plan then
 2:     apply Plan techniques on op (Before)     // WD10 (DPSize), WD13, MFO
 3: o ← op.getSequenceOfOperators()
 4: apply WC2 on o                               // apply technique for all sequences
 5: for i ← 1 to |o| do                          // for each operator of the sequence
 6:     if type(o_i) ∈ (Plan, Switch, Fork, Iteration, Undefined) then   // complex
 7:         o_i ← A-PMO(o_i, DG)
 8:         apply operator techniques on o_i     // WC1, WC4, WD1, WD2, WM1, WC3
 9:     else                                     // atomic
10:         apply operator techniques on o_i     // WM1, WM2, WM3, WD3, WD4, WD5,
                                                 // WD6, WD8, WD9, WD11, WD12
11: if type(op) is Plan then
12:     apply Plan techniques on op (After)      // Vect, HLB

enumeration is only executed once for the complete plan. Within this full optimization algorithm, we use the DPSize [SAC+79, Moe09] join enumeration algorithm. Another example is the optimization technique multi-flow optimization (see Chapter 5). Second, for complex operators (line 6), we recursively invoke this algorithm and subsequently apply the available optimization techniques. Third, we apply operator-type-specific techniques to the individual atomic operators (line 9). Note that from each operator, parent nodes and all other operators are reachable. From the perspective of a single optimization technique, however, only successors (following operators) are considered (forward-only) in order to avoid considering the same operator multiple times. Fourth, we apply all techniques that need to be executed at the top level of a plan and after the operator-type-specific techniques. Among others, we invoke the optimization technique vectorization (see Chapter 4). In contrast to full plan enumeration with dynamic programming approaches, this iterative, transformation-based algorithm preserves the control-flow semantics of the given plan and iteratively improves the current solution. Thus, it can be aborted between applying optimization techniques without loss of intermediate optimization results.
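The traversal order described above can be sketched as follows. This is a minimal illustration and not the implementation from the thesis: the `Operator` class, the `apply_techniques` placeholder, the phase labels, and the atomic operator types `Receive`, `Assign`, and `Invoke` are assumptions made for this example, and the placeholder only records which group of techniques would be applied to which operator.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Operator types treated as complex in line 6 of Algorithm 3.2.
COMPLEX_TYPES = {"Plan", "Switch", "Fork", "Iteration", "Undefined"}


@dataclass
class Operator:
    """Hypothetical plan node: a name, a type, and a sequence of child operators."""
    name: str
    kind: str
    children: List["Operator"] = field(default_factory=list)


def apply_techniques(phase: str, op: Operator, dg: dict,
                     trace: List[Tuple[str, str]]) -> Operator:
    """Placeholder for one group of rewrite techniques; it only records the call."""
    trace.append((phase, op.name))
    return op


def a_pmo(op: Operator, dg: dict, trace: List[Tuple[str, str]]) -> Operator:
    if op.kind == "Plan":                                # lines 1-2
        op = apply_techniques("plan-before", op, dg, trace)
    op = apply_techniques("sequence", op, dg, trace)     # lines 3-4
    for i, child in enumerate(op.children):              # line 5
        if child.kind in COMPLEX_TYPES:                  # lines 6-8
            child = a_pmo(child, dg, trace)
            child = apply_techniques("complex", child, dg, trace)
        else:                                            # lines 9-10
            child = apply_techniques("atomic", child, dg, trace)
        op.children[i] = child
    if op.kind == "Plan":                                # lines 11-12
        op = apply_techniques("plan-after", op, dg, trace)
    return op


def demo() -> List[Tuple[str, str]]:
    """Run the traversal on a small made-up plan and return the recorded order."""
    plan = Operator("P", "Plan", [
        Operator("o1", "Receive"),
        Operator("o2", "Fork", [Operator("o3", "Assign"),
                                Operator("o4", "Invoke")]),
    ])
    trace: List[Tuple[str, str]] = []
    a_pmo(plan, dg={}, trace=trace)
    return trace
```

Because every call to `apply_techniques` returns a complete operator, stopping the loop after any call still leaves a valid, partially improved plan, which mirrors the abort property stated above.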

The worst-case time complexity of the A-PMO is given by the optimization technique with the highest individual complexity. In our case, this is the optimization technique join enumeration, where we use the DPSize join enumeration algorithm. According to the complexity analysis of DPSize by Moerkotte and Neumann [MN06], it is given by O(n^4) for chain and cycle queries as well as by O(c^n) for star and clique queries, where n denotes the number of joined input data sets.
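To give a feeling for the gap between the two complexity classes, the following sketch counts the pairs of connected subplans that a size-driven enumeration would combine for chain and clique queries. It is a simplified counting model under our own assumptions and does not reproduce the exact counters of [MN06]: it ignores the disjointness and connectivity tests as well as the pruning of symmetric pairs, and the function names are chosen for this example only.

```python
from math import comb


def chain_pairs(n: int) -> int:
    """Candidate pairs for a chain of n inputs.

    A chain has n - k + 1 connected subsets of size k (its contiguous runs).
    """
    def connected(k: int) -> int:
        return n - k + 1

    return sum(connected(s1) * connected(s - s1)
               for s in range(2, n + 1)      # size of the combined plan
               for s1 in range(1, s))        # size of the left subplan


def clique_pairs(n: int) -> int:
    """Candidate pairs for a clique of n inputs.

    In a clique every subset is connected, so there are C(n, k) subsets of size k.
    """
    return sum(comb(n, s1) * comb(n, s - s1)
               for s in range(2, n + 1)
               for s1 in range(1, s))
```

For chains, both loops and both factors are bounded by n, which is where the polynomial bound comes from, whereas the binomial coefficients of the clique case grow exponentially in n.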

Example 3.4 (Optimization Algorithm). Recall the plan P_1 that is shown in Figure 3.7(a). The A-PMO recursively iterates over all operators and applies the available optimization techniques. We start at the top-level sequence of operators, where we apply the rewriting of sequences to parallel flows⁵ because no data dependencies exist between o_7 and o_8. The resulting plan P_1′ is shown in Figure 3.7(b). Then, we iterate over the individual operators.

⁵ Rule-based optimization techniques are applied during the initial deployment of a plan. As an example, consider the operators o_4 and o_6, which would be detected as redundant work and thus merged into a single operator after o_2. This operator would have been included in the parallel flow of operator o_7. For simplicity of presentation, we did not apply rule-based optimization techniques in the example.
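The rewriting step of Example 3.4 relies on a dependency check between neighboring operators. The following sketch shows one way such a check could look. It is an assumption-laden illustration: the dependency graph is modeled as a dictionary from an operator name to the set of operators whose output it reads, the concrete dependencies are made up because Figure 3.7 is not reproduced here, and the tuple `("Fork", [...])` is merely a stand-in for a parallel flow.

```python
from typing import Dict, List, Set


def depends_on(dg: Dict[str, Set[str]], a: str, b: str) -> bool:
    """Return True if operator a depends on operator b, directly or transitively."""
    stack, seen = [a], set()
    while stack:
        current = stack.pop()
        for pred in dg.get(current, ()):
            if pred == b:
                return True
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return False


def parallelize_pair(sequence: List, dg: Dict[str, Set[str]],
                     a: str, b: str) -> List:
    """Replace the adjacent operators a, b by a Fork if neither depends on the other."""
    i = sequence.index(a)
    if i + 1 >= len(sequence) or sequence[i + 1] != b:
        return list(sequence)            # a and b are not adjacent: no rewrite
    if depends_on(dg, a, b) or depends_on(dg, b, a):
        return list(sequence)            # a data dependency forbids the rewrite
    return sequence[:i] + [("Fork", [a, b])] + sequence[i + 2:]


# Hypothetical dependencies: o7 and o8 both read earlier results, not each other.
DG_INDEPENDENT = {"o3": {"o2"}, "o7": {"o2"}, "o8": {"o3"}}
# Variant in which o8 reads the output of o7, so the sequence must be kept.
DG_DEPENDENT = {"o3": {"o2"}, "o7": {"o2"}, "o8": {"o7"}}
SEQUENCE = ["o2", "o3", "o7", "o8"]
```

With `DG_INDEPENDENT`, calling `parallelize_pair(SEQUENCE, DG_INDEPENDENT, "o7", "o8")` wraps the two operators into a single parallel flow, while `DG_DEPENDENT` leaves the sequence unchanged.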
