25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3 Fundamentals of Optimizing Integration Flows

data properties of external systems, we use the described lightweight concept for explicitly taking into account conditional probabilities and correlation at the same time using an incremental reordering approach that is tailor-made for integration flows.

3.4 Optimization Techniques

In this section, we discuss specific optimization techniques that are used within the described core optimization algorithm. For selected techniques, we present the core idea, the optimality conditions, the possible execution time reduction, the rewriting algorithm and its time complexity, as well as possible side effects on other techniques.

[Figure 3.11 content. Group labels: Cost-Based Techniques, Rule-Based Techniques, Data Flow, Control Flow.]

(WD1) Reordering of Switch-Paths
(WD2) Merging of Switch-Paths
(WD3) Execution Pushdown to External Systems
(WD4) Early Selection Application
(WD5) Early Projection Application
(WD6) Early GroupBy Application
(WD7) Materialization Point Insertion
(WD8) Orderby Insertion / Removal
(WD9) Join-Type Selection
(WD10) Join Enumeration
(WD11) Setoperation-Type Selection
(WD12) Splitting / Merging of Operators
(WD13) Precomputation of Values
(WD14) Early Translation Application
(WC1) Rescheduling Start of Parallel Flows
(WC2) Rewriting Sequences to Parallel Flows
(WC3) Rewriting Iterations to Parallel Flows
(WC4) Merging Parallel Flows
(WM1) Message Indexing
(WM2) Recycling Locally Created Intermediates
(WM3) Recycling Externally Loaded Intermediates
(Vect) Cost-Based Vectorization (see Chapter 4)
(MFO) Multi-Flow Optimization (see Chapter 5)
(RD1) Double Variable Assignment Removal
(RD2) Unnecessary Variable Assignment Removal
(RD3) Unnecessary Variable Declaration Removal
(RD4) Two Sibling Translation Operation Merging
(RD5) Two Sibling Validation Merging
(RD6) Unnecessary Switch-Path Elimination
(RD7) Algebraic Simplification
(HLB) Heterogeneous Load Balancing
(RC1) Redundant Control Flow Rewriting
(RC2) Unreachable Subgraph Elimination
(RC3) Local Subprocess Inline Compilation
(RC4) Static Node Compilation

Figure 3.11: Cost-Based Optimization Techniques

Figure 3.11 divides the cost-based optimization techniques into control-flow-oriented and data-flow-oriented optimization techniques and emphasizes (with bold font) the techniques that we will describe in detail. All optimization techniques presented in this section follow the optimization objective of minimizing the average execution time of a plan. In addition to these techniques, we will discuss in detail the cost-based vectorization in Chapter 4 and the multi-flow optimization in Chapter 5, which both follow the optimization objective of maximizing the message throughput. Apart from these techniques, we refer the interested reader to our details on message indexing [BHW+07, BHLW08d] and on rule-based optimization techniques [BHW+07], which are omitted in this thesis. The latter include, for example, relational algebraic simplifications as described by Dadam [Dad96].
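To make the objective of minimizing the average execution time more concrete, the following minimal sketch estimates the expected cost of a Switch operator and reorders its paths, in the spirit of technique WD1. It is an illustration under stated assumptions, not the rewriting algorithm of this thesis: the cost model, the function names `expected_switch_cost` and `reorder_paths`, and all numbers are hypothetical, and the path conditions are assumed to be mutually exclusive with a positive evaluation cost.

```python
from itertools import permutations

def expected_switch_cost(paths):
    """Expected cost of a switch whose paths are tested in the given order.

    Each path is a tuple (probability, condition_cost, path_cost).
    Assumption: conditions are mutually exclusive, so `probability`
    is the chance that exactly this path is taken.
    """
    expected = 0.0
    conditions_so_far = 0.0
    for prob, cond_cost, path_cost in paths:
        conditions_so_far += cond_cost  # every earlier condition was tested too
        expected += prob * (conditions_so_far + path_cost)
    # If no path matches, all conditions have been evaluated in vain.
    no_match = 1.0 - sum(prob for prob, _, _ in paths)
    return expected + no_match * conditions_so_far

def reorder_paths(paths):
    """Order paths by probability per unit of condition cost, descending.

    Assumption: condition costs are strictly positive.
    """
    return sorted(paths, key=lambda p: p[0] / p[1], reverse=True)

# Hypothetical statistics: the most likely path is tested last.
paths = [(0.1, 1.0, 5.0), (0.2, 1.0, 5.0), (0.7, 1.0, 5.0)]
before = expected_switch_cost(paths)
after = expected_switch_cost(reorder_paths(paths))

# Brute-force check over all orderings for this small example.
best = min(expected_switch_cost(list(order)) for order in permutations(paths))
print(before, after, best)
```

Under this simplified model, the execution costs of the paths themselves do not depend on the order, so only the accumulated condition evaluation cost changes. Reordering is only safe when it preserves the semantics of the switch, which is why the sketch restricts itself to mutually exclusive conditions.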

3.4.1 Control-Flow-Oriented Techniques

Control-flow-oriented optimization techniques address the interaction- and control-flow-oriented operators of our flow meta model. These techniques try to exploit the specific characteristics of operators like alternatives (Switch operator), loops (Iteration operator), and parallel subflows (Fork operator, constrained by Invoke operators).
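As a rough illustration of how the characteristics of parallel subflows can be exploited, the following sketch compares the estimated execution time of a sequence of operators with that of a Fork over the same operators, in the spirit of technique WC2. The cost model is an assumption made for this example only: operators are taken to be independent (no data dependencies), enough threads are assumed to be available, and the names `sequence_time`, `fork_time`, `should_parallelize`, and the `fork_overhead` parameter are hypothetical.

```python
def sequence_time(op_times):
    """Estimated time of executing the operators one after another."""
    return sum(op_times)

def fork_time(op_times, fork_overhead=0.0):
    """Estimated time of executing the operators as parallel subflows.

    Assumption: the slowest subflow dominates, plus a fixed overhead
    for starting and synchronizing the subflows. `op_times` must be
    non-empty.
    """
    return max(op_times) + fork_overhead

def should_parallelize(op_times, fork_overhead):
    """Rewrite the sequence only if the parallel plan is estimated faster."""
    return fork_time(op_times, fork_overhead) < sequence_time(op_times)

# Hypothetical Invoke operators with long waiting times: rewriting pays off.
invoke_times = [40.0, 35.0, 50.0]
print(sequence_time(invoke_times), fork_time(invoke_times, 5.0))

# Hypothetical cheap local operators: the overhead outweighs the benefit.
local_times = [1.0, 1.0]
print(should_parallelize(invoke_times, 5.0), should_parallelize(local_times, 5.0))
```

The comparison shows why such a rewriting is a cost-based decision rather than a rule: whether the parallel plan wins depends on the monitored execution times and on the overhead of parallelization.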
