25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

cost-based optimization techniques (Plan → Execute; Section 3.4). Fourth, selected experimental evaluation results are presented in order to illustrate the achieved execution time improvement as well as the required optimization overhead (Execute → Monitor; Section 3.5). Finally, we summarize the results of this chapter and discuss advantages and disadvantages of this approach in Section 3.6.

3.2 Prerequisites for Cost-Based Optimization

Based on the problem of imperative integration flows, the prerequisites of cost-based optimization are twofold. On the one hand, operator dependency analysis is required in order to ensure the correctness of plan rewriting. On the other hand, an accurate cost model is required in order to allow for precise cost estimation when comparing alternative plans of an integration flow. In this section, we address both fundamental requirements.

3.2.1 Dependency Analysis

When rewriting plans, we have to preserve semantic correctness. Here, semantic correctness means that the external behavior must not change. This is comparable to snapshot isolation in DBMS [CRF08] or in replication scenarios [DS06, LKPMJP05]. We introduce a dependency analysis for integration flows that resembles dependency models from areas like compiler construction for programming languages [Muc97] and computational engineering.

The dependency analysis (based on the analysis of control-flow and data-flow) is executed once during the initial deployment of a plan in order to generate the so-called dependency graph DG(P) of a plan P. All of our optimization techniques use these operator dependencies in order to determine whether or not rewriting is possible. In detail, we distinguish three dependency types:

• Data dependency $o_j \xrightarrow{\delta^{D}_{m_1}} o_i$: Operator $o_j$ depends on (reads as input) the message $m_1$ that has been modified or created by operator $o_i$ (read after write).
• Output dependency $o_j \xrightarrow{\delta^{O}_{m_1}} o_i$: Both operator $o_j$ and operator $o_i$ write their results to message $m_1$, where operator $o_j$ is a temporal successor of $o_i$ ($m_1$ is overwritten multiple times, write after write). However, the variable might be read in between.
• Anti-dependency $o_j \xrightarrow{\delta^{-1}_{m_1}} o_i$: Operator $o_j$ modifies the message $m_1$, while operator $o_i$, as a predecessor of $o_j$, depends on this message (before $m_1$ is written, it is referenced, write after read).
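To make the three dependency types concrete, the following minimal Python sketch classifies the dependencies of a successor operator on a predecessor purely from their read and write sets. The `Operator` class, the `classify` function, and the example operators are hypothetical names introduced here for illustration; the sketch deliberately ignores operators that lie in between and is not taken from the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator model: a name plus read and write sets."""
    name: str
    reads: frozenset   # messages the operator takes as input
    writes: frozenset  # messages the operator creates or modifies


def classify(o_i, o_j):
    """Return the dependencies of successor o_j on predecessor o_i.

    Each entry is a pair (type, message) with type "D" (data),
    "O" (output), or "-1" (anti). Intermediate operators are ignored.
    """
    deps = set()
    for m in o_i.writes & o_j.reads:
        deps.add(("D", m))    # read after write
    for m in o_i.writes & o_j.writes:
        deps.add(("O", m))    # write after write
    for m in o_i.reads & o_j.writes:
        deps.add(("-1", m))   # write after read
    return deps


# Assumed three-operator sequence: o1 creates m1, o2 reads m1 and
# writes m2, o3 reads m2 and overwrites m1.
o1 = Operator("o1", frozenset(), frozenset({"m1"}))
o2 = Operator("o2", frozenset({"m1"}), frozenset({"m2"}))
o3 = Operator("o3", frozenset({"m2"}), frozenset({"m1"}))

for pred, succ in [(o1, o2), (o1, o3), (o2, o3)]:
    print(succ.name, "->", pred.name, sorted(classify(pred, succ)))
```

In this example, o2 has a data dependency on o1 via m1, o3 has an output dependency on o1 via m1, and o3 has both a data dependency (via m2) and an anti-dependency (via m1) on o2.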

Based on this distinction of dependency types, the dependency graph DG(P) is constructed using three basic rules. First, a data dependency is created if the output variable of an operator is one of the input variables of a following operator. Second, if two operators have the same output variable and if this variable is not written in between, an output dependency is created between the two operators. Third, an anti-dependency is created if the output variable of an operator is the input variable of a previous operator and if this variable is not written in between. Hence, an anti-dependency can only occur if an output dependency exists. It follows that, for example, output dependencies are subsumed by other output dependencies, i.e., one operator is involved in at most one output dependency per data object.
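The three construction rules can be sketched as a small Python routine. This is only an illustration under simplifying assumptions that are not part of the original text: the plan is a plain sequence of operators, each operator has at most one output variable, and the names `Op`, `written_between`, `build_dependency_graph`, and `independent` are hypothetical. The `independent` helper shows one conceivable way a rewriting technique might consult the graph; it is not the rewriting condition used by the author.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    """Hypothetical operator: name, input variables, one output variable."""
    name: str
    inputs: frozenset
    output: Optional[str]


def written_between(plan, i, j, var):
    """True if an operator strictly between positions i and j writes var."""
    return any(plan[k].output == var for k in range(i + 1, j))


def build_dependency_graph(plan):
    """Return edges (successor, predecessor, type, variable) for a sequence."""
    edges = set()
    for j, o_j in enumerate(plan):
        for i in range(j):
            o_i = plan[i]
            # Rule 1: output of o_i is an input of the following operator o_j.
            if o_i.output is not None and o_i.output in o_j.inputs:
                edges.add((o_j.name, o_i.name, "D", o_i.output))
            if o_j.output is None:
                continue
            unwritten = not written_between(plan, i, j, o_j.output)
            # Rule 2: same output variable, not written in between.
            if o_j.output == o_i.output and unwritten:
                edges.add((o_j.name, o_i.name, "O", o_j.output))
            # Rule 3: output of o_j is an input of the previous operator o_i,
            # and the variable is not written in between.
            if o_j.output in o_i.inputs and unwritten:
                edges.add((o_j.name, o_i.name, "-1", o_j.output))
    return edges


def independent(edges, a, b):
    """Illustrative check: no dependency edge connects operators a and b."""
    return not any({src, dst} == {a, b} for src, dst, _, _ in edges)


# Assumed example plan: o1 creates m1, o2 reads m1 and writes m2,
# o3 reads m2 and overwrites m1.
plan = [
    Op("o1", frozenset(), "m1"),
    Op("o2", frozenset({"m1"}), "m2"),
    Op("o3", frozenset({"m2"}), "m1"),
]
edges = build_dependency_graph(plan)
for edge in sorted(edges):
    print(edge)
```

For this plan the sketch yields four edges: a data dependency of o2 on o1 (m1), an output dependency of o3 on o1 (m1), and both a data dependency (m2) and an anti-dependency (m1) of o3 on o2. Consistent with the observation above, the anti-dependency on m1 appears together with an output dependency on m1.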
