Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
cost-based optimization techniques (Plan → Execute; Section 3.4). Fourth, selected experimental
evaluation results are presented in order to illustrate the achieved execution
time improvement as well as the required optimization overhead (Execute → Monitor;
Section 3.5). Finally, we summarize the results of this chapter and discuss advantages and
disadvantages of this approach in Section 3.6.
3.2 Prerequisites for Cost-Based Optimization
Based on the problem of imperative integration flows, the prerequisites of cost-based optimization
are twofold. On the one hand, operator dependency analysis is required in
order to ensure the correctness of plan rewriting. On the other hand, an accurate cost model
is required in order to allow for precise cost estimation when comparing alternative plans
of an integration flow. In this section, we address both fundamental requirements.
3.2.1 Dependency Analysis<br />
When rewriting plans, we have to preserve semantic correctness. Here, semantic correctness
means that rewriting must not change the externally visible behavior of a plan.
This is comparable to snapshot isolation in DBMS [CRF08] or in replication scenarios
[DS06, LKPMJP05]. We introduce a dependency analysis for integration flows that resembles
dependency models from areas like compiler construction for programming
languages [Muc97] and computational engineering.
The dependency analysis (based on the analysis of control flow and data flow) is executed
once during the initial deployment of a plan in order to generate the so-called
dependency graph DG(P) of a plan P. All of our optimization techniques use these operator
dependencies in order to determine whether or not rewriting is possible. In detail,
we distinguish three dependency types:
• Data dependency $o_j \xrightarrow{\delta^{D}_{m_1}} o_i$: Operator $o_j$ depends on (reads as input) the message
$m_1$ that has been modified or created by operator $o_i$ (read after write).
• Output dependency $o_j \xrightarrow{\delta^{O}_{m_1}} o_i$: Both operator $o_j$ and operator $o_i$ write their results
to message $m_1$, where operator $o_j$ is a temporal successor of $o_i$ ($m_1$ is overwritten
multiple times, write after write). However, the variable might be read in between.
• Anti-dependency $o_j \xrightarrow{\delta^{-1}_{m_1}} o_i$: Operator $o_j$ modifies the message $m_1$, while operator
$o_i$, as a predecessor of $o_j$, depends on this message (before $m_1$ is written, it is
referenced, write after read).
<strong>Based</strong> on this distinction <strong>of</strong> dependency types, the dependency graph DG(P ) is constructed<br />
using three basic rules. First, a data dependency is created if the output variable<br />
<strong>of</strong> an operator is one <strong>of</strong> the input variables <strong>of</strong> a following operator. Second, if two operators<br />
have the same output variable and if this variable is not written in between, an<br />
output dependency is created between the two operators. Third, an anti-dependency is<br />
created if the output variable <strong>of</strong> an operator is the input variable <strong>of</strong> a previous operator<br />
and if this variable is not written in between. Hence, an anti-dependency can only occur<br />
if an output dependency exist. It follows that, for example, output dependencies are subsumed<br />
by other output dependencies, i.e., one operator is involved at most in one output<br />
dependency per data object.<br />
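
The three construction rules can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes a plan is a purely linear operator sequence (control flow is ignored), that every operator writes at most one output variable, and it applies the first rule literally to every following reader. The names `Operator`, `written_between`, and `dependency_graph` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator model: a name, input variables, one output variable."""
    name: str
    inputs: FrozenSet[str]
    output: Optional[str]


# An edge (o_j, o_i, kind, variable) reads "o_j depends on o_i via variable".
Edge = Tuple[str, str, str, str]


def written_between(plan: List[Operator], i: int, j: int, var: str) -> bool:
    """True if some operator strictly between positions i and j writes var."""
    return any(op.output == var for op in plan[i + 1:j])


def dependency_graph(plan: List[Operator]) -> Set[Edge]:
    """Apply the three rules pairwise to a linear plan (assumption: no branches)."""
    edges: Set[Edge] = set()
    for j, oj in enumerate(plan):
        for i in range(j):
            oi = plan[i]
            # Rule 1: the output of o_i is an input of the following o_j.
            if oi.output is not None and oi.output in oj.inputs:
                edges.add((oj.name, oi.name, "data", oi.output))
            if oj.output is None:
                continue
            if not written_between(plan, i, j, oj.output):
                # Rule 2: same output variable, not written in between.
                if oi.output == oj.output:
                    edges.add((oj.name, oi.name, "output", oj.output))
                # Rule 3: o_j overwrites a variable read by the previous o_i.
                if oj.output in oi.inputs:
                    edges.add((oj.name, oi.name, "anti", oj.output))
    return edges


# Usage: o1 creates m1, o2 reads m1 and writes m2, o3 reads m2 and overwrites m1.
plan = [
    Operator("o1", frozenset(), "m1"),
    Operator("o2", frozenset({"m1"}), "m2"),
    Operator("o3", frozenset({"m2"}), "m1"),
]
for edge in sorted(dependency_graph(plan)):
    print(edge)
```

In this example, o3 has an output dependency on o1 and an anti-dependency on o2 for the same message m1, which matches the remark that a variable may be read between two writes. A rewriting technique could consult such an edge set and refuse to reorder two operators that are connected by any edge.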