25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

Figure 3.2: Example Dependency Graph and its Application. (a) Dependency Graph DG(P3); (b) Plan P′3.

[Figure residue. Operators shown: Assign (o1) [out: msg1]; Invoke (o2) [service s4, in: msg1, out: msg2]; Invoke (o3) [service s5, in: msg1, out: msg3]; Join (o4) [in: msg2, msg3, out: msg4]; Groupby (o5) [in: msg4, out: msg5]; Assign (o6) [in: msg5, out: msg1]; Invoke (o7) [service s5, in: msg1]; Fork (o-1). Edges carry dependency labels δ with superscripts D, O, and -1 on the messages msg1, msg2, msg3, msg4, and msg5.]

While our approach focuses on the optimization of complete plans, Vrhovnik et al. use a so-called Sphere Hierarchy, in addition to a dependency analysis, to determine optimization boundaries that must not be crossed [VSS+07]. Thus, they optimize partitions (spheres) of a plan independently of each other. For example, the operator subsequence of a complex operator (e.g., each subflow of the Fork operator) is optimized only locally. However, since this is BPEL-specific [OAS06] (scope activity) and reduces the optimization potential, we do not restrict ourselves to these boundaries. Another approach [WPSB07] generates a minimal dependency set by merging and optimizing explicitly modeled dependencies. In contrast, we do not use explicitly modeled dependencies but analyze implicit dependencies (given by the data flow) in order to ensure semantic correctness. Finally, our automatic dependency awareness reduces the development effort for integration flows and is an essential prerequisite for both rule-based and cost-based optimization.
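The analysis of implicit dependencies can be illustrated with a minimal sketch. It assumes that each operator is described only by the messages it reads and writes, and it reads the edge labels of Figure 3.2 in the usual compiler sense: data (read after write), anti (write after read), and output (write after write) dependencies. The names `Operator`, `dependencies`, and `depends_on` are hypothetical and are not taken from the thesis; the operator list mirrors plan P3 as far as it can be read from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator description: name plus read/written messages."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def dependencies(plan):
    """Derive (kind, source, target, message) tuples from a sequential plan."""
    deps = set()
    last_writer = {}  # message -> operator that wrote it most recently
    readers = {}      # message -> operators that read it since that write
    for op in plan:
        for msg in sorted(op.reads):
            if msg in last_writer:
                deps.add(("data", last_writer[msg], op.name, msg))
            readers.setdefault(msg, []).append(op.name)
        for msg in sorted(op.writes):
            for reader in readers.get(msg, []):
                if reader != op.name:
                    deps.add(("anti", reader, op.name, msg))
            if msg in last_writer:
                deps.add(("output", last_writer[msg], op.name, msg))
            last_writer[msg] = op.name
            readers[msg] = []
    return deps


def depends_on(deps, later, earlier):
    """True if `later` is reachable from `earlier` along dependency edges."""
    successors = {}
    for _, src, dst, _ in deps:
        successors.setdefault(src, set()).add(dst)
    stack, seen = [earlier], set()
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt == later:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Plan P3 as read from Figure 3.2 (services omitted, they do not affect the analysis).
P3 = [
    Operator("o1", writes=frozenset({"msg1"})),
    Operator("o2", reads=frozenset({"msg1"}), writes=frozenset({"msg2"})),
    Operator("o3", reads=frozenset({"msg1"}), writes=frozenset({"msg3"})),
    Operator("o4", reads=frozenset({"msg2", "msg3"}), writes=frozenset({"msg4"})),
    Operator("o5", reads=frozenset({"msg4"}), writes=frozenset({"msg5"})),
    Operator("o6", reads=frozenset({"msg5"}), writes=frozenset({"msg1"})),
    Operator("o7", reads=frozenset({"msg1"})),
]

deps = dependencies(P3)
for dep in sorted(deps):
    print(dep)
```

In this sketch, o2 and o3 both depend on o1 but not on each other, which is consistent with placing them in the two subflows of a Fork operator, while the anti dependencies on msg1 keep o6 behind both of them.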

3.2.2 Cost Model and Cost Estimation

Referring back to the problem of changing workload characteristics (Problem 3.2), a tailor-made cost model reflecting these workload characteristics is required as a foundation of cost-based plan optimization. Due to the problem of missing statistics (Problem 3.3), execution statistics must be incrementally maintained as input for the defined cost model. In this context, the problem is to determine costs (cardinalities as well as execution times) for rewritten parts of a plan, for which no statistics exist so far. Furthermore, the challenge of representing (1) data-flow-, (2) control-flow-, and (3) interaction-oriented operators arises (Problem 3.1), where the different operator categories are described by different execution statistics. While cardinality is a widely used metric for data-flow- and interaction-oriented operators, it is not applicable to control-flow-oriented operators because the costs of those operators are mainly described by execution times. In addition, the concrete costs of interaction-oriented operators strongly depend on the involved external systems and their individual performance. Hence, for interaction-oriented operators as well, the execution time rather than the cardinality should be used as the metric. As a consequence of these characteristics, in this subsection we propose the double-metric cost model [BHLW09c] to enable precise cost estimation of a plan.
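The idea of keeping two metrics per operator and maintaining them incrementally can be sketched as follows. This is not the cost model of [BHLW09c], whose formulas are not given in this excerpt; it only assumes that every executed operator reports an observed cardinality and an execution time, and it uses a plain running mean as the aggregate. The names `Category`, `OperatorStats`, `observe`, and `primary_metric` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DATA_FLOW = "data-flow"
    CONTROL_FLOW = "control-flow"
    INTERACTION = "interaction"


@dataclass
class OperatorStats:
    """Incrementally maintained statistics for one operator (assumed shape)."""
    category: Category
    n: int = 0
    mean_cardinality: float = 0.0
    mean_time_ms: float = 0.0

    def observe(self, cardinality, time_ms):
        """Fold one execution into the running means without storing history."""
        self.n += 1
        self.mean_cardinality += (cardinality - self.mean_cardinality) / self.n
        self.mean_time_ms += (time_ms - self.mean_time_ms) / self.n

    def primary_metric(self):
        """Cardinality for data-flow operators, execution time otherwise."""
        if self.category is Category.DATA_FLOW:
            return ("cardinality", self.mean_cardinality)
        return ("time_ms", self.mean_time_ms)


join_stats = OperatorStats(Category.DATA_FLOW)
join_stats.observe(cardinality=100, time_ms=4.0)
join_stats.observe(cardinality=300, time_ms=8.0)

invoke_stats = OperatorStats(Category.INTERACTION)
invoke_stats.observe(cardinality=100, time_ms=20.0)

print(join_stats.primary_metric())    # ('cardinality', 200.0)
print(invoke_stats.primary_metric())  # ('time_ms', 20.0)
```

Both metrics are recorded for every operator, so an estimate for a rewritten part of a plan could in principle fall back on whichever metric is available; how the two metrics are actually combined is left open here.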

Cost models in other domains typically follow a different approach. A widely used
