Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
[Figure 3.2: Example Dependency Graph and its Application. Panels: (a) Dependency Graph DG(P3), (b) Plan P′3. Operators shown: Assign (o1) [out: msg1], Invoke (o2) [service s4, in: msg1, out: msg2], Invoke (o3) [service s5, in: msg1, out: msg3], Join (o4) [in: msg2, msg3, out: msg4], Groupby (o5) [in: msg4, out: msg5], Assign (o6) [in: msg5, out: msg1], Invoke (o7) [service s5, in: msg1], and Fork (o-1). Edges are labeled with dependencies δ on the message variables msg1 to msg5, annotated D, O, or -1.]
While our approach focuses on the optimization of complete plans, Vrhovnik et al. use a
so-called Sphere Hierarchy, in addition to a dependency analysis, to determine optimization
boundaries that must not be crossed [VSS+07]. Thus, they independently optimize
partitions (spheres) of a plan. For example, the operator subsequence of a complex operator
(e.g., each subflow of the Fork operator) is optimized only locally. However, since
this is BPEL-specific [OAS06] (scope activity) and reduces the optimization potential, we
do not restrict ourselves to these boundaries. Another approach [WPSB07] generates a
minimal dependency set. For this purpose, the authors merge and optimize explicitly modeled
dependencies. In contrast, we do not use explicitly modeled dependencies but analyze
implicit dependencies (given by the data flow) in order to ensure semantic correctness. Finally,
our automatic dependency-awareness reduces the development effort of integration
flows and is an essential prerequisite for both rule-based and cost-based optimization.
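
As an illustration of how implicit dependencies can be derived from the data flow alone, the following minimal Python sketch scans a sequence of operators and records data, anti, and output dependencies from their input and output message variables. The operator list mirrors the operators of Figure 3.2; the sequential order o1 to o7, the names `Operator`, `dependencies`, and `independent`, and the reading of the figure's D, -1, and O labels as data, anti, and output dependencies are assumptions made for this sketch, not definitions taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator record: a name plus the variables it reads/writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def dependencies(plan):
    """Return (source, target, kind, variable) tuples for a sequence of operators."""
    last_writer = {}  # variable -> name of the most recent writer
    readers = {}      # variable -> readers since the most recent write
    deps = []
    for op in plan:
        for var in sorted(op.reads):
            if var in last_writer:  # read after write
                deps.append((last_writer[var], op.name, "data", var))
        for var in sorted(op.writes):
            for reader in readers.get(var, []):  # write after read
                deps.append((reader, op.name, "anti", var))
            if var in last_writer:  # write after write
                deps.append((last_writer[var], op.name, "output", var))
        for var in op.reads:
            readers.setdefault(var, []).append(op.name)
        for var in op.writes:
            last_writer[var] = op.name
            readers[var] = []
    return deps


def independent(deps, a, b):
    """True if no dependency path connects a and b in either direction."""
    succ = {}
    for src, dst, _kind, _var in deps:
        succ.setdefault(src, set()).add(dst)

    def reaches(start, goal):
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ.get(node, ()))
        return False

    return not reaches(a, b) and not reaches(b, a)


# Operators of Figure 3.2, assumed to execute in the order o1..o7.
PLAN = [
    Operator("o1", writes=frozenset({"msg1"})),
    Operator("o2", frozenset({"msg1"}), frozenset({"msg2"})),
    Operator("o3", frozenset({"msg1"}), frozenset({"msg3"})),
    Operator("o4", frozenset({"msg2", "msg3"}), frozenset({"msg4"})),
    Operator("o5", frozenset({"msg4"}), frozenset({"msg5"})),
    Operator("o6", frozenset({"msg5"}), frozenset({"msg1"})),
    Operator("o7", frozenset({"msg1"})),
]
DEPS = dependencies(PLAN)
```

Under these assumptions, `independent(DEPS, "o2", "o3")` holds, which is consistent with the two Invoke operators being candidates for parallel subflows of a Fork operator, whereas o6 must stay behind o2 and o3 because it overwrites msg1.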
3.2.2 Cost Model and Cost Estimation
Referring back to the problem of changing workload characteristics (Problem 3.2), a tailor-made
cost model reflecting these workload characteristics is required as a foundation of
cost-based plan optimization. Due to the problem of missing statistics (Problem 3.3),
execution statistics must be incrementally maintained as input for the defined cost model.
In this context, the problem is to determine costs (cardinalities as well as execution times)
for rewritten parts of a plan for which no statistics exist so far. Furthermore, the challenge of
representing (1) data-flow-, (2) control-flow-, and (3) interaction-oriented operators arises
(Problem 3.1), where the different operator categories are described by different execution
statistics. While cardinality is a widely used metric for the data-flow- and interaction-oriented
operators, it is not applicable to the control-flow-oriented operators because
the costs of those operators are mainly described by means of execution times. In addition,
the concrete costs of the interaction-oriented operators strongly depend on the involved
external systems and their individual performance. Hence, for interaction-oriented
operators as well, the execution time rather than the cardinality should be used as the metric.
As a consequence of these characteristics, we propose in this subsection the
double-metric cost model [BHLW09c] to enable precise cost estimation of a plan.
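
To make the idea of incrementally maintained execution statistics with two metrics more concrete, the following Python sketch keeps a running mean of both cardinality and execution time per operator and selects the metric by operator category as argued above. It is only a sketch under stated assumptions and not the double-metric cost model of [BHLW09c]: the names `RunningMean`, `OperatorStats`, and `primary_metric`, the category strings, and the use of a plain running mean are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RunningMean:
    """Mean maintained incrementally, without storing past observations."""
    n: int = 0
    mean: float = 0.0

    def update(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n


@dataclass
class OperatorStats:
    """Hypothetical per-operator statistics holding both metrics."""
    category: str  # assumed labels: "data-flow", "control-flow", "interaction"
    cardinality: RunningMean = field(default_factory=RunningMean)
    exec_time_ms: RunningMean = field(default_factory=RunningMean)

    def record(self, cardinality, exec_time_ms):
        """Fold the observations of one operator execution into the statistics."""
        self.cardinality.update(cardinality)
        self.exec_time_ms.update(exec_time_ms)


def primary_metric(stats):
    """Pick the metric that describes the costs of this operator category."""
    if stats.category == "data-flow":
        return ("cardinality", stats.cardinality.mean)
    # Control-flow- and interaction-oriented operators: execution time.
    return ("execution_time_ms", stats.exec_time_ms.mean)


# Usage: two observed executions of an interaction-oriented operator.
invoke_stats = OperatorStats("interaction")
invoke_stats.record(cardinality=100, exec_time_ms=20.0)
invoke_stats.record(cardinality=300, exec_time_ms=40.0)
```

Keeping both metrics for every operator, and only choosing between them at estimation time, leaves room for estimating rewritten plan parts from whichever statistic is available; how the actual model combines the two metrics is not covered by this sketch.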
Cost models in other domains typically follow a different approach. A widely used