Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
into optimization. Finally, the Action operator exhibits abstract costs in the form of
$|ds_{in}| + |ds_{out}|$. However, the Action operator is not included in any rewriting (except
for parallelism) because it executes arbitrary code snippets and is thus treated as a
black box by the optimizer.
2: Execution Times: In a second step, we monitor statistics (e.g., execution times
and cardinalities) in order to weight the mentioned abstract costs of interaction- and dataflow-oriented
operators. With the aim to estimate the costs for a newly created plan $P'$,
we aggregate the costs $C(o'_i)$ and $C(o_i)$ of the single operators weighted with the execution
statistics $W(o_i)$ of the current plan $P$. Thus, we estimate missing statistics with
$$\hat{W}(o'_i) = \frac{C(o'_i)}{C(o_i)} \cdot W(o_i). \tag{3.1}$$
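As a minimal sketch of Equation (3.1), the following Python function scales the monitored statistic of an operator by the ratio of abstract costs. The function name, the parameter names, and the numbers in the usage note are illustrative assumptions and not part of the original cost model.

```python
def estimate_weight(cost_new: float, cost_old: float, weight_old: float) -> float:
    """Estimate the missing statistic of a rewritten operator o'_i, Eq. (3.1).

    cost_new   -- abstract cost C(o'_i) of the operator in the new plan P'
    cost_old   -- abstract cost C(o_i) of the operator in the current plan P
    weight_old -- monitored execution statistic W(o_i) of the current plan P
    """
    if cost_old <= 0:
        # Assumption of this sketch: a monitored operator has a positive abstract cost.
        raise ValueError("abstract cost C(o_i) must be positive")
    return cost_new / cost_old * weight_old
```

For example, if an operator with abstract cost 1000 was monitored at 20 ms and the rewritten operator has abstract cost 250, then `estimate_weight(250, 1000, 20.0)` yields an estimate of 5 ms.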
For control-flow-oriented operators, we directly estimate the execution time. The costs for
the complex control-flow-oriented Switch operator can be computed by
$$\sum_{i=1}^{n} \left( P(path_i) \cdot \left( \sum_{j=1}^{i} W(expr_{path_j}) + \sum_{k=1}^{m_i} W(o_{i,k}) \right) \right), \tag{3.2}$$
where we require switch path probabilities $P(path_i)$ (relative frequencies) for all $n$ paths,
weighted costs $W(expr_{path_j})$ for path expression evaluation, because the evaluation of
these expressions (e.g., XPath) can be cost-intensive, as well as weighted costs for the
$m_i$ operators of each path. Here, the second summation goes only up to $j = i$ because
the evaluation is aborted as soon as we find a true condition, due to the if-elseif-else semantics of
this operator. Similarly, the costs for the complex Fork operator (concurrent subflows of
arbitrary operators) are computed by the most time-consuming subflow:
$$\max_{i=1}^{n} \left( \sum_{j=1}^{m_i} W(o_{i,j}) + i \cdot W(\text{Start Thread}) \right), \tag{3.3}$$
where $W(\text{Start Thread})$ denotes a constant that represents the time required to create
and start a thread. When computing the costs for the Iteration operator with
$$r \cdot \sum_{i=1}^{n} W(o_i), \tag{3.4}$$
the average number of iteration loops $r$ is required as well. Further, the waiting time of
the Delay operator is also taken into account. Finally, the Signal operator has to be
mentioned, where costs (needed for raising an exception) are represented as a constant.
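The cost formulas for the control-flow-oriented operators, Equations (3.2) to (3.4), can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in the thesis: all function and parameter names are hypothetical, weights are plain floats, and an else path without a condition is modeled by an expression weight of 0.

```python
from typing import Sequence


def switch_cost(path_probs: Sequence[float],
                expr_weights: Sequence[float],
                path_op_weights: Sequence[Sequence[float]]) -> float:
    """Expected Switch cost, Eq. (3.2).

    path_probs[i]      -- P(path_i), relative frequency of path i
    expr_weights[j]    -- W(expr_path_j), weighted cost of evaluating condition j
    path_op_weights[i] -- W(o_{i,k}) for the m_i operators of path i
    """
    total = 0.0
    expr_prefix = 0.0  # running sum of W(expr_path_j) for j = 1..i
    for p, w_expr, ops in zip(path_probs, expr_weights, path_op_weights):
        # if-elseif-else semantics: taking path i means conditions 1..i were evaluated
        expr_prefix += w_expr
        total += p * (expr_prefix + sum(ops))
    return total


def fork_cost(subflow_op_weights: Sequence[Sequence[float]],
              w_start_thread: float) -> float:
    """Fork cost, Eq. (3.3): the most time-consuming concurrent subflow.

    The i-th subflow (1-based) is charged i thread starts, as in the equation.
    """
    return max(
        (sum(ops) + i * w_start_thread
         for i, ops in enumerate(subflow_op_weights, start=1)),
        default=0.0,  # assumption: a Fork without subflows costs nothing
    )


def iteration_cost(avg_loops: float, body_op_weights: Sequence[float]) -> float:
    """Iteration cost, Eq. (3.4): average loop count r times the body cost."""
    return avg_loops * sum(body_op_weights)
```

With hypothetical inputs, a Switch with path probabilities 0.5, 0.25, and 0.25, expression weights 2, 4, and 0 (else path), and path operator weights [10], [20, 4], and [8] has an expected cost of 0.5 · 12 + 0.25 · 30 + 0.25 · 14 = 17.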
Putting it all together, this cost model has several fundamental properties. Some of
these properties are used by different chapters of this thesis.
• Self-Adjustment: Due to weighting with monitored execution times, the cost model is
self-adjusting with regard to the behavior of different operators under changing
workload characteristics. Thus, the cost model adjusts itself to the present environment
(hardware platform, behavior of external systems). In particular, the behavior
of external systems, of different queries to these systems, and thus also of network
properties could not be taken into account by an empirical cost model that is based
only on cardinalities.