Cost-Based Optimization of Integration Flows - Datenbanken ...
3.4 Optimization Techniques
[Figure 3.15: Example Reordering and Merging of Switch Paths. Panels: (a) Plan P_1, (b) Reordering Switch Paths, (c) Merging Switch Paths.]
paths (assuming non-disjoint expressions, e.g., A: var1 < x and B: var1 < y), where the
total costs are independent of the path probabilities because the XPath expression is only
evaluated once. Therefore, we benefit from merging if P(A) < 1.
Finally, note that the monitored path probabilities are conditional probabilities due to
the ordered if-elseif-else semantics of the Switch operator. For example, we monitor the
relative frequency of P(path_1) but the conditional frequency of P(path_2 | path_1). Please
refer to Subsection 3.3.4 on how to estimate conditional probabilities in this context.
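The effect of the ordered if-elseif-else semantics can be illustrated with a minimal Python sketch. It assumes that the frequency monitored for each path is conditional on all earlier paths not having been taken; this reading, the function name `absolute_path_probabilities`, and the example frequencies are assumptions made for illustration only. The estimator of Subsection 3.3.4 is not reproduced here.

```python
def absolute_path_probabilities(conditional):
    """Convert per-path conditional frequencies into absolute path probabilities.

    Assumption (hypothetical, for illustration): conditional[k] is the monitored
    frequency with which path k is taken among the executions that reached its
    condition, i.e. those in which no earlier path was taken.
    """
    reach = 1.0          # probability that the condition of the current path is evaluated
    absolute = []
    for p in conditional:
        absolute.append(reach * p)   # path is reached AND its condition holds
        reach *= 1.0 - p             # otherwise fall through to the next path
    return absolute, reach           # remaining mass belongs to the else path


# Hypothetical monitored frequencies for two paths of a Switch operator.
probs, p_else = absolute_path_probabilities([0.5, 0.4])
print(probs, p_else)
```

With these example values, the second path is taken in roughly 20% of all executions, although its monitored (conditional) frequency is 40%.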
Selection Reordering<br />
Similar to traditional query processing, reordering of selective operators such as Selection,
Projection (distinct), Groupby, Join, and Setoperation (distinct) is important in order
to find the optimal plan that reduces the amount of processed data as early as possible.
In contrast to existing approaches, in the context of integration flows, the control-flow
semantics must be taken into account when evaluating selective operators. Essentially,
this control-flow awareness applies to all selective data-flow-oriented operators. However,
we use the technique WD4: Early Selection Application in order to explain this control-flow awareness.
The core idea of selection reordering is to reduce the amount of processed data by
reordering Selection operators by their selectivity $f_{o_i} = |ds_{out}| / |ds_{in}|$, where $f_{o_i} \in [0, 1]$.
The cost of a single Selection operator is given by $|ds_{in}|$. Thus, the costs of a sequence
of Selection operators are determined by
\[
C(P) = \sum_{i=1}^{m} |ds_{in}(o_i)| = \sum_{i=1}^{m} \left( \prod_{j=1}^{i-1} f_{o_j} \cdot |ds_{in}(o_1)| \right). \qquad (3.22)
\]
This implies that the order of Selection operators is optimal if $f_{o_i} \le f_{o_{i+1}}$. Due to
the problem of data correlation, the first optimization of a plan orders the Selection
operators according to this optimality condition, while all subsequent optimization steps
use the introduced correlation table for correlation-aware incremental re-ordering.
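The cost formula (3.22) and the optimality condition can be checked with a small Python sketch. It assumes independent selectivities (no data correlation), so it corresponds only to the first optimization step; the correlation table is not modeled. The function names and the example values are hypothetical.

```python
from itertools import permutations


def sequence_cost(selectivities, input_cardinality):
    """Cost of a Selection sequence following Eq. (3.22).

    Each operator o_i costs |ds_in(o_i)|, and its input is the input of o_1
    scaled by the selectivities of all preceding operators.
    Assumes independent selectivities (illustrative simplification).
    """
    cost = 0.0
    cardinality = float(input_cardinality)
    for f in selectivities:
        cost += cardinality    # |ds_in(o_i)|
        cardinality *= f       # |ds_out(o_i)| = f_{o_i} * |ds_in(o_i)|
    return cost


def order_by_selectivity(selectivities):
    """Order operators so that f_{o_i} <= f_{o_{i+1}} (most selective first)."""
    return sorted(selectivities)


# Hypothetical selectivities of three Selection operators and 1000 input tuples.
selectivities = [0.9, 0.1, 0.5]
n = 1000

initial = sequence_cost(selectivities, n)   # given order
ordered = order_by_selectivity(selectivities)
best = sequence_cost(ordered, n)            # ascending selectivity

# Exhaustive check over all orders; feasible only for small m.
brute = min(sequence_cost(p, n) for p in permutations(selectivities))
print(initial, best, brute)
```

For these values the given order costs about 1990 processed tuples (1000 + 900 + 90), while the ascending order costs about 1150 (1000 + 100 + 50), which matches the minimum found by enumerating all permutations.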