25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3.4 Optimization Techniques

[Figure 3.15: Example Reordering and Merging of Switch Paths; panels (a) Plan P1, (b) Reordering Switch Paths, (c) Merging Switch Paths]

paths (assuming non-disjoint expressions, e.g., A: var1 < x and B: var1 < y), where the total costs are independent of the path probabilities because the XPath expression is only evaluated once. Therefore, we benefit from merging if P(A) < 1.
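The condition P(A) < 1 can be checked with a small calculation. The following is a minimal sketch, not the cost model of this work: it assumes that each path expression costs the same to evaluate (30 ms, as annotated in Figure 3.15), that the second expression is evaluated only when the first one does not match, and that the merged Switch evaluates a single expression. The function names are hypothetical.

```python
def expected_eval_cost(costs_ms, match_probs):
    """Expected expression-evaluation cost of an ordered if-elseif chain.

    costs_ms[i]    -- assumed cost of evaluating the expression of path i
    match_probs[i] -- probability that path i matches, given it is reached
    """
    expected, reach = 0.0, 1.0
    for cost, p in zip(costs_ms, match_probs):
        expected += reach * cost  # expression i is evaluated only if reached
        reach *= 1.0 - p          # probability that the chain continues
    return expected


def merge_benefit_ms(cost_ms, p_a):
    """Cost saved by merging two paths into one evaluation (assumption)."""
    # The match probability of the last path does not affect the cost.
    separate = expected_eval_cost([cost_ms, cost_ms], [p_a, 1.0])
    merged = cost_ms  # the expression is evaluated exactly once
    return separate - merged


if __name__ == "__main__":
    for p_a in (0.0, 0.6, 1.0):
        print(p_a, merge_benefit_ms(30.0, p_a))
```

Under these assumptions the saving equals (1 - P(A)) times the cost of one evaluation, which is positive exactly when P(A) < 1.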

Finally, note that the monitored path probabilities are conditional probabilities due to the ordered if-elseif-else semantics of the Switch operator. For example, we monitor the relative frequency of P(path_1) but the conditional frequency of P(path_2 | path_1). Please refer to Subsection 3.3.4 on how to estimate conditional probabilities in this context.
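To illustrate what the ordered semantics mean for the monitored values, the sketch below converts per-path conditional frequencies into absolute path probabilities. It is plain probability arithmetic, not the estimation procedure of Subsection 3.3.4, and it assumes that the monitored value for each path is the fraction of matches among the messages that actually reached that path's expression (i.e., all earlier expressions did not match). The function name is hypothetical.

```python
def absolute_path_probabilities(conditional_probs):
    """Turn monitored conditional match frequencies into absolute ones.

    conditional_probs[i] is assumed to be the match frequency of path i
    among the messages that reached its expression. The last returned
    entry is the probability of the implicit else path.
    """
    absolute, reach = [], 1.0
    for p in conditional_probs:
        absolute.append(reach * p)  # reached path i and matched it
        reach *= 1.0 - p            # reached path i but did not match
    absolute.append(reach)          # fell through to the else path
    return absolute


if __name__ == "__main__":
    print(absolute_path_probabilities([0.6, 0.5]))
```

With conditional frequencies 0.6 and 0.5, the second path is taken by roughly 20% of all messages, not 50%, because it is only reached by the 40% that did not match the first path.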

Selection Reordering<br />

Similar to traditional query processing, reordering of selective operators such as Selection, Projection (distinct), Groupby, Join, and Setoperation (distinct) is important in order to find the optimal plan that reduces the amount of processed data as early as possible. In contrast to existing approaches, in the context of integration flows, the control-flow semantics must be taken into account when evaluating selective operators. Essentially, this control-flow awareness applies to all selective data-flow-oriented operators. However, we use the technique WD4: Early Selection Application in order to explain this control-flow awareness.

The core idea of selection reordering is to reduce the amount of processed data by reordering Selection operators by their selectivity $f_{o_i} = |ds_{out}| / |ds_{in}|$, where $f_{o_i} \in [0, 1]$. The cost of a single Selection operator is given by $|ds_{in}|$. Thus, the costs of a sequence of Selection operators are determined by

\[
C(P) = \sum_{i=1}^{m} |ds_{in}(o_i)| = \sum_{i=1}^{m} \left( \prod_{j=1}^{i-1} f_{o_j} \cdot |ds_{in}(o_1)| \right). \tag{3.22}
\]

This implies that the order of Selection operators is optimal if $f_{o_i} \le f_{o_{i+1}}$. Due to the problem of data correlation, the first optimization of a plan orders the Selection operators according to this optimality condition, while all subsequent optimization steps use the introduced correlation table for correlation-aware incremental re-ordering.
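The cost formula (3.22) and the ordering condition can be tried out with a short sketch. It assumes independent (uncorrelated) selectivities, so it corresponds only to the first optimization step and does not model the correlation table. The function names and sample values are hypothetical.

```python
from itertools import permutations


def sequence_cost(selectivities, input_size):
    """Cost of a Selection sequence as in Eq. 3.22.

    Each operator costs its input cardinality; the input of operator i
    is the initial input scaled by the selectivities of all operators
    before it (independence of selectivities is an assumption here).
    """
    total, cardinality = 0.0, float(input_size)
    for f in selectivities:
        total += cardinality  # cost of this operator: |ds_in(o_i)|
        cardinality *= f      # output becomes the next operator's input
    return total


def order_by_selectivity(selectivities):
    """Order operators so that f_{o_i} <= f_{o_{i+1}} holds."""
    return sorted(selectivities)


if __name__ == "__main__":
    sels = [0.9, 0.1, 0.5]  # illustrative selectivities
    best = order_by_selectivity(sels)
    print("original:", sequence_cost(sels, 1000))
    print("reordered:", best, sequence_cost(best, 1000))
    # Brute-force check over all orders for this small example.
    print("minimum:", min(sequence_cost(p, 1000) for p in permutations(sels)))
```

For this small example, the ascending order reaches the same cost as the brute-force minimum over all permutations, which is what the optimality condition predicts when selectivities are independent.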
