
Cost-Based Optimization of Integration Flows - Datenbanken ...



3.4 Optimization Techniques

determining only the most time-consuming subflow rather than all subflows. A detailed cost estimation example of a rewritten subgraph (applying this technique) was given in Example 3.2 (Cost Estimation) in Subsection 3.2.2.

Finally, this technique should be applied after the join enumeration (WD10) because its rewriting algorithm requires the optimal join order as its basis, but it should be applied before WD9 because the join type selection might change according to the estimated cardinalities. Interdependencies with join enumeration are deferred to the next invocation of the optimizer.

Set Operations with Distinctness<br />

For the Setoperation operator, we distinguish different types, among others the UNION DISTINCT and the UNION ALL. While the UNION ALL can be computed with low costs of $|ds_{in1}| + |ds_{in2}|$, the costs of UNION DISTINCT are computed by

$$|ds_{in1}| + |ds_{in2}| \cdot \frac{|ds_{out}|}{2}. \tag{3.27}$$

We transfer the first data set completely into the result ($|ds_{in1}|$) and then, for each tuple of the second data set $ds_{in2}$, we need to check whether or not this tuple is already in the result. In the average case, this causes costs of $|ds_{out}|/2$ for each tuple. As a result, UNION DISTINCT has a quadratic time complexity of $O(N^2)$.
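The two cost formulas and the scan-based evaluation described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, data sets are modeled as plain Python lists, and the first input is assumed to be duplicate-free already, since it is copied into the result without any check.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cost_union_all(n_in1: int, n_in2: int) -> float:
    """Cost of UNION ALL: every input tuple is copied exactly once."""
    return float(n_in1 + n_in2)


def cost_union_distinct(n_in1: int, n_in2: int, n_out: int) -> float:
    """Cost of the scan-based UNION DISTINCT, following Equation (3.27)."""
    return n_in1 + n_in2 * n_out / 2


def union_distinct_naive(ds_in1: Sequence[T], ds_in2: Sequence[T]) -> List[T]:
    """Scan-based UNION DISTINCT (hypothetical helper for illustration).

    Assumption: ds_in1 contains no duplicates, so it is copied as-is.
    """
    result = list(ds_in1)        # transfer the first data set completely
    for t in ds_in2:             # probe the result for each tuple of ds_in2
        if t not in result:      # linear scan over the current result
            result.append(t)
    return result
```

For example, `union_distinct_naive([1, 2, 3], [3, 4, 4])` yields `[1, 2, 3, 4]`, and `cost_union_distinct(100, 50, 120)` evaluates to `3100.0`, compared with `150.0` for `cost_union_all(100, 50)`.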

The core idea of WD11: Setoperation-Type Selection in combination with WD8: Orderby Insertion / Removal is to sort both input data sets by their distinct key in order to enable the application of an efficient merge algorithm for ensuring distinctness. Hence, only costs of $|ds_{in1}| + |ds_{in2}|$ (similar to a UNION ALL) would be necessary to compute the UNION DISTINCT. Including the costs for sorting, the result is a time complexity of $O(N \log N)$. Despite the internal order-preserving XML representation, sorting of message content is applicable, depending on the source adapter types of these messages.
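A merge-based UNION DISTINCT over sorted inputs could look like the sketch below. It is an illustration under assumptions rather than the operator from the thesis: the function name and the `key` parameter (standing in for the distinct key) are hypothetical, and both inputs are sorted inside the function, whereas in the described approach the sorting would be done by separate Orderby operators.

```python
from typing import Any, Callable, List, Sequence, TypeVar

T = TypeVar("T")


def union_distinct_merge(
    ds_in1: Sequence[T],
    ds_in2: Sequence[T],
    key: Callable[[T], Any] = lambda t: t,
) -> List[T]:
    """Sort both inputs by the distinct key, then merge in one linear pass."""
    left = sorted(ds_in1, key=key)    # |R| * log2 |R| for sorting
    right = sorted(ds_in2, key=key)   # |S| * log2 |S| for sorting
    result: List[T] = []
    i = j = 0
    while i < len(left) or j < len(right):
        # Take the smaller head element; fall back to the non-empty side.
        take_left = j >= len(right) or (
            i < len(left) and key(left[i]) <= key(right[j])
        )
        if take_left:
            candidate = left[i]
            i += 1
        else:
            candidate = right[j]
            j += 1
        # Equal keys are adjacent after sorting, so comparing against the
        # last emitted tuple is sufficient to ensure distinctness.
        if not result or key(result[-1]) != key(candidate):
            result.append(candidate)
    return result
```

Each input tuple is visited once during the merge, which corresponds to the $|ds_{in1}| + |ds_{in2}|$ term; the two `sorted` calls account for the $O(N \log N)$ part.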

In the following, we consider the required optimality conditions. There are three alternative subplans for a union distinct $R \cup S$. First, there is the normal union distinct operator with costs that are given by $C(R \cup S) = |R| + |S| \cdot |R \cup S|/2$ (two plans due to asymmetric costs), where $|R| \le |R \cup S| \le |R| + |S|$ holds. Second, we can sort both input data sets and apply a merge algorithm (third plan), where the costs are computed by

$$C(\mathrm{sort}(R) \cup_M \mathrm{sort}(S)) = |R| + |S| + |R| \cdot \log_2 |R| + |S| \cdot \log_2 |S|. \tag{3.28}$$

In conclusion, for arbitrary cardinalities, the optimality conditions are $|R| \ge |S|$ and

$$\begin{aligned}
|R| + |S| \cdot \frac{|R \cup S|}{2} &\le |R| + |S| + |R| \cdot \log_2 |R| + |S| \cdot \log_2 |S| \\
|R \cup S| &\le 2 + \frac{2 \cdot |R| \cdot \log_2 |R|}{|S|} + 2 \cdot \log_2 |S|.
\end{aligned} \tag{3.29}$$

We see that this decision depends on the union output cardinality and both input cardinalities. If one input is known to be sorted, the corresponding Orderby operator is omitted and the optimality conditions are modified accordingly.
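The plan choice implied by Equations (3.28) and (3.29) can be sketched as a small decision function. This is a hedged illustration: the function names and the returned labels are hypothetical, both inputs are assumed to be unsorted with at least one tuple each, and the inequality is read as the condition under which the normal union distinct is no more expensive than the sort-merge plan.

```python
import math


def cost_normal(r: int, s: int, out: int) -> float:
    """C(R u S) = |R| + |S| * |R u S| / 2, with R copied and S probed."""
    return r + s * out / 2


def cost_sort_merge(r: int, s: int) -> float:
    """Equation (3.28): merge pass plus sorting of both inputs."""
    return r + s + r * math.log2(r) + s * math.log2(s)


def choose_union_plan(card_a: int, card_b: int, out: int) -> str:
    """Return a label for the cheaper plan (labels are illustrative only)."""
    if card_a < 1 or card_b < 1:
        raise ValueError("cardinalities must be at least 1 (log2 is used)")
    # Enforce |R| >= |S|: the smaller input takes the expensive probe side.
    r, s = max(card_a, card_b), min(card_a, card_b)
    # Right-hand side of the simplified inequality in (3.29).
    threshold = 2 + 2 * r * math.log2(r) / s + 2 * math.log2(s)
    return "normal" if out <= threshold else "sort-merge"
```

With one large and one small input, for instance `choose_union_plan(1000, 10, 1005)`, the threshold is roughly 2002 and the normal operator is kept; with two inputs of 1000 tuples and an output of 1500 tuples, the threshold drops to roughly 42 and the sort-merge plan is selected.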

Example 3.15 (Setoperation-Type Selection). Assume our example plan $P_6$ (see Figure 3.18(a)) that includes two Setoperation operators of type UNION DISTINCT. Using

