25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3.4 Optimization Techniques

2,205.05, where, for example, the cost of the Switch operator $o_3$ is determined as $C(o_3) = 1{,}000 \cdot 0.5 \cdot 0.7 = 350$ by the input data size and the selectivities of the previous operators. If we reordered the Selection operators traditionally, we would get the operator sequence $(o_7, o_8, o_1, o_2)$ shown in Figure 3.16(b). The costs of this sequence (using Equation 3.23) are given by $C(P') = 1{,}000 + 0.05 \cdot 1{,}000 + 1{,}000 \cdot 0.965 + 1{,}000 \cdot 0.965 \cdot 0.4 + 1{,}000 \cdot 0.965 \cdot 0.4 \cdot 0.5 = 2{,}594$ and are thus higher than the initial costs. In contrast, if we reorder the Selection operators in a control-flow-aware manner, we get the operator sequence $(o_8, o_1, o_2, o_7)$ shown in Figure 3.16(c). The costs of this sequence are computed by $C(P'') = 1{,}000 + 1{,}000 \cdot 0.4 + 1{,}000 \cdot 0.4 \cdot 0.5 + 200 + 1{,}000 \cdot 0.4 \cdot 0.5 \cdot 0.7 \cdot 0.05 = 1{,}807$. As a result, control-flow-aware selection reordering yields costs of 1,807, compared with 2,594 for traditional reordering.
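The arithmetic of the two cost expressions can be re-checked with a few lines of Python. This is only a sketch that sums the terms exactly as they appear in the text; the variable names are hypothetical, and the cost model behind the individual terms (Equation 3.23) is not reproduced here.

```python
# Cost terms copied verbatim from the expressions for C(P') and C(P'') in the text.
# The list names are illustrative and not part of the original notation.
traditional_terms = [
    1000,
    0.05 * 1000,
    1000 * 0.965,
    1000 * 0.965 * 0.4,
    1000 * 0.965 * 0.4 * 0.5,
]
control_flow_aware_terms = [
    1000,
    1000 * 0.4,
    1000 * 0.4 * 0.5,
    200,
    1000 * 0.4 * 0.5 * 0.7 * 0.05,
]

cost_traditional = sum(traditional_terms)
cost_control_flow_aware = sum(control_flow_aware_terms)

# Round to whole cost units to hide floating-point noise.
print(round(cost_traditional), round(cost_control_flow_aware))  # prints: 2594 1807
```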

To summarize, control-flow-aware selection reordering exhibits a slightly worse time complexity than traditional selection ordering. However, the opportunity for a significant plan cost reduction justifies the application of this technique. Finally, note that the same concept of effective operator selectivities, $P(o_i) \cdot f_{o_i} + (1 - P(o_i))$, is in general also applicable to all selective operators such as Groupby, Join, Setoperation, and Projection. For the sake of a clear presentation, we will not mention this during the discussion of the following data-flow-oriented optimization techniques.
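The effective operator selectivity can be sketched as a small helper function. The sketch assumes that $P(o_i)$ is the probability that operator $o_i$ is executed at all and that $f_{o_i}$ is its selectivity when executed; this reading, the function name, and the example values are assumptions for illustration and are not taken from the text.

```python
def effective_selectivity(p_exec: float, selectivity: float) -> float:
    """Return P(o_i) * f_oi + (1 - P(o_i)).

    p_exec      -- assumed: probability that the operator is executed
    selectivity -- assumed: fraction of tuples that pass when it is executed
    """
    if not 0.0 <= p_exec <= 1.0:
        raise ValueError("p_exec must lie in [0, 1]")
    # With probability p_exec the data is filtered; otherwise it passes unchanged.
    return p_exec * selectivity + (1.0 - p_exec)


# Illustrative values only: an operator that is always executed keeps its
# plain selectivity, and one that is never executed filters nothing.
always = effective_selectivity(1.0, 0.4)
never = effective_selectivity(0.0, 0.4)
half = effective_selectivity(0.5, 0.2)
```

Under this reading, an operator that is rarely reached on the control flow has an effective selectivity close to 1, which is why placing it early gains little.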

Eager Group-By and Pre-Aggregation

Similar to reordering selective operators, it can be more efficient to apply specific operators as early as possible, where the earliest possible position can be determined using the dependency graph. The core concept is to reduce the cardinality of intermediate results and thus to improve the execution time of the following operators.

We concentrate only on WD6: Early Groupby Application, which was considered for DBMS [CS94] (with complete [YL95] or partial [Lar02] aggregation) and for EII (Enterprise Information Integration) frameworks (adjustable partial window pre-aggregation [IHW04]). For early group-by application, a sequence of Join and Groupby is rewritten to a construct of Groupby and Join (invariant group-by, also known as eager group-by) or to a construct of Groupby, Join, and Groupby (pre-aggregation). The assumption is that it can be more efficient, with respect to the monitored cardinalities and selectivities, to compute the Join on pre-aggregated partitions rather than on single tuples. The precondition is that the list of grouping attributes $G$ contains the attributes of the Join predicate $jp$.
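The rewrite from Join and Groupby to Groupby, Join, and Groupby can be illustrated on toy data. The following is a minimal sketch that assumes SUM as the aggregate and a single join and grouping attribute `a1`; the relations, helper names, and values are hypothetical and serve only to show that both plans return the same result while the join processes fewer tuples after pre-aggregation.

```python
from collections import defaultdict


def group_sum(rows):
    """Group (a1, x) pairs by a1 and sum x (a stand-in for the Groupby operator)."""
    totals = defaultdict(int)
    for a1, x in rows:
        totals[a1] += x
    return sorted(totals.items())


def join_on_a1(left_rows, right_keys):
    """Equi-join on a1; the right input carries only the join attribute here."""
    return [(a1, x) for a1, x in left_rows for key in right_keys if a1 == key]


# Hypothetical inputs: R holds (a1, x) tuples, S holds a1 values with duplicates.
R = [(1, 10), (1, 20), (2, 5), (2, 7), (3, 1)]
S = [1, 1, 2]

# Join followed by Groupby.
join_first = join_on_a1(R, S)
result_join_then_group = group_sum(join_first)

# Pre-aggregation: Groupby, Join, and a final Groupby.
pre_aggregated_r = group_sum(R)
join_second = join_on_a1(pre_aggregated_r, S)
result_pre_aggregation = group_sum(join_second)

print(result_join_then_group, len(join_first))   # [(1, 60), (2, 12)] 6
print(result_pre_aggregation, len(join_second))  # [(1, 60), (2, 12)] 3
```

In this example the final Groupby is still required because S contains duplicate join keys. Whether the rewrite pays off in practice depends on the monitored cardinalities and selectivities, as stated above.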

First of all, we need some additional notation, where we use relational algebra for simplicity of presentation. Assume a join of $n$ data sets (with arbitrary multiplicities) and a subsequent group-by, where the join predicate and the group-by attributes are equal, as in $\gamma_{F(X);A_1}(R \bowtie_{R.A_1 = S.A_1} S)$. For left-deep join trees without cross products and with only one join implementation, there are then $4n!$ possible plans because, for each join operator, four possibilities exist to apply the Groupby operator (for invariant group-by, the final $\gamma$ in $P_c$ to $P_f$ can be omitted):

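$$
\begin{array}{llll}
P_a\ (\text{opt}):\ \gamma(R \bowtie S) & P_c:\ \gamma((\gamma R) \bowtie S) & P_e:\ \gamma(R \bowtie (\gamma S)) & P_g:\ (\gamma R) \bowtie (\gamma S) \\
P_b:\ \gamma(S \bowtie R) & P_d:\ \gamma(S \bowtie (\gamma R)) & P_f:\ \gamma((\gamma S) \bowtie R) & P_h:\ (\gamma S) \bowtie (\gamma R).
\end{array}
$$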
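For two inputs, the eight plans listed above can be enumerated mechanically as two join orders times four group-by placements. The sketch below only generates plan strings for this two-input case; the function name and the string representation are hypothetical, and no cost estimation is attempted.

```python
from itertools import permutations


def plans_for_join_order(left: str, right: str) -> list[str]:
    """Return the four group-by placements for one join order of two inputs."""

    def pre(name: str) -> str:
        # Pre-aggregated input, written as (γR).
        return f"(γ{name})"

    return [
        f"γ({left} ⋈ {right})",        # group-by after the join only
        f"γ({pre(left)} ⋈ {right})",   # left input pre-aggregated
        f"γ({left} ⋈ {pre(right)})",   # right input pre-aggregated
        f"{pre(left)} ⋈ {pre(right)}", # both inputs pre-aggregated
    ]


# Two join orders (R, S) and (S, R), each with four placements.
all_plans = [
    plan
    for left, right in permutations(["R", "S"])
    for plan in plans_for_join_order(left, right)
]

for plan in all_plans:
    print(plan)
```

The enumeration yields eight distinct plan strings, which matches $P_a$ through $P_h$.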

Without loss of generality, we assume $n = 2$ and concentrate on the four possibilities to arrange group-by and join for a given join order. In addition to the join costs
