Cost-Based Optimization of Integration Flows - Datenbanken ...
3.4 Optimization Techniques
2,205.05, where, for example, the cost of the Switch operator $o_3$ is determined as $C(P) = 1{,}000 \cdot 0.5 \cdot 0.7 = 350$ by the input data size and the selectivities of the previous operators. If we reordered the Selection operators traditionally, we would get the operator sequence $(o_7, o_8, o_1, o_2)$ shown in Figure 3.16(b). The costs of this sequence (using Equation 3.23) are given by $C(P') = 1{,}000 + 0.05 \cdot 1{,}000 + 1{,}000 \cdot 0.965 + 1{,}000 \cdot 0.965 \cdot 0.4 + 1{,}000 \cdot 0.965 \cdot 0.4 \cdot 0.5 = 2{,}594$ and thus, the costs are higher than the initial costs. In contrast, if we reorder the Selection operators in a control-flow-aware manner, we get the operator sequence $(o_8, o_1, o_2, o_7)$ shown in Figure 3.16(c). The costs of this sequence are computed by $C(P'') = 1{,}000 + 1{,}000 \cdot 0.4 + 1{,}000 \cdot 0.4 \cdot 0.5 + 200 + 1{,}000 \cdot 0.4 \cdot 0.5 \cdot 0.7 \cdot 0.05 = 1{,}807$. As a result, control-flow-aware selection reordering reduced the costs from 2,594 to 1,807.
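
The two cost sums above can be rechecked with a few lines of Python. This is only an arithmetic sketch: the factors are copied from the terms of $C(P')$ and $C(P'')$ as printed, and the helper name `sequence_cost` is a hypothetical one introduced for illustration, not part of the cost model in Equation 3.23.

```python
import math

def sequence_cost(terms):
    """Sum the per-operator cost terms; each term is a list of factors to multiply."""
    return sum(math.prod(factors) for factors in terms)

# Traditional reordering (o7, o8, o1, o2): terms copied from C(P').
cost_traditional = sequence_cost([
    [1000],
    [0.05, 1000],
    [1000, 0.965],
    [1000, 0.965, 0.4],
    [1000, 0.965, 0.4, 0.5],
])

# Control-flow-aware reordering (o8, o1, o2, o7): terms copied from C(P'').
cost_cf_aware = sequence_cost([
    [1000],
    [1000, 0.4],
    [1000, 0.4, 0.5],
    [200],
    [1000, 0.4, 0.5, 0.7, 0.05],
])

print(round(cost_traditional), round(cost_cf_aware))  # prints: 2594 1807
```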
To summarize, control-flow-aware selection reordering exhibits a slightly worse time complexity than traditional selection ordering. However, the opportunity of a significant plan cost reduction justifies the application of this technique. Finally, note that the same concept of effective operator selectivities, $P(o_i) \cdot f_{o_i} + (1 - P(o_i))$, is in general also applicable to all selective operators such as Groupby, Join, Setoperation, and Projection. For the sake of a clear presentation, we will not mention this during the discussion of the following data-flow-oriented optimization techniques.
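
As a minimal sketch of the effective operator selectivity, the following function evaluates $P(o_i) \cdot f_{o_i} + (1 - P(o_i))$ directly. The function name, the parameter names, and the sample values are assumptions made for illustration; only the formula itself is taken from the text.

```python
def effective_selectivity(p_exec: float, selectivity: float) -> float:
    """Effective selectivity P(o_i) * f_{o_i} + (1 - P(o_i)).

    p_exec      -- probability P(o_i) that the operator is executed at all
    selectivity -- selectivity f_{o_i} of the operator when it is executed
    """
    if not 0.0 <= p_exec <= 1.0:
        raise ValueError("p_exec must be a probability in [0, 1]")
    return p_exec * selectivity + (1.0 - p_exec)

# Hypothetical values: an operator with selectivity 0.05.
always = effective_selectivity(1.0, 0.05)     # always executed: plain selectivity
never = effective_selectivity(0.0, 0.05)      # never executed: data passes unchanged
sometimes = effective_selectivity(0.5, 0.05)  # executed for half of the instances
```

The two boundary cases show the intent of the formula: an operator that always executes contributes its plain selectivity, while an operator on a path that is never taken does not reduce the data at all.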
Eager Group-By and Pre-Aggregation
Similar to reordering selective operators, it can be more efficient to apply specific operators as early as possible in order to reduce the cardinalities of intermediate results, where the earliest possible position can be determined using the dependency graph. The core concept is to reduce the cardinality of intermediate results and thus to improve the execution time of the following operators.
We concentrate only on WD6: Early Groupby Application, which was considered for DBMS [CS94] (with complete [YL95] or partial [Lar02] aggregation) and for EII (Enterprise Information Integration) frameworks (adjustable partial window pre-aggregation [IHW04]). For early group-by application, a sequence of Join and Groupby is rewritten to a construct of Groupby and Join (invariant group-by, also known as eager group-by) or to a construct of Groupby, Join, and Groupby (pre-aggregation). The assumption is that it can be more efficient, with respect to the monitored cardinalities and selectivities, to compute the Join on pre-aggregated partitions rather than on the single tuples. The precondition is that the list of grouping attributes $G$ contains the attributes of the Join predicate $jp$.
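
The rewrite can be illustrated on toy data. The sketch below assumes a SUM aggregate, tuples of the form (A1, X) for R and (A1, payload) for S, and a simple nested-loop join; the data and the helper names `join` and `group_sum` are hypothetical. It compares Join followed by Groupby with the pre-aggregation form Groupby, Join, Groupby.

```python
from collections import defaultdict

def join(left, right):
    """Nested-loop equi-join on the first tuple component (the attribute A1)."""
    return [(a, x, y) for (a, x) in left for (b, y) in right if a == b]

def group_sum(rows, value_index):
    """Group rows by component 0 and SUM the component at value_index."""
    sums = defaultdict(int)
    for row in rows:
        sums[row[0]] += row[value_index]
    return dict(sums)

# Hypothetical inputs: R holds (A1, X), S holds (A1, payload).
R = [(1, 10), (1, 20), (2, 5), (3, 7)]
S = [(1, "s1"), (1, "s2"), (2, "s3")]

# Join followed by Groupby: the join produces five intermediate tuples.
joined_late = join(R, S)
late = group_sum(joined_late, 1)

# Pre-aggregation: Groupby on R, then Join, then the final Groupby.
R_pre = list(group_sum(R, 1).items())
joined_pre = join(R_pre, S)  # only three intermediate tuples
early = group_sum(joined_pre, 1)

print(late, early)
```

Both variants yield the same sums per group, while the pre-aggregated variant joins fewer tuples. In this toy data S contains duplicate values of A1, so the final Groupby is still required to merge the joined partitions.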
First of all, we need some additional notation, where we use the relational algebra for simplicity of presentation. Assume a join of $n$ data sets (with arbitrary multiplicities) and a subsequent group-by, where the join predicate and group-by attributes are equal, as in $\gamma_{F(X);A_1}(R \bowtie_{R.A_1 = S.A_1} S)$. For left-deep join trees without cross products and with only one join implementation, there are then $4 \cdot n!$ possible plans because, for each join operator, four possibilities exist to apply the Groupby operator (for invariant group-by, the final $\gamma$ in $P_c$ to $P_f$ can be omitted):
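$P_a$ (opt): $\gamma(R \bowtie S)$    $P_c$: $\gamma((\gamma R) \bowtie S)$    $P_e$: $\gamma(R \bowtie (\gamma S))$    $P_g$: $(\gamma R) \bowtie (\gamma S)$
$P_b$: $\gamma(S \bowtie R)$    $P_d$: $\gamma(S \bowtie (\gamma R))$    $P_f$: $\gamma((\gamma S) \bowtie R)$    $P_h$: $(\gamma S) \bowtie (\gamma R)$.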
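
The eight plans listed above can be reproduced by enumerating both join orders and, for each order, the four ways of placing a Groupby below the join inputs. This is a sketch for the case of two data sets only; the function name `enumerate_plans` and the string representation of plans are assumptions made for illustration.

```python
from itertools import permutations, product

def enumerate_plans(relations):
    """Enumerate join orders and group-by placements for a two-way join."""
    plans = []
    for left, right in permutations(relations, 2):
        # For each join order: pre-aggregate neither, right, left, or both inputs.
        for agg_left, agg_right in product((False, True), repeat=2):
            l = f"(γ{left})" if agg_left else left
            r = f"(γ{right})" if agg_right else right
            joined = f"{l} ⋈ {r}"
            # As in the listing, no final γ is written when both inputs are aggregated.
            plans.append(joined if agg_left and agg_right else f"γ({joined})")
    return plans

plans = enumerate_plans(("R", "S"))
for plan in plans:
    print(plan)
```

With two data sets there are 2! join orders and four placements each, which gives the 4 · 2! = 8 plans of the listing.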
Without loss of generality, we assume $n = 2$ and concentrate on the four possibilities to arrange group-by and join for a given join order. In addition to the join costs