Cost-Based Optimization of Integration Flows - Datenbanken ...
5 Multi-Flow Optimization
not allow a discrete counting of outrun messages due to the possible outrun of partitions.
5.2.2 Deriving Partitioning Schemes
The partitioning scheme, in terms of the optimal layout of the partition tree, is derived
automatically. This includes (1) deriving candidate partitioning attributes from the given
plan and (2) finding the optimal partitioning scheme for the overall partition tree.
Candidate partitioning attributes are derived from the single operators o_i of plan P. We
realize this by searching for attributes that are involved in predicates, expressions, and
dynamic parameter assignments. This is a linear search over all operators, that is, O(m)
for m operators. Due to the different semantics of those attributes, we distinguish between
the following three types of partitioning attributes, which have been introduced in Definition 5.1:
1. Value: This scheme causes a data partitioning by exact value. Thus, for all 1/sel
distinct values of this attribute, a partition is used. An example of this type is an
equality predicate of a query to an external system (Assign and Invoke operator).
2. Value List: Due to disjunctive predicates of external queries or local Switch operators,
we can also use a list of exact values with or-semantics.
3. Range: For range query predicates or inequalities, range partitioning is used.
Examples of this type are expressions of Switch operators or range predicates of
queries to external systems (Assign and Invoke operator).
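The linear search and the three attribute types can be illustrated with a minimal sketch. It assumes a strongly simplified plan model: the classes `Predicate` and `Operator`, the comparison symbols, and the mapping from comparison symbol to partitioning type are hypothetical and serve only to illustrate the classification; they are not the platform's actual data structures.

```python
from dataclasses import dataclass
from enum import Enum


class PartitionType(Enum):
    VALUE = "value"
    VALUE_LIST = "value list"
    RANGE = "range"


@dataclass(frozen=True)
class Predicate:
    # Hypothetical simplification: an attribute name plus a comparison symbol.
    attribute: str
    op: str  # one of "=", "in", "<", "<=", ">", ">=", "between"


@dataclass(frozen=True)
class Operator:
    # Hypothetical operator model: a name and the predicates it evaluates.
    name: str
    predicates: tuple


def classify(pred):
    """Map a predicate to a partitioning type (assumed mapping)."""
    if pred.op == "=":
        return PartitionType.VALUE        # equality predicate
    if pred.op == "in":
        return PartitionType.VALUE_LIST   # disjunction of exact values
    if pred.op in {"<", "<=", ">", ">=", "between"}:
        return PartitionType.RANGE        # range predicate or inequality
    return None


def derive_candidates(plan):
    """Single pass over the operators of the plan."""
    candidates = {}
    for operator in plan:
        for pred in operator.predicates:
            ptype = classify(pred)
            if ptype is not None:
                # Keep the first type seen for an attribute (a simplification).
                candidates.setdefault(pred.attribute, ptype)
    return candidates


plan = [
    Operator("Invoke", (Predicate("customer", "="),)),
    Operator("Switch", (Predicate("totalprice", "<"),)),
]
candidates = derive_candidates(plan)
```

With a bounded number of predicates per operator, the loop visits each operator once, which matches the linear search described above.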
After having derived the set of candidate partitioning attributes and the type of each
partitioning attribute, we need to select the candidates that are advantageous to use. First,
we remove all candidates whose partitioning attribute refers to externally loaded data,
because these attribute values are not present for incoming messages at the inbound side of
the integration platform. Second, we compare the benefit of using a partitioning attribute
(see Section 5.3) with a user-specified cost reduction threshold τ and remove all candidates
whose benefit is below this threshold.
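The two selection steps amount to a simple filter, sketched here under stated assumptions: the `Candidate` record, its field names, and the numeric benefit values are hypothetical, and the benefit is taken as a precomputed number because its computation (Section 5.3) is not part of this passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record describing one candidate partitioning attribute.
    attribute: str
    from_external_data: bool  # True if the attribute refers to externally loaded data
    benefit: float            # estimated benefit, assumed to be given


def select_candidates(candidates, tau):
    """Drop external-data candidates, then candidates whose benefit is below tau."""
    return [
        c for c in candidates
        if not c.from_external_data and c.benefit >= tau
    ]


# Illustrative values only; they are not taken from the source.
pool = [
    Candidate("customer", False, 0.40),
    Candidate("totalprice", False, 0.05),
    Candidate("supplier_rating", True, 0.90),
]
selected = select_candidates(pool, tau=0.10)
```

A candidate with a high estimated benefit is still removed if it depends on externally loaded data, because its values cannot be evaluated on incoming messages.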
Based on the set of partitioning attributes, we create a concrete partitioning scheme for
the partition tree. For h partitioning attributes, there are h! different partitioning schemes.
Due to this factorial complexity, we use a heuristic for finding the optimal scheme. The
intuition is to minimize the number of partitions in the index. Assuming no correlations
(see footnote 14), we order the index attributes according to their selectivities with
\[
\min \sum_{i=1}^{h} \left| b \in ba_i \right| \quad \text{iff} \quad sel(ba_1) \geq sel(ba_i) \geq sel(ba_h). \tag{5.1}
\]
Hence, finding the best partitioning scheme exhibits a complexity of O(h log h) due to
the requirement of sorting the h partitioning attributes. Another approach would be to
take the additional costs of rewritten plans into account. However, the costs of additional
operators (splitting and merging of partitions) have been shown to be negligible.
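The ordering heuristic can be checked against exhaustive enumeration on a small input. The sketch below rests on assumptions that are not stated in the text: each attribute contributes round(1/sel) partitions, every combination of attribute values occurs (no correlations), and the number of partitions in the tree is the sum of the level widths. The attribute names and selectivities are illustrative only.

```python
from itertools import permutations


def order_by_selectivity(selectivities):
    """Heuristic: sort attributes by descending selectivity, O(h log h)."""
    return sorted(selectivities, key=selectivities.get, reverse=True)


def tree_partitions(order, selectivities):
    """Count partitions over all tree levels (assumed cost model).

    Level i holds the product of the partition counts of levels 1..i,
    where an attribute with selectivity sel has round(1 / sel) partitions.
    """
    total = 0
    width = 1
    for attr in order:
        width *= round(1 / selectivities[attr])
        total += width
    return total


# Hypothetical selectivities: 2, 4 and 10 partitions per attribute.
sels = {"ba1": 0.5, "ba2": 0.25, "ba3": 0.1}

heuristic_order = order_by_selectivity(sels)
heuristic_cost = tree_partitions(heuristic_order, sels)

# Exhaustive search over all h! orders, feasible only for small h.
best_cost = min(tree_partitions(p, sels) for p in permutations(sels))
```

Under these assumptions the sorted order places the attribute with the fewest partitions at the root, so the upper tree levels stay narrow; for the three attributes above it yields 2 + 8 + 80 = 90 partitions, which equals the minimum over all six orders.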
Example 5.5 (Minimizing the Number of Partitions). Recall Example 5.4 and assume
the partitioning attributes ba_1 (customer, value) and ba_2 (total price, range) as well
Footnote 14: The MFO approach also works for correlated partitioning attributes, where queue
management might become more expensive due to mis-estimated selectivities. Alternatively, the
correlation table introduced in Subsection 3.3.4 could be used for correlation-aware ordering
of partitioning attributes.