25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


5 Multi-Flow Optimization

not allow a discrete counting of outrun messages due to the possible outrun of partitions.

5.2.2 Deriving Partitioning Schemes

The partitioning scheme, in terms of the optimal layout of the partition tree, is derived automatically. This includes (1) deriving candidate partitioning attributes from the given plan and (2) finding the optimal partitioning scheme for the overall partition tree.

Candidate partitioning attributes are derived from the single operators o_i of plan P. We realize this by searching for attributes that are involved in predicates, expressions, and dynamic parameter assignments. This is a linear search over all m operators, with a complexity of O(m). Because these attributes differ in their semantics, we distinguish between the following three types of partitioning attributes, which were introduced in Definition 5.1:

1. Value: This scheme causes a data partitioning by exact value. Thus, one partition is used for each of the 1/sel distinct values of this attribute. An example of this type is an equality predicate of a query to an external system (Assign and Invoke operator).

2. Value List: Due to disjunctive predicates of external queries or local Switch operators, we can also use a list of exact values with or-semantics.

3. Range: For range query predicates or inequalities, range partitioning is used. Examples of this type are expressions of Switch operators or range predicates of queries to external systems (Assign and Invoke operator).
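
The derivation of typed candidates can be illustrated with a minimal sketch. It assumes a simplified plan representation that is not part of the original text: `Operator`, `Predicate`, and the operator strings (`"="`, `"in"`, `"<"`, ...) are hypothetical names chosen for illustration only. The loop visits every operator once, which mirrors the O(m) linear search if the number of predicates per operator is bounded.

```python
from dataclasses import dataclass, field
from enum import Enum


class AttrType(Enum):
    VALUE = "value"
    VALUE_LIST = "value list"
    RANGE = "range"


@dataclass(frozen=True)
class Predicate:
    # Hypothetical representation: the attribute a predicate refers to
    # and the comparison operator that is applied to it.
    attribute: str
    op: str  # "=", "in", "<", "<=", ">", ">="


@dataclass
class Operator:
    # Hypothetical plan operator (e.g. Assign, Invoke, Switch) with its predicates.
    name: str
    predicates: list = field(default_factory=list)


def classify(pred):
    """Map a predicate to one of the three partitioning attribute types."""
    if pred.op == "=":
        return AttrType.VALUE
    if pred.op == "in":
        return AttrType.VALUE_LIST
    if pred.op in ("<", "<=", ">", ">="):
        return AttrType.RANGE
    raise ValueError(f"unsupported comparison operator: {pred.op!r}")


def derive_candidates(plan):
    """Single pass over all operators of the plan.

    Assumption of this sketch: if an attribute occurs several times,
    the type of its first occurrence is kept.
    """
    candidates = {}
    for operator in plan:
        for pred in operator.predicates:
            candidates.setdefault(pred.attribute, classify(pred))
    return candidates


plan = [
    Operator("Invoke", [Predicate("customer", "=")]),
    Operator("Switch", [Predicate("totalprice", "<")]),
]
print(derive_candidates(plan))
```

How conflicting types for the same attribute should be resolved is not stated in the text; keeping the first occurrence is a placeholder choice.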

After having derived the set of candidate partitioning attributes and the type of each partitioning attribute, we need to select the candidates that are advantageous to use. First, we remove all candidates whose partitioning attribute refers to externally loaded data, because these attribute values are not present in incoming messages at the inbound side of the integration platform. Second, we compare the benefit of using a partitioning attribute (see Section 5.3) with a user-specified cost reduction threshold τ and remove all candidates whose benefit is below this threshold.
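
The two filtering steps can be sketched as follows. This is only an illustration: the `Candidate` record, its field names, and the numeric benefit values are assumptions, and the benefit itself would come from the cost model of Section 5.3, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    attribute: str
    # True if the attribute values only exist after data has been loaded
    # from an external system (hypothetical flag).
    from_external_data: bool
    # Estimated cost reduction of partitioning by this attribute
    # (hypothetical value, same unit as the threshold tau).
    benefit: float


def select_candidates(candidates, tau):
    """Apply both filters: inbound availability first, then the threshold tau."""
    # Step 1: drop attributes that are not present in incoming messages.
    inbound = [c for c in candidates if not c.from_external_data]
    # Step 2: drop attributes whose benefit is below the threshold.
    return [c for c in inbound if c.benefit >= tau]


candidates = [
    Candidate("customer", from_external_data=False, benefit=0.30),
    Candidate("totalprice", from_external_data=False, benefit=0.02),
    Candidate("supplier_rating", from_external_data=True, benefit=0.50),
]
selected = select_candidates(candidates, tau=0.05)
print([c.attribute for c in selected])
```

In this sketch, a candidate whose benefit equals τ is kept, since the text only removes candidates below the threshold.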

Based on the set of partitioning attributes, we create a concrete partitioning scheme for the partition tree. For h partitioning attributes, there are h! different partitioning schemes, one for each ordering of the attributes. Due to this factorial complexity, we use a heuristic for finding the optimal scheme. The intuition is to minimize the number of partitions in the index. Assuming no correlations¹⁴, we order the index attributes according to their selectivities with

$$\min \sum_{i=1}^{h} |b \in ba_i| \quad \text{iff} \quad sel(ba_1) \geq sel(ba_i) \geq sel(ba_h). \qquad (5.1)$$

Hence, finding the best partitioning scheme exhibits a complexity of O(h log h) because the h partitioning attributes must be sorted. Another approach would be to take the additional costs of rewritten plans into account. However, the costs of the additional operators (splitting and merging of partitions) have been shown to be negligible.
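
The effect of the ordering heuristic can be checked on a small, made-up example. The sketch below rests on assumptions that go beyond the text: attributes are uncorrelated, an attribute with selectivity sel has round(1/sel) distinct partitions, and level i of the partition tree therefore holds the product of the distinct counts of levels 1 to i. The selectivity values are invented for illustration.

```python
from itertools import permutations


def total_partitions(distinct_counts):
    """Sum of partitions over all tree levels for a given attribute order.

    Assumption: without correlations, level i holds the product of the
    distinct-value counts of the attributes on levels 1..i.
    """
    total, level_size = 0, 1
    for count in distinct_counts:
        level_size *= count
        total += level_size
    return total


def order_by_selectivity(selectivities):
    """Heuristic ordering: highest selectivity (fewest partitions) first."""
    return sorted(selectivities, key=selectivities.get, reverse=True)


# Hypothetical selectivities, i.e. 100, 4, and 10 distinct partitions.
sel = {"customer": 0.01, "totalprice": 0.25, "region": 0.1}
counts = {name: round(1 / s) for name, s in sel.items()}

heuristic_order = order_by_selectivity(sel)
heuristic_cost = total_partitions([counts[a] for a in heuristic_order])

# Brute force over all h! orderings, feasible only for small h.
best_cost = min(
    total_partitions([counts[a] for a in order])
    for order in permutations(sel)
)

print(heuristic_order, heuristic_cost, best_cost)
```

For these values, the sorted order (totalprice, region, customer) yields 4 + 40 + 4000 = 4044 partitions, which matches the minimum over all six orderings, while sorting requires only O(h log h) time.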

Example 5.5 (Minimizing the Number of Partitions). Recall Example 5.4 and assume the partitioning attributes ba_1 (customer, value) and ba_2 (total price, range) as well

¹⁴ The MFO approach also works for correlated partitioning attributes, although queue management might become more expensive due to mis-estimated selectivities. Alternatively, the correlation table introduced in Subsection 3.3.4 could be used for correlation-aware ordering of partitioning attributes.
