25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3.3 Periodical Re-Optimization

We can monitor all cardinalities $|R|$, $|S|$, and $|T|$, but only the selectivities $f_{R,S}$ and $f_{(R \bowtie S),T}$. To estimate $f_{R,T}$, we need to derive it with $f_{R,T} \; \theta \; f_{(R \bowtie S),T}$, where $\theta$ is a function representing the correlation. If we assume statistical independence of selectivities, we can set $f_{R,T} = f_{(R \bowtie S),T}$. However, in general, the selectivities are derived from the present conjunction of join predicates. This cost comparison is applied for each subset of join operators of a plan that can be fully reordered. Note that this heuristic does not necessarily require that the cost model exhibits the ASI property [Moe09]. In addition, simple sorting of join operands is not applicable due to the use of arbitrary correlation functions and the need to evaluate whether join inputs are connected.
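As a small illustration of the estimation step above, the following sketch derives an unmonitored selectivity from a monitored one and uses it to estimate a join output cardinality. It is only a sketch under stated assumptions: the function names are hypothetical, the correlation function θ is modeled as a plain callable that defaults to the identity (the statistical independence case), and the cardinality estimate uses the standard definition of join selectivity, $|R \bowtie T| = f_{R,T} \cdot |R| \cdot |T|$.

```python
def estimate_selectivity(f_monitored, theta=lambda f: f):
    """Derive f_{R,T} from the monitored f_{(R join S),T}.

    theta is a hypothetical stand-in for the correlation function; the
    default identity corresponds to assuming statistical independence.
    """
    return theta(f_monitored)


def join_cardinality(card_left, card_right, selectivity):
    """Estimated join output size: selectivity * |left| * |right|."""
    return selectivity * card_left * card_right


# Example values are made up for illustration only.
f_rt = estimate_selectivity(0.01)               # independence: f_{R,T} = f_{(R join S),T}
estimated = join_cardinality(1000, 100, f_rt)   # |R| = 1000, |T| = 100
print(estimated)
```

Any other correlation assumption can be plugged in by passing a different callable for `theta`.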

Although this algorithm often produces good results, it does not guarantee finding the optimal join order. The reasons are (1) the restriction to nested loop joins and left-deep join trees, (2) the selection of the first join operand based on minimum cardinality, and (3) the reordering of only directly connected join operands rather than partial subtrees.

In contrast to our transformation-based join reordering heuristic, recent approaches of heuristic query optimization [BGLJ10] use merge-based techniques, where ranked subplans are merged iteratively into an overall plan. While this bottom-up approach is advantageous for declarative queries, our transformation-based reordering is better suited to imperative integration flows because such flows start from an initially specified plan.

In general, this heuristic join reordering algorithm exhibits a quadratic time complexity of $O(m^2)$, where $m$ denotes the number of operators with $m = n - 1$ and $n$ denotes the number of joined input data sets. The reasoning is as follows. First, we iterate over all $n$ input data sets in order to determine the minimum cardinality. Second, we iterate over all input data sets and, for each, compare the costs assuming a reordering with its predecessors (similar to selection sort). Thus, in total, we execute at most

$$n + \sum_{i=3}^{n} (i - 2) = \frac{n^2 - n}{2} + 1 \tag{3.9}$$

iterations during this algorithm. Finally, note that, except for the awareness of temporal dependencies (join enumeration restrictions), this heuristic join reordering algorithm can be applied in data management systems as well. As a result, the use of this heuristic join enumeration algorithm (combined with an extended heuristic first fit algorithm for merging parallel flows that we will describe in Section 3.4) reduces the overall complexity of the periodic plan optimization problem to polynomial time, where most of our other optimization techniques exhibit a linear or quadratic time complexity.
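The two passes behind the iteration bound in Equation 3.9 can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the nested-loop cost formula (sum of left times right input cardinality per join), the representation of join predicates as a dictionary of pairwise selectivities, and the rule of moving an operand in front of one of its predecessors are all assumptions made for illustration. The sketch also assumes that the initially specified order is connected. It counts one iteration per scanned input in the first pass and one per compared predecessor in the second pass, so the counter reaches $n + \sum_{i=3}^{n}(i-2)$.

```python
def is_connected(order, predicates):
    """True if each operand after the first shares a predicate with an earlier one."""
    return all(
        any(frozenset((order[k], p)) in predicates for p in order[:k])
        for k in range(1, len(order))
    )


def plan_cost(order, card, predicates):
    """Assumed nested-loop cost of a left-deep plan: sum of |left| * |right| per join."""
    left, cost = card[order[0]], 0.0
    for k in range(1, len(order)):
        right = order[k]
        cost += left * card[right]
        sel = 1.0
        for p in order[:k]:  # conjunction of all applicable join predicates
            sel *= predicates.get(frozenset((p, right)), 1.0)
        left *= card[right] * sel
    return cost


def reorder_joins(initial_order, card, predicates):
    """Return (reordered inputs, number of iterations) for a left-deep plan."""
    order, iterations = list(initial_order), 0
    # Pass 1: scan all n inputs for the minimum cardinality.
    first = order[0]
    for name in order:
        iterations += 1
        if card[name] < card[first]:
            first = name
    candidate = [first] + [x for x in order if x != first]
    if is_connected(candidate, predicates):
        order = candidate
    # Pass 2: for each input from position 3 on, compare costs of moving it
    # in front of each predecessor (the first operand stays fixed).
    for i in range(2, len(order)):
        best, best_cost = order, plan_cost(order, card, predicates)
        for j in range(1, i):
            iterations += 1
            cand = order[:j] + [order[i]] + order[j:i] + order[i + 1:]
            if is_connected(cand, predicates):
                cost = plan_cost(cand, card, predicates)
                if cost < best_cost:
                    best, best_cost = cand, cost
        order = best
    return order, iterations


# Made-up example: four inputs joined in a star around S.
card = {"R": 1000, "S": 10, "T": 100, "U": 50}
predicates = {
    frozenset(("R", "S")): 0.01,
    frozenset(("S", "T")): 0.1,
    frozenset(("S", "U")): 0.02,
}
order, iterations = reorder_joins(["R", "S", "T", "U"], card, predicates)
print(order, iterations)  # ['S', 'U', 'R', 'T'] 7
```

For $n = 4$ the counter yields $4 + (1 + 2) = 7 = (4^2 - 4)/2 + 1$ iterations. For simplicity, the sketch recomputes each plan cost from scratch, so its actual running time is higher than the number of counted cost comparisons suggests.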

3.3.3 Workload Adaptation Sensibility

The core optimization algorithm can be influenced by a number of parameters. We can leverage these parameters in order to adjust the sensibility of adaptation to changing workload characteristics. For our core estimation approach, workload statistics of the current plan $P$ are monitored. If an alternative plan $P'$ has been created, we estimate the missing operator statistics with $\hat{W}(o'_i) = C(o'_i)/C(o_i) \cdot W(o_i)$ using our cost model, where workload statistics of $P$ are aggregated over the sliding time window. In this context, the following three parameters influence the sensibility of workload adaptation:

Workload Sliding Time Window Size $\Delta w$ (time interval used for statistic aggregation): Monitored statistics of plan instances $p_i$ with $p_i \in [T_k - \Delta w, T_k]$ are included in the
