Cost-Based Optimization of Integration Flows - Datenbanken ...
3.3 Periodical Re-Optimization<br />
We can monitor all cardinalities $|R|$, $|S|$, and $|T|$ but only the selectivities $f_{R,S}$ and<br />
$f_{(R \bowtie S),T}$. To estimate $f_{R,T}$, we need to derive it with $f_{R,T} \;\theta\; f_{(R \bowtie S),T}$, where $\theta$ is a<br />
function representing the correlation. If we assume statistical independence of selectivities,<br />
we can set $f_{R,T} = f_{(R \bowtie S),T}$. However, in general, the selectivities are derived from the<br />
present conjunction of join predicates. This cost comparison is applied for each subset<br />
of join operators of a plan that can be fully reordered. Note that this heuristic does not<br />
necessarily require that the cost model exhibits the ASI property [Moe09]. In addition,<br />
simple sorting of join operands is not applicable due to the use of arbitrary correlation<br />
functions and the need to evaluate whether join inputs are connected.<br />
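The cost comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the plain nested-loop cost `|left| * |right|`, and the default identity correlation function (which corresponds to the independence assumption) are all assumptions made for illustration only.

```python
from typing import Callable, Tuple


def derive_selectivity(f_rs_t: float,
                       theta: Callable[[float], float] = lambda f: f) -> float:
    """Derive the unmonitored f_{R,T} from the monitored f_{(R join S),T}.

    theta is a hypothetical correlation function; the identity default
    models statistical independence, i.e. f_{R,T} = f_{(R join S),T}.
    """
    return theta(f_rs_t)


def join_cardinality(left: float, right: float, selectivity: float) -> float:
    """Estimated output cardinality of a join."""
    return selectivity * left * right


def nested_loop_cost(left: float, right: float) -> float:
    """Assumed cost model: a nested loop join touches every input pair."""
    return left * right


def compare_plans(card_r: float, card_s: float, card_t: float,
                  f_rs: float, f_rs_t: float,
                  theta: Callable[[float], float] = lambda f: f
                  ) -> Tuple[float, float]:
    """Return (cost of (R join S) join T, cost of (R join T) join S)."""
    # Current plan: all required statistics are monitored.
    card_rs = join_cardinality(card_r, card_s, f_rs)
    cost_current = nested_loop_cost(card_r, card_s) + nested_loop_cost(card_rs, card_t)

    # Alternative plan: f_{R,T} is not monitored and must be derived.
    f_rt = derive_selectivity(f_rs_t, theta)
    card_rt = join_cardinality(card_r, card_t, f_rt)
    cost_alternative = nested_loop_cost(card_r, card_t) + nested_loop_cost(card_rt, card_s)
    return cost_current, cost_alternative


cost_current, cost_alternative = compare_plans(
    card_r=100, card_s=1000, card_t=10, f_rs=0.5, f_rs_t=0.25)
reorder = cost_alternative < cost_current  # True for these example numbers
```

With these made-up statistics, joining the small input T first is cheaper, so the sketch would suggest reordering. A different `theta` changes the derived selectivity and may change that decision.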
Although this algorithm often produces good results, it obviously does not guarantee<br />
finding the optimal join order. The reasons are (1) the restriction to considering only<br />
nested loop joins and left-deep join trees, (2) the selection of the first join operand based<br />
on minimum cardinality, and (3) reordering only directly connected join operands rather<br />
than partial subtrees.<br />
In contrast to our transformation-based join reordering heuristic, recent approaches of<br />
heuristic query optimization [BGLJ10] use merge-based techniques, where ranked subplans<br />
are merged iteratively into an overall plan. While this bottom-up approach is advantageous<br />
for declarative queries, our transformation-based reordering is more advantageous for imperative<br />
integration flows, which are characterized by an initially specified plan.<br />
In general, this heuristic join reordering algorithm exhibits a quadratic time complexity<br />
of $O(m^2)$, where $m$ denotes the number of operators with $m = n - 1$ and $n$ denotes the<br />
number of joined input data sets. The reasoning is as follows. First, we iterate over all $n$<br />
input data sets in order to determine the minimum cardinality. Second, we iterate over all<br />
input data sets and, for each, compare the costs assuming a reordering with its predecessors<br />
(similar to selection sort). Thus, in total, we execute at most<br />
$$n + \sum_{i=3}^{n} (i - 2) = \frac{n^2 - n}{2} + 1 \qquad (3.9)$$<br />
iterations during this algorithm. Finally, note that, except for the awareness of temporal<br />
dependencies (join enumeration restrictions), this heuristic join reordering algorithm can<br />
be applied in data management systems as well. As a result, the use of this heuristic<br />
join enumeration algorithm (combined with an extended heuristic first fit algorithm for<br />
merging parallel flows that we will describe in Section 3.4) reduces the overall complexity<br />
of the periodic plan optimization problem to polynomial time, where most of our other<br />
optimization techniques exhibit a linear or quadratic time complexity.<br />
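The two passes and the iteration bound of Eq. (3.9) can be sketched in Python as follows. This is a simplified illustration, not the algorithm as specified in the text: the names `reorder_joins`, `left_deep_nl_cost`, and `max_iterations` are hypothetical, the cost function assumes nested loop joins with one uniform selectivity, and the checks for connected join inputs and temporal dependencies are omitted.

```python
from typing import Callable, List, Sequence, Tuple


def max_iterations(n: int) -> int:
    """Closed form of Eq. (3.9): n + sum_{i=3}^{n} (i - 2)."""
    return (n * n - n) // 2 + 1


def left_deep_nl_cost(cards: Sequence[float],
                      selectivity: float) -> Callable[[List[int]], float]:
    """Assumed cost model: left-deep nested loop joins, uniform selectivity."""
    def cost(order: List[int]) -> float:
        total = 0.0
        left = cards[order[0]]
        for idx in order[1:]:
            total += left * cards[idx]              # nested loop join cost
            left = selectivity * left * cards[idx]  # intermediate result size
        return total
    return cost


def reorder_joins(cards: Sequence[float],
                  plan_cost: Callable[[List[int]], float]
                  ) -> Tuple[List[int], int]:
    """Return (join order as input indices, number of executed iterations)."""
    n = len(cards)
    iterations = 0

    # Pass 1: scan all n inputs for the minimum cardinality.
    first = 0
    for i in range(n):
        iterations += 1
        if cards[i] < cards[first]:
            first = i
    order = [first] + [i for i in range(n) if i != first]

    # Pass 2: move each input towards the front while swapping it with its
    # predecessor lowers the plan cost; position 0 stays fixed, so the
    # i-th input (1-based) needs at most i - 2 comparisons.
    for pos in range(2, n):
        j = pos
        while j > 1:
            iterations += 1
            candidate = order[:]
            candidate[j - 1], candidate[j] = candidate[j], candidate[j - 1]
            if plan_cost(candidate) < plan_cost(order):
                order = candidate
                j -= 1
            else:
                break
    return order, iterations


cards = [1000, 10, 500, 50]
order, iterations = reorder_joins(cards, left_deep_nl_cost(cards, 0.5))
# order == [1, 3, 2, 0]; iterations == 7 == max_iterations(4)
```

Under the uniform selectivity assumed here, the result coincides with sorting by cardinality. With arbitrary correlation functions and connectivity checks, as the text notes, a plain sort would not suffice, which is why the sketch compares plan costs rather than cardinalities in the second pass.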
3.3.3 Workload Adaptation Sensibility<br />
The core optimization algorithm can be influenced by a number of parameters. We can<br />
leverage these parameters in order to adjust the sensibility of adaptation to changing<br />
workload characteristics. For our core estimation approach, workload statistics of the<br />
current plan $P$ are monitored. If an alternative plan $P'$ has been created, we estimate the<br />
missing operator statistics with $\hat{W}(o'_i) = C(o'_i)/C(o_i) \cdot W(o_i)$ using our cost model, where<br />
workload statistics of $P$ are aggregated over the sliding time window. In this context, the<br />
following three parameters influence the sensibility of workload adaptation:<br />
Workload Sliding Time Window Size $\Delta w$ (time interval used for statistic aggregation):<br />
Monitored statistics of plan instances $p_i$ with $p_i \in [T_k - \Delta w, T_k]$ are included in the<br />