25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3 Fundamentals of Optimizing Integration Flows

period is longer than the sliding time window size (∆t ≥ ∆w). In detail, we aggregated
700,000 statistics (execution times W(o_i) only) and observed that all statistics were
aggregated in less than 20 ms. The individual aggregation methods differ only slightly in
execution time; MA is the fastest.

If ∆t < ∆w or no ∆w is used, incremental statistics maintenance is required. Thus, we
repeated the experiment with our incremental aggregation methods. Comparing full and
incremental maintenance, the incremental methods are a factor of 1.5 to 3 slower than the
full methods because they require additional computation to produce valid intermediate
results and incur many method invocations. EMA is the fastest incremental method because
it is incremental by definition. Our Estimator comprises all of these aggregation methods
plus some additional infrastructure functionality and uses the incremental EMA as the
default aggregation method. The maintenance of all three statistics (|ds_in|, |ds_out|,
and W(o_i)) for all plan instances of the test set (2,100,000 statistic tuples) using our
Estimator is shown as Estimator (EMA). In conclusion, the overhead for statistics
maintenance during the full comparison scenario was 106 ms. This is negligible (about
0.001%) compared to the cumulative execution time of 140 min in the optimized case.
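The difference between full and incremental maintenance can be illustrated with a minimal Python sketch. It is not the Estimator described above: the names `full_moving_average`, `IncrementalMA`, and `IncrementalEMA` are hypothetical, and the smoothing factor `alpha` is an assumed parameter, since the text does not state how EMA is weighted.

```python
def full_moving_average(values):
    """Full aggregation: one pass over all statistics, one result at the end."""
    return sum(values) / len(values)


class IncrementalMA:
    """Incremental moving average: a valid estimate exists after every update."""

    def __init__(self):
        self.count = 0
        self.estimate = 0.0

    def update(self, value):
        self.count += 1
        # Running-mean update; avoids re-reading earlier statistics.
        self.estimate += (value - self.estimate) / self.count


class IncrementalEMA:
    """Incremental exponential moving average with assumed smoothing factor alpha."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.estimate = None

    def update(self, value):
        if self.estimate is None:
            # The first statistic initializes the estimate.
            self.estimate = float(value)
        else:
            self.estimate = self.alpha * value + (1 - self.alpha) * self.estimate


# Example: four monitored execution times in ms (made-up values).
execution_times = [10, 10, 20, 20]
ma = IncrementalMA()
ema = IncrementalEMA(alpha=0.5)
for t in execution_times:
    ma.update(t)   # one method invocation per statistic
    ema.update(t)
```

The incremental variants pay one method call and one arithmetic update per statistic, which is consistent with the overhead reported above, while the full variant touches each statistic once in a single pass.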

Workload Adaptation<br />

Because workload characteristics change, the sensitivity of workload adaptation is of high
importance. According to Subsection 3.3.3, there are three ways to influence this
sensitivity: (1) the workload sliding time window size ∆w, (2) the optimization period ∆t,
and (3) the workload aggregation method Agg. We evaluated their influence in the following
series of experiments.

Figure 3.27: Workload Adaptation Delays<br />

Figure 3.27 shows the results of an experiment in which we executed n = 20,000 instances
of plan P_3 and of a modified plan P_3′ (with eager group-by), with periodical
reoptimization disabled. After n = 5,000 and n = 15,000 instances, we changed the
cardinality of one of two input data sets (workload changes WC1 and WC2). While the eager
group-by was most efficient in the first part, the simple join and group-by performed
better after WC1. We fixed a sliding window size of ∆w = 5,000 s and used MA as the
workload aggregation method. It took 2,100 plan instances to adapt to the workload shift,
at which point the plan changed (PC1, at the break-even point between the estimated plan
costs). This adaptation delay depends on the sliding time window size ∆w and on the
aggregation method used.
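Why the window size drives the adaptation delay can be seen in a small Python sketch. It rests on simplifying assumptions that are not from the experiment: the window is counted in instances rather than seconds, a single monitored statistic stands in for the estimated plan costs, and `break_even` is a hypothetical threshold at which the other plan becomes cheaper. The function name `adaptation_delay` is likewise illustrative.

```python
from collections import deque


def adaptation_delay(observations, change_index, window_size, break_even):
    """Count the instances after a workload change until the windowed
    moving average (MA) reaches the break-even threshold.

    Returns None if the estimate never reaches the threshold.
    """
    window = deque(maxlen=window_size)  # oldest statistics drop out automatically
    for i, value in enumerate(observations):
        window.append(value)
        estimate = sum(window) / len(window)
        if i >= change_index and estimate >= break_even:
            return i - change_index + 1
    return None


# Made-up workload: the statistic jumps from 100 to 300 at instance 20.
observations = [100] * 20 + [300] * 20
delay_large_window = adaptation_delay(observations, 20, window_size=10, break_even=200)
delay_small_window = adaptation_delay(observations, 20, window_size=4, break_even=200)
```

With a step change and MA, the estimate reaches the midpoint only once half of the window holds post-change statistics, so under these assumptions the delay grows linearly with the window size.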
