Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

for typical workloads. Second, we evaluate the parameters of these workload aggregation methods. Figure 3.28(c) illustrates the influence of the EMA smoothing constant α. We used a sliding time window size of ∆w = 10,000 s and illustrated the estimated costs continuously (∆t = 1 s). Clearly, a decreasing parameter α causes slower adaptation and therefore more robust estimation. However, for typical parameter settings of 0.05 to 0.001, very fast but still robust adaptation can be achieved. Note that for α ∈ {0.2, 0.02, 0.002} we obtained similar results for the sliding window size of ∆w = 1,000 s from the previous experiment. However, for α = 0.0002 (with ∆w = 1,000 s) the estimates varied significantly, which was caused by too few statistics in the time window in combination with a low smoothing factor, such that the estimated values were significantly determined by the initial value (the first statistic in the window) because the adaptation took too long. In order to analyze the influence of the sliding window size ∆w in general, we conducted an additional experiment. Figure 3.28(d) illustrates the influence of the sliding time window size, where we fixed Agg = MA and varied ∆w from 10 s to 10,000 s. Clearly, the adaptation slows down with an increasing ∆w. However, both extremes can lead to inaccurate (large-error) estimates. The choice of the window size should be made based on the specific plan because, for example, a long-running plan or an infrequently used plan needs a longer time window than plans with many instances per time period. The EMA method typically does not need sliding window semantics due to its time-decaying character, where older items can be neglected. However, if a sliding window is used, the sliding window size ∆w should be set according to the plan and the used smoothing constant α such that enough statistics are available, as already discussed. Furthermore, the optimization interval influences the re-estimation granularity. With ∆t = 1 s, we get a continuous cost function, while an increasing ∆t causes slower adaptation because it determines the maximal delay of ∆t until re-estimation. Obviously, parameter estimators that minimize the error between forecast values and real values could be used to determine optimal parameter values for ∆t and ∆w. However, when and how to adjust these parameters is a trade-off between additional statistics maintenance overhead and cost estimation accuracy that is beyond the scope of this thesis.
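The two workload aggregation methods discussed above can be sketched as follows (a minimal illustration, not the thesis implementation): an EMA estimator with smoothing constant α, where a smaller α adapts more slowly but more robustly, and a moving average over a sliding time window ∆w, where old statistics are evicted once they fall out of the window.

```python
from collections import deque


class EMAEstimator:
    """Exponentially weighted moving average with smoothing constant alpha."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.estimate = None

    def add(self, value):
        if self.estimate is None:
            # The first statistic initializes the estimate; with a very small
            # alpha it can dominate for a long time (cf. alpha = 0.0002 above).
            self.estimate = value
        else:
            self.estimate = self.alpha * value + (1 - self.alpha) * self.estimate
        return self.estimate


class SlidingWindowMA:
    """Moving average over all statistics of the last delta_w seconds."""

    def __init__(self, delta_w):
        self.delta_w = delta_w
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        # Evict statistics that fell out of the time window.
        while self.items[0][0] < timestamp - self.delta_w:
            self.items.popleft()
        return sum(v for _, v in self.items) / len(self.items)
```

Feeding both EMA estimators a workload shift (e.g., constant cost 1.0 followed by constant cost 2.0) shows the trade-off directly: the estimator with the larger α converges to the new level within a few statistics, while the smaller α lags behind.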

With regard to precise statistics estimation, the handling of correlated data and conditional probabilities is important. Therefore, we conducted an experiment in order to evaluate our lightweight correlation table approach in detail. We reused our end-to-end comparison scenario (see Figure 3.20), where we executed 100,000 instances of our example plan P5 and compared the resulting execution time when using periodical re-optimization with and without the use of our correlation table. In contrast to the original comparison scenario, we generated correlated⁷ data. Figure 3.29(a) illustrates the conditional selectivities P(o₂), P(o₃ | o₂), and P(o₄ | o₂ ∧ o₃) of the three Selection operators, where we additionally set P(o₃ | ¬o₂) = 1 and P(o₄ | ¬o₂ ∨ ¬o₃) = 1. As a result, o₃ strongly depends on o₂, and o₄ strongly depends on o₂ and o₃.
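The core idea of such a correlation table can be sketched as follows (an illustrative sketch with hypothetical names, not the thesis's actual data structure): record the joint qualification outcomes of the Selection operators per tuple, so that conditional selectivities such as P(o₃ | o₂) can be derived from observed counts instead of being approximated by marginal selectivities under an independence assumption.

```python
from collections import Counter


class CorrelationTable:
    """Lightweight table of joint Selection-operator outcomes, from which
    marginal and conditional selectivities can be derived."""

    def __init__(self):
        self.counts = Counter()  # tuple of booleans -> observed frequency

    def record(self, *outcomes):
        """Record one tuple's outcome per Selection operator, e.g. (o2, o3)."""
        self.counts[tuple(outcomes)] += 1

    def selectivity(self, index, given=()):
        """P(o_index | all operators listed in `given` qualified)."""
        denom = sum(c for o, c in self.counts.items()
                    if all(o[g] for g in given))
        num = sum(c for o, c in self.counts.items()
                  if o[index] and all(o[g] for g in given))
        return num / denom if denom else 0.0
```

For correlated data, the derived conditional selectivity differs markedly from the marginal one; for example, with P(o₂) = 0.5, a marginal P(o₃) of 0.65 can coexist with P(o₃ | o₂) = 0.8, which an independence-assuming estimator would miss.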

Figure 3.29(b) illustrates the resulting execution time with and without the use of our correlation table. We observe that without the use of the correlation table, the optimization technique selection reordering assumes statistical independence and thus changed the plan back and forth, even in the case of constant workload characteristics. This led to the periodic use of suboptimal plans, where the optimization interval ∆t = 5 min prevented more frequent plan changes. In contrast, the use of the correlation table ensured robustness by

⁷ We did not use the Pearson correlation coefficient and known data generation techniques [Fac10] in order to enable the exact control of unconditional and conditional selectivities.
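Why independence-assuming selection reordering oscillates on correlated data can be seen in a small sketch (selectivity values are hypothetical, chosen so that P(o₃ | o₂) differs strongly from the marginal P(o₃), as in the correlated setup above): the per-operator selectivity observed at runtime depends on the operator's position in the plan, because a downstream Selection only sees tuples that qualified upstream. A greedy reordering that treats these observed values as independent marginals therefore flips the plan on every re-optimization step.

```python
# Hypothetical selectivities mimicking the correlated setup:
P_O2 = 0.5              # P(o2)
P_O3_GIVEN_O2 = 0.2     # P(o3 | o2), much lower than the marginal
P_O3_GIVEN_NOT_O2 = 1.0  # P(o3 | not o2)
P_O3 = P_O3_GIVEN_O2 * P_O2 + P_O3_GIVEN_NOT_O2 * (1 - P_O2)  # marginal = 0.6


def observed_selectivities(order):
    """Per-operator selectivities an optimizer observes for a given order.
    The second operator only sees tuples that qualified the first one."""
    if order == ("o2", "o3"):
        return {"o2": P_O2, "o3": P_O3_GIVEN_O2}
    # Order ("o3", "o2"): P(o2 | o3) follows from Bayes' rule.
    return {"o3": P_O3, "o2": P_O3_GIVEN_O2 * P_O2 / P_O3}


def reorder_assuming_independence(observed):
    """Greedy selection reordering: most selective operator first."""
    return tuple(sorted(observed, key=observed.get))


plan = ("o2", "o3")
plan = reorder_assuming_independence(observed_selectivities(plan))  # flips
plan = reorder_assuming_independence(observed_selectivities(plan))  # flips back
```

Starting from (o₂, o₃), the optimizer observes o₃'s conditional selectivity 0.2, mistakes it for a marginal, and moves o₃ to the front; there it observes the true marginal 0.6 and moves it back, repeating at every optimization interval. A correlation table breaks this cycle by keeping the conditioning context with each statistic.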

