Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
for typical workloads. Second, we evaluate the parameters of these workload aggregation methods. Figure 3.28(c) illustrates the influence of the EMA smoothing constant α. We used a sliding time window size of ∆w = 10,000 s and plotted the estimated costs continuously (∆t = 1 s). Clearly, a decreasing α causes slower adaptation and therefore more robust estimation. However, for typical parameter settings of 0.05 to 0.001, very fast but still robust adaptation can be achieved. Note that for α ∈ {0.2, 0.02, 0.002} we obtained similar results with the sliding window size of ∆w = 1,000 s from the previous experiment. For α = 0.0002 (with ∆w = 1,000 s), however, the estimates varied significantly. This was caused by too few statistics in the time window in combination with a low smoothing constant: the estimated values were largely determined by the initial value (the first statistic in the window) because the adaptation took too long.

In order to analyze the influence of the sliding window size ∆w in general, we conducted an additional experiment. Figure 3.28(d) illustrates the influence of the sliding time window size, where we fixed Agg = MA and varied ∆w from 10 s to 10,000 s. Clearly, the adaptation slows down with increasing ∆w. However, both extremes can lead to estimations with large errors. The choice of the window size should be made per plan because, for example, a long-running or an infrequently used plan needs a longer time window than a plan with many instances per time period. The EMA method typically does not need sliding window semantics due to its time-decaying character, where older items can be neglected. However, if a sliding window is used, its size ∆w should be set according to the plan and the used smoothing constant α such that enough statistics are available, as already discussed. Furthermore, the optimization interval influences the re-estimation granularity. With ∆t = 1 s, we get a continuous cost function, while an increasing ∆t causes slower adaptation because it determines the maximal delay of ∆t until re-estimation. Obviously, parameter estimators that minimize the error between forecast and real values could be used to determine optimal values for ∆t and ∆w. However, when and how to adjust these parameters is a trade-off between additional statistics maintenance overhead and cost estimation accuracy that is beyond the scope of this thesis.
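The two aggregation methods can be sketched as follows. This is a minimal illustration with assumed names and values, not the implementation evaluated above: the EMA adapts faster for a larger smoothing constant α, while the sliding-window MA simply forgets statistics older than ∆w.

```python
from collections import deque

def ema(costs, alpha):
    """Exponential moving average: a larger alpha adapts faster, a smaller
    alpha is more robust but stays dominated by the initial value longer."""
    est = costs[0]
    for c in costs[1:]:
        est = alpha * c + (1 - alpha) * est
    return est

class WindowedMA:
    """Moving average over a sliding time window of delta_w seconds."""
    def __init__(self, delta_w):
        self.delta_w = delta_w
        self.items = deque()  # (timestamp, cost) statistics

    def add(self, t, cost):
        self.items.append((t, cost))

    def estimate(self, now):
        # evict statistics that fell out of the sliding window
        while self.items and now - self.items[0][0] > self.delta_w:
            self.items.popleft()
        return sum(c for _, c in self.items) / len(self.items)

# Abrupt workload shift: observed execution costs jump from 10 to 20
workload = [10.0] * 50 + [20.0] * 50
fast = ema(workload, alpha=0.2)    # adapts within a few statistics
slow = ema(workload, alpha=0.002)  # still dominated by the initial value

ma = WindowedMA(delta_w=100)       # 100 s window, one statistic every 10 s
for t in range(0, 200, 10):
    ma.add(t, 10.0 if t < 100 else 20.0)
# at t = 200, only post-shift statistics remain inside the window
```

Querying `ma.estimate(200)` yields the post-shift cost of 20, because all pre-shift statistics have left the window; the low-α EMA, in contrast, is still close to the pre-shift value after 50 post-shift statistics.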
With regard to precise statistics estimation, the handling of correlated data and conditional probabilities is important. Therefore, we conducted an experiment in order to evaluate our lightweight correlation table approach in detail. We reused our end-to-end comparison scenario (see Figure 3.20), where we executed 100,000 instances of our example plan P5 and compared the resulting execution times when using periodical re-optimization with and without our correlation table. In contrast to the original comparison scenario, we generated correlated data⁷. Figure 3.29(a) illustrates the conditional selectivities P(o2), P(o3|o2), and P(o4|o2 ∧ o3) of the three Selection operators, where we additionally set P(o3|¬o2) = 1 and P(o4|¬o2 ∨ ¬o3) = 1. As a result, o3 strongly depends on o2, and o4 strongly depends on o2 and o3.
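To make the independence problem concrete, the following sketch generates such correlated data by drawing each predicate conditionally on the previous one. The selectivity values here are assumptions for illustration, not the measured values from Figure 3.29: with P(o3|¬o2) = 1, the product of the unconditional selectivities severely overestimates the joint selectivity.

```python
import random

random.seed(42)
p_o2, p_o3_given_o2 = 0.5, 0.2

tuples = []
for _ in range(100_000):
    o2 = random.random() < p_o2
    # conditional generation: P(o3|o2) = 0.2, but P(o3|not o2) = 1
    o3 = (random.random() < p_o3_given_o2) if o2 else True
    tuples.append((o2, o3))

sel_o2 = sum(o2 for o2, _ in tuples) / len(tuples)   # ~0.5
sel_o3 = sum(o3 for _, o3 in tuples) / len(tuples)   # ~0.6
joint  = sum(o2 and o3 for o2, o3 in tuples) / len(tuples)

# what a correlation-unaware optimizer would assume for P(o2 and o3):
independent_estimate = sel_o2 * sel_o3
# true joint selectivity is P(o2) * P(o3|o2) = 0.1, the product is ~0.3
```

An optimizer that ranks the Selection operators by the unconditional selectivities thus works with an estimate roughly three times too large for the conjunction, which is why correlation-unaware selection reordering can oscillate between plans, as observed next.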
Figure 3.29(b) illustrates the resulting execution time with and without the use of our correlation table. We observe that without the correlation table, the optimization technique selection reordering assumed statistical independence and thus changed the plan back and forth, even in the case of constant workload characteristics. This led to the periodic use of suboptimal plans, where the optimization interval ∆t = 5 min prevented more frequent plan changes. In contrast, the use of the correlation table ensured robustness by
⁷ We did not use the Pearson correlation coefficient and known data generation techniques [Fac10] in order to enable exact control of unconditional and conditional selectivities.