Cost-Based Optimization of Integration Flows - Datenbanken ...
6.5 Experimental Evaluation
demand re-optimization reacts directly to the workload change and switches plans. Hence,
over time, this fast adaptation reduces the execution time. Achieving the same with
periodical re-optimization would require a very small ∆t, which in turn would increase
the total re-optimization time. Finally, note that the small differences in absolute
execution times compared to Chapter 3 arise because the messages for this experiment
were generated randomly according to the given selectivities.
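
The trade-off between a small ∆t and the total re-optimization effort can be illustrated with a toy model. This is only a sketch under strong assumptions that are not taken from the experiments: time is counted in plan instances, the periodical optimizer runs every `delta_t` instances, a plan counts as stale from a workload change until the next optimizer run, and on-demand re-optimization is idealized as detecting every change immediately. The function names and parameters are hypothetical.

```python
def periodic_reopt(num_instances, change_points, delta_t):
    """Toy model of periodical re-optimization (hypothetical, not the thesis code).

    Returns (number of optimizer runs, number of instances executed with a
    stale plan). Time is measured in plan instances.
    """
    changes = set(change_points)
    reopts = 0
    stale = 0
    is_stale = False
    for i in range(num_instances):
        if i in changes:
            is_stale = True           # workload changed, current plan is outdated
        if i > 0 and i % delta_t == 0:
            reopts += 1               # periodic optimizer run, plan is fresh again
            is_stale = False
        if is_stale:
            stale += 1
    return reopts, stale


def on_demand_reopt(num_instances, change_points):
    """Idealized on-demand re-optimization: one run per detected change."""
    reopts = sum(1 for c in change_points if 0 <= c < num_instances)
    return reopts, 0


if __name__ == "__main__":
    changes = [150, 650]
    for delta_t in (500, 100, 10):
        print("periodic", delta_t, periodic_reopt(1000, changes, delta_t))
    print("on-demand", on_demand_reopt(1000, changes))
```

In this model, shrinking `delta_t` reduces the number of stale instances but increases the number of optimizer runs proportionally, whereas the idealized on-demand variant runs the optimizer only once per workload change.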
Complex-Plan End-to-End Comparison
In addition to the simple-plan scenario, we executed a second series of experiments using
the more complex example plan P7′. Plan P7′ differs from the already introduced plan P7
in that we explicitly changed the join query type from chain to clique in order to allow
arbitrary join reordering. This plan receives a data set, loads data from four different
systems, and executes schema transformations with translation operators (XSLT scripts).
Finally, the five data sets are joined using four join operators and the result is sent
to a fifth system. We executed 20,000 plan instances using the input cardinalities shown
in Figure 6.15(b) and the same parameter configurations as for the simple-plan scenario.
For the four loaded data sets, we used input cardinalities of d ∈ {2, 4, 8, 16} (in 100 kB).
After every 2,000 instances, we changed the input cardinalities of the four external
systems round-robin as shown in Figure 6.15(a).
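
The workload described above can be sketched as a simple schedule generator. This is a minimal illustration, assuming that "round-robin" means rotating the vector of input cardinalities by one position at every change; the exact assignment in Figure 6.15(a) is not reproduced here, and the function name and parameters are hypothetical.

```python
from collections import deque


def cardinality_schedule(num_instances=20_000, shift_every=2_000,
                         sizes=(2, 4, 8, 16)):
    """Yield (instance index, cardinalities of the four external systems).

    Cardinalities are given in units of 100 kB. The rotation direction is an
    assumption made for illustration only.
    """
    current = deque(sizes)
    for i in range(num_instances):
        if i > 0 and i % shift_every == 0:
            current.rotate(1)  # move the last size to the first system
        yield i, tuple(current)


if __name__ == "__main__":
    # Print the configuration at the start of each workload phase.
    for i, cards in cardinality_schedule():
        if i % 2_000 == 0:
            print(i, cards)
```

With the default parameters, the schedule contains nine workload changes over the 20,000 instances, and each configuration recurs after four changes.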
Regarding the results shown in Figures 6.15(c)-6.15(f), we observe characteristics similar
to those of the simple-plan scenario. It is important to note that (1) the more
optimization opportunities a plan offers and (2) the higher the number of workload changes,
the higher the execution time improvements achieved by on-demand re-optimization, because
immediate adaptation becomes more important (see Figures 6.15(d) and 6.15(f)). However,
we restricted ourselves to moderate differences in input data sizes in order to show only
this main characteristic rather than arbitrarily high improvements. Further, we also
observe, on average, higher re-optimization time improvements than in the simple-plan
comparison scenario due to the higher influence of immediate re-optimization and
directed re-optimization (see Figures 6.15(c) and 6.15(e)). The outliers for periodical
re-optimization were caused by the Java garbage collection because our implementation
of the full join enumeration (DPSize) requires many temporary data objects, which are
lazily deleted when space is required. This effect is visible only for periodical
re-optimization because there we used full join enumeration, which creates more objects
and is invoked more often than our on-demand re-optimization. In conclusion, the benefit
of on-demand re-optimization increases with increasing plan complexity and increasing
frequency of workload changes.
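
To make the object churn of full join enumeration concrete, the following is a minimal sketch of size-driven dynamic programming in the style of DPSize for a clique query. It is not the implementation evaluated here: the cost model (sum of intermediate result cardinalities), the function name, and the example cardinalities and selectivities are assumptions chosen for illustration. Because the query is a clique, every pair of disjoint subsets is joinable, so no connectivity test is needed.

```python
from itertools import combinations


def dpsize(cards, sel):
    """Size-driven join enumeration for a clique query (illustrative sketch).

    cards: dict mapping relation name -> cardinality
    sel:   dict mapping frozenset({a, b}) -> join selectivity
           (missing pairs default to 1.0, i.e. a cross product)
    Returns (cost, plan), where plan is a nested tuple of relation names.
    """
    def card(rels):
        # Estimated cardinality of joining all relations in `rels`.
        c = 1.0
        for r in rels:
            c *= cards[r]
        for a, b in combinations(sorted(rels), 2):
            c *= sel.get(frozenset((a, b)), 1.0)
        return c

    # best[S] = (cost, plan) for each enumerated subset S of relations.
    best = {frozenset([r]): (0.0, r) for r in cards}
    by_size = {1: list(best)}
    for size in range(2, len(cards) + 1):
        by_size[size] = []
        # The cost model is symmetric, so left sizes up to size // 2 suffice.
        for left_size in range(1, size // 2 + 1):
            for left in by_size[left_size]:
                for right in by_size[size - left_size]:
                    if left & right:
                        continue  # overlapping subsets cannot be joined
                    joined = left | right
                    cost = best[left][0] + best[right][0] + card(joined)
                    if joined not in best:
                        by_size[size].append(joined)
                        best[joined] = (cost, (best[left][1], best[right][1]))
                    elif cost < best[joined][0]:
                        best[joined] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(cards)]


if __name__ == "__main__":
    # Hypothetical statistics, not taken from the experiments.
    cards = {"A": 10, "B": 100, "C": 1000}
    sel = {
        frozenset(("A", "B")): 0.1,
        frozenset(("A", "C")): 1.0,
        frozenset(("B", "C")): 0.01,
    }
    print(dpsize(cards, sel))
```

Each enumerated subset produces new set and tuple objects, and the number of subsets grows exponentially with the number of joined data sets, which is consistent with the observation that frequent full enumeration generates many short-lived objects.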
Scalability
In order to investigate influencing aspects such as the number of workload shifts wc,
the input data size d, and the optimization interval ∆t in more detail, we conducted an
additional series of scalability experiments, in which we varied these parameters. We compare
the unoptimized case with periodical and on-demand re-optimization using the cumulated
plan execution time, the cumulated optimization time, and the number of re-optimizations.
As the experimental setup, we executed 5,000 plan instances of P5 for each configuration,
where each workload shift switches between the two selectivity configurations (sel(o2) =
0.8, sel(o3) = 0.6, sel(o4) = 0.1) and (sel(o2) = 0.1, sel(o3) = 0.6, sel(o4) = 0.8). As