Cost-Based Optimization of Integration Flows - Datenbanken ...
Figure 3.29: Comparison Scenario of Periodical Re-Optimization with Correlation. Panels: (a) Selectivity Variations, (b) Execution Time.

maintaining conditional probabilities over multiple versions of a plan. Because three
correlated operators were involved, incremental optimization required deleting records
from this correlation table. As a result, there are also some plan switches to suboptimal
plans (e.g., after workload shift *2, we observe three wrong plan switches). However, over
time, the conditional selectivity estimates converge to the real selectivities, which accounts
for a 10% improvement in cumulative execution time. In conclusion, the use
of our correlation table ensures robustness in the presence of correlated data or conditional
probabilities, while the overhead is negligible (in this comparison scenario, we maintained
three entries in this correlation table).
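
The role of such a correlation table can be illustrated with a minimal sketch. The layout below is an assumption made for illustration only and is not taken from the thesis: the class `CorrelationTable`, its methods, and the operator names are hypothetical. Each entry keeps a running mean of the selectivity of one operator observed directly after another, so repeated monitoring lets the estimate approach the real conditional selectivity, and `invalidate` mimics the deletion of records when a plan rewrite makes them stale.

```python
from dataclasses import dataclass, field


@dataclass
class CorrelationTable:
    """Hypothetical table of conditional selectivities sel(op | given_op)."""

    # (op, given_op) -> (sum of observed selectivities, number of observations)
    entries: dict = field(default_factory=dict)

    def observe(self, op, given_op, rows_in, rows_out):
        """Record one monitored execution of `op` running directly after `given_op`."""
        if rows_in == 0:
            return  # an empty input carries no selectivity information
        total, count = self.entries.get((op, given_op), (0.0, 0))
        self.entries[(op, given_op)] = (total + rows_out / rows_in, count + 1)

    def selectivity(self, op, given_op, default):
        """Return the running-mean estimate, or `default` if the pair is unknown."""
        if (op, given_op) not in self.entries:
            return default
        total, count = self.entries[(op, given_op)]
        return total / count

    def invalidate(self, op):
        """Delete every record that involves `op` (assumed reaction to a plan rewrite)."""
        self.entries = {k: v for k, v in self.entries.items() if op not in k}


# Two hypothetical selections, each with selectivity 0.5 in isolation,
# but fully correlated: every row passing sel_a also passes sel_b.
table = CorrelationTable()
table.observe("sel_b", "sel_a", rows_in=500, rows_out=500)
table.observe("sel_b", "sel_a", rows_in=400, rows_out=400)

independent = 0.5 * 0.5  # estimate under the independence assumption
conditional = 0.5 * table.selectivity("sel_b", "sel_a", default=0.5)
```

Under the independence assumption the combined selectivity is underestimated, whereas the conditional estimate reflects the correlation. After `invalidate`, the estimate falls back to the default until new observations arrive, which is one plausible reading of why temporarily wrong plan switches can occur.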

Our exhaustive evaluation has shown that periodical re-optimization achieves significant
performance improvements, in the sense of minimizing the average execution time of a plan,
while statistics monitoring and periodical re-optimization impose only moderate overhead.
Even in cases where no optimization techniques could be applied, no significant performance
penalty was measured. Most importantly, the optimized plans scale better than unoptimized
plans: typically, the relative performance improvements increase with increasing input data
size or an increasing number of operators. Finally, with the right choice of parameters,
the self-adjusting cost model in combination with correlation awareness enables a fast but
still robust adaptation to changing workload characteristics.
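
The trade-off between fast and robust adaptation can be made concrete with a small sketch. It uses an exponential moving average as a stand-in smoothing technique. This is an assumption for illustration and not necessarily the aggregation used by the cost model described here, and the function name, the parameter `alpha`, and the observation series are hypothetical.

```python
def ema_estimates(observations, alpha):
    """Smooth a series with an exponential moving average, alpha in (0, 1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    estimates = []
    current = None
    for x in observations:
        # The first observation initializes the estimate; later ones are blended in.
        current = x if current is None else alpha * x + (1.0 - alpha) * current
        estimates.append(current)
    return estimates


# Hypothetical monitored selectivities: one outlier (0.9) followed by a
# lasting workload shift from 0.2 to 0.8.
observed = [0.2, 0.2, 0.9, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8]

fast = ema_estimates(observed, alpha=0.8)    # follows the shift quickly
robust = ema_estimates(observed, alpha=0.2)  # dampens the outlier
```

A large `alpha` tracks the workload shift within a few observations but also overreacts to the single outlier, while a small `alpha` dampens the outlier at the price of slower convergence. Choosing the parameter therefore determines how quickly and how reliably re-optimization reacts.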

3.6 Summary and Discussion
To summarize, in this chapter, we introduced the cost-based optimization of imperative
integration flows to overcome the major problem of inefficiently performing integration
flows in the presence of changing workload characteristics. The incremental maintenance
of execution statistics addresses missing knowledge about data properties when integrating
heterogeneous and highly distributed systems and applications. This is the first time that
cost-based re-optimization has been considered in the area of integration flows. Our solution
comprises the dependency analysis, details on the monitoring of workload and execution
statistics, and the definition of the double-metric cost model. Based on these foundations,
we discussed the NP-hard Periodic Plan Optimization Problem (P-PPO), including