Cost-Based Optimization of Integration Flows - Datenbanken ...
2 Preliminaries and Existing Techniques
none of the existing approaches addresses the basic characteristic of integration flows that
they are deployed once and executed many times. On the one hand, there are approaches
that follow the optimize-once model, where rule-based optimization is applied only during the
initial deployment of a plan. These approaches cannot adapt to changing workload
characteristics, which may lead to poor plans over time, and many optimization techniques
that require cost-based decisions cannot be applied at all. On the other hand, there
are approaches that follow the optimize-always model, where optimization is triggered
whenever an instance of an integration flow is executed. This can lead to substantial
optimization overhead if many instances with rather small amounts of data are executed
over time, because in such scenarios the optimization time can be even higher than the
execution time of a single instance. In conclusion, we observe the lack of a tailor-made
cost-based optimization approach for integration flows that allows continuous adaptation
to changing workload characteristics and that exploits the specific characteristic of
integration flows, namely that they are deployed once and executed many times.
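The trade-off between the two models can be illustrated with a minimal arithmetic sketch. The function names and all numbers below are hypothetical and do not come from the source. The sketch also assumes a constant execution time per instance, so it deliberately ignores the drawback of the optimize-once model that a plan may degrade as workload characteristics change.

```python
def total_time_optimize_always(n_instances, t_opt, t_exec):
    """Optimize-always: every instance pays optimization plus execution."""
    return n_instances * (t_opt + t_exec)


def total_time_optimize_once(n_instances, t_opt, t_exec):
    """Optimize-once: optimization is paid once at deployment time."""
    return t_opt + n_instances * t_exec


# Hypothetical workload: 10,000 small instances, 5 ms optimization time,
# 2 ms execution time per instance (optimization exceeds execution).
n, t_opt, t_exec = 10_000, 5.0, 2.0

always = total_time_optimize_always(n, t_opt, t_exec)  # 70000.0 ms
once = total_time_optimize_once(n, t_opt, t_exec)      # 20005.0 ms

# Share of the optimize-always total that is pure optimization overhead.
overhead_share = (always - n * t_exec) / always

print(f"optimize-always: {always} ms, optimize-once: {once} ms")
print(f"optimization overhead share: {overhead_share:.1%}")
```

Under these assumed numbers, most of the optimize-always time is spent on optimization rather than execution, which is the situation the paragraph above describes for many instances with small amounts of data.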
2.2 Adaptive Query Processing
In contrast to the mainly rule-based optimization of integration flows, there is a large
body of work on cost-based optimization in various system categories. Because this work
aims at adaptation to changing workload characteristics as well as to unknown and
misestimated statistics, the literature refers to this field as Adaptive Query Processing
(AQP). In this section, we classify and discuss the existing techniques. For this purpose,
we present an extended AQP classification that combines the known, system-oriented classification
of Babu and Bizarro [BB05] with the time-oriented classification of Deshpande
et al. [DHR06], and we extend it by the category of integration flows. We focus only on
the main characteristics and drawbacks and refer to the surveys [BB05, DIR07] and the
tutorials [DHR06, IDR07] for a more detailed analysis of the individual categories.
2.2.1 Classification of Adaptive Query Processing
From a system perspective [BB05], we distinguish between (1) the plan-based adaptation of
ad-hoc queries in DBMS, (2) the adaptation of deployed integration flows (see Section 2.1)
in integration platforms, (3) the adaptation of continuous queries (CQs) in DSMS, and
(4) tuple routing as a specific execution model for CQs. We use this system-oriented
classification when surveying existing techniques in the following subsections. Each of these
system categories exhibits specific characteristics that are reflected in its
optimization approaches.
In addition to this system-oriented classification, we use a time-based classification in
the sense of when re-optimization is triggered. Figure 2.7 illustrates the resulting overall
classification. According to the spectrum of adaptivity [DHR06], there are essentially four
points in time at which re-optimization can be triggered. First, the coarse-grained inter-query optimization
refers to the standard optimization model of DBMS as established in System R
[SAC+79], where each query is optimized at compile time and thus before execution
(optimize-always). For OLTP systems with rather short query execution times, plan caching,
also called QEP (query execution plan) caching [ZDS+08, BBD09, Low09], is a widely used approach
to reduce the optimization time: a new QEP is compiled only if none exists
in the cache or if statistics have changed significantly (optimize-once). Second, late binding uses