25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


2 Preliminaries and Existing Techniques

none of the existing approaches addresses the basic characteristic of integration flows that they are deployed once and executed many times. On the one hand, there are approaches that follow the optimize-once model, where rule-based optimization is applied only during the initial deployment of a plan. These approaches cannot adapt to changing workload characteristics, which may lead to poor plans over time, and many optimization techniques that require cost-based decisions cannot be applied at all. On the other hand, there are approaches that follow the optimize-always model, where optimization is triggered whenever an instance of an integration flow is executed. This can cause substantial optimization overhead if many instances with rather small amounts of data are executed over time, because in such scenarios the optimization time can exceed the execution time of a single instance. In conclusion, we observe the lack of a tailor-made cost-based optimization approach for integration flows that allows continuous adaptation to changing workload characteristics and that exploits the fact that integration flows are deployed once and executed many times.
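The overhead argument above can be illustrated with a small back-of-the-envelope calculation. The following Python sketch is only an illustration under assumed values: the function names and the numbers (10,000 instances, 5 ms optimization time, 2 ms execution time) are hypothetical and not taken from any measured system. It also assumes a constant execution time per instance, so it does not model the plan degradation that the optimize-once model may suffer when workload characteristics change.

```python
def optimize_always_total(n_instances, t_opt, t_exec):
    """Total time if every instance is optimized right before it is executed."""
    return n_instances * (t_opt + t_exec)


def optimize_once_total(n_instances, t_opt, t_exec):
    """Total time if the plan is optimized a single time at deployment."""
    return t_opt + n_instances * t_exec


def overhead_fraction(total, n_instances, t_exec):
    """Share of the total time that is not spent on pure execution."""
    return 1.0 - (n_instances * t_exec) / total


# Hypothetical workload: many small instances, where optimizing a plan
# takes longer (5 ms) than executing a single instance (2 ms).
n, t_opt, t_exec = 10_000, 5.0, 2.0

always = optimize_always_total(n, t_opt, t_exec)
once = optimize_once_total(n, t_opt, t_exec)

print(f"optimize-always: {always} ms, overhead share "
      f"{overhead_fraction(always, n, t_exec):.2%}")
print(f"optimize-once:   {once} ms, overhead share "
      f"{overhead_fraction(once, n, t_exec):.2%}")
```

With these assumed numbers, optimization dominates the total time under optimize-always, whereas it is negligible under optimize-once. The sketch says nothing about plan quality, which is exactly what the optimize-once model gives up.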

2.2 Adaptive Query Processing

In contrast to the mainly rule-based optimization of integration flows, there is a large body of work on cost-based optimization in various system categories. Because this work aims at adapting to changing workload characteristics as well as to unknown and misestimated statistics, the literature refers to this field as Adaptive Query Processing (AQP). In this section, we classify and discuss the existing techniques. For this purpose, we present an extended AQP classification that combines the known, system-oriented classification of Babu and Bizarro [BB05] with the time-oriented classification of Deshpande et al. [DHR06] and extend it by the category of integration flows. We focus only on the main characteristics and drawbacks and refer to the surveys [BB05, DIR07] and the tutorials [DHR06, IDR07] for a more detailed analysis of the individual categories.

2.2.1 Classification of Adaptive Query Processing

From a system perspective [BB05], we distinguish between (1) the plan-based adaptation of ad-hoc queries in DBMS, (2) the adaptation of deployed integration flows (see Section 2.1) in integration platforms, (3) the adaptation of continuous queries (CQs) in DSMS, and (4) tuple routing as a specific execution model for CQs. We use this system-oriented classification when surveying existing techniques in the following subsections. Each of these system categories exhibits specific characteristics that are reflected in the corresponding optimization approaches.

In addition to this system-oriented classification, we use a time-based classification in the sense of when re-optimization is triggered. Figure 2.7 illustrates the resulting overall classification. According to the spectrum of adaptivity [DHR06], there are essentially four points in time at which re-optimization can be triggered. First, the coarse-grained inter-query optimization refers to the standard optimization model of DBMS as established in System R [SAC+79], where each query is optimized at compile time and thus before execution (optimize-always). For OLTP systems with rather short query execution times, plan or QEP (query execution plan) caching [ZDS+08, BBD09, Low09] is a widely used approach to reduce the optimization time by compiling a new QEP only if none exists or if statistics have changed significantly (optimize-once). Second, late binding uses
