25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4.4 Cost-Based Vectorization for Multiple Plans

So far, we have described how to compute the cost-optimal vectorized plan for a single deployed plan. While this approach significantly improves the performance of that plan, cost-based vectorization can hurt the overall performance when multiple independent plans are deployed, because the total number of execution buckets across all plans may become high. Consequently, it is appropriate neither to simply apply the cost-based vectorization approach to each plan nor to use a fixed number of execution buckets for the set of all plans. In this section, we present an approach that takes into account the execution statistics of all deployed plans. As a result, this approach achieves robustness in terms of the overall performance and hence allows for more predictable performance of the integration platform.

4.4.1 Problem Description

In many real-world scenarios, multiple independent plans $P_i \in \{P_1, \dots, P_h\}$ are deployed within an integration platform that executes instances of these plans concurrently. With regard to a single deployed plan, cost-based vectorization overcomes the problems of full vectorization, i.e., the number of required threads and the work-cycle domination by single operators. When executing multiple cost-based vectorized plans concurrently, a similar problem arises. Here, the number of threads required by all $h$ plans depends on the number of plans. In detail, it is upper-bounded by $\sum_{i=1}^{h} m_i$, where $m_i$ denotes the number of operators of plan $P_i$.
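
The upper bound above can be sketched in a few lines of Python. This is only an illustration: the function name `thread_upper_bound` and the operator counts are hypothetical and not taken from the source. It assumes the worst case in which every operator ends up in its own execution bucket (thread).

```python
def thread_upper_bound(operators_per_plan):
    """Upper bound on required threads: the sum of m_i over all plans.

    Assumes the worst case of one execution bucket (thread) per operator.
    """
    return sum(operators_per_plan.values())


# Hypothetical operator counts m_i for h = 3 plans (not from the source).
m = {"P1": 5, "P2": 3, "P3": 8}
print(thread_upper_bound(m))  # 16
```

The bound grows with every additional deployed plan, which is why a global limit on the number of execution buckets becomes necessary.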

To overcome this problem in case of a high number of deployed plans, we define an extended vectorization problem. The core idea is to restrict the maximum number of threads to a user-defined parameter $K$. Then, we compute the fairest distribution of all operators of the $h$ plans across the $K$ execution buckets according to the current workload characteristics and execution statistics. First of all, we formally define the extended cost-based vectorization problem for multiple plans.

Definition 4.4 (Cost-Based Multiple Plan Vectorization Problem (P-MPV)). Let $P$ with $P_i \in \{P_1, \dots, P_h\}$ denote a set of $h$ plans. The P-MPV then describes the problem of finding a restricted cost-optimal plan $P_i''$ with $k_i$ execution buckets for each $P_i \in P$ according to the P-CPV. There, the constraint of the maximum overall number of execution buckets, $\sum_{i=1}^{h} k_i \le K$, must hold.
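
The constraint in Definition 4.4 can be read as a simple feasibility test on a candidate assignment of bucket counts. The sketch below is illustrative only: the name `is_feasible` and all values are hypothetical, and the per-plan bounds $1 \le k_i \le m_i$ are an added assumption (at least one bucket per plan, at most one per operator) rather than part of the definition. It does not address cost optimality.

```python
def is_feasible(buckets, operators, K):
    """Check the P-MPV constraint sum(k_i) <= K for a candidate assignment.

    Assumption (not stated in the definition): 1 <= k_i <= m_i per plan.
    """
    per_plan_ok = all(1 <= buckets[p] <= operators[p] for p in buckets)
    return per_plan_ok and sum(buckets.values()) <= K


# Hypothetical values, for illustration only.
operators = {"P1": 5, "P2": 3, "P3": 8}  # m_i
buckets = {"P1": 2, "P2": 1, "P3": 3}    # k_i
print(is_feasible(buckets, operators, K=7))  # True: 2 + 1 + 3 = 6 <= 7
print(is_feasible(buckets, operators, K=5))  # False: 6 > 5
```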

Obviously, when simply solving the standard cost-based optimization problem for each single plan $P_i$, we might exceed the maximum number of execution buckets with $\sum_{i=1}^{h} k_i > K$. The following example illustrates this problem.

Example 4.10 (Problem when Solving the P-MPV). Assume three plans $P_a$, $P_b$ and $P_c$ with different numbers of operators and monitored costs as shown in Figure 4.16. We set the maximum total number of execution buckets to $K = 7$. Further, the P-CPV is solved for each single plan using the heuristic computation approach. For this example, we observe that we get $\sum_{i=1}^{h} k_i = 9$ execution buckets and hence, we exceed the maximum constraint of $K = 7$.
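
The arithmetic of Example 4.10 can be replayed as follows. Only the total of 9 buckets and the limit $K = 7$ come from the example. The per-plan split is hypothetical, since the values of Figure 4.16 are not reproduced here, and `bucket_excess` is an illustrative helper name.

```python
def bucket_excess(buckets_per_plan, K):
    """Number of execution buckets by which the per-plan solutions exceed K."""
    return max(0, sum(buckets_per_plan.values()) - K)


# Hypothetical split of the 9 buckets; only the total and K = 7 are from the example.
k = {"Pa": 4, "Pb": 2, "Pc": 3}
print(sum(k.values()))        # 9
print(bucket_excess(k, K=7))  # 2
```

An excess of two means that the independently computed plans must give up at least two execution buckets in total before the constraint holds.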

As a result, the P-MPV cannot be solved by applying the P-CPV to each single plan. In contrast to restricting $k$ (see Subsection 4.3.3), the given maximum constraint $K$ is only an upper bound and therefore we have to consider more solution candidates. In addition, we might not fully use the optimization potential if we simply use $K/h$ buckets
