Cost-Based Optimization of Integration Flows - Datenbanken ...
4.4 Cost-Based Vectorization for Multiple Plans
So far, we have described how to compute the cost-optimal vectorized plan for a single
deployed plan. While this approach significantly improves the performance of this plan,
cost-based vectorization can still hurt the overall performance when multiple independent
plans are deployed, because the total number of execution buckets across all plans may be high.
Consequently, neither applying cost-based vectorization to each plan in isolation nor using a
fixed number of execution buckets is appropriate for the set of all plans. In this section, we
present an approach that takes into account the execution statistics of all deployed plans. As
a result, this approach achieves robustness in terms of the overall performance and hence
allows for more predictable performance of the integration platform.
4.4.1 Problem Description
In many real-world scenarios, multiple independent plans $P_i \in \{P_1, \dots, P_h\}$ are deployed
within an integration platform that executes instances of these plans concurrently. Cost-based
vectorization overcomes the problems of full vectorization, i.e., the number of required
threads and the work-cycle domination by single operators, with regard to a single
deployed plan. When executing multiple cost-based vectorized plans concurrently, a similar
problem arises. Here, the number of threads required by all $h$ plans grows with the
number of plans. In detail, it is upper-bounded by $\sum_{i=1}^{h} m_i$, where $m_i$ denotes the number
of operators of plan $P_i$.
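The upper bound on the number of threads can be illustrated with a minimal sketch. The function name `max_threads` and the operator counts are hypothetical and chosen only for illustration; the sketch assumes the worst case of one execution bucket (thread) per operator.

```python
def max_threads(operator_counts):
    """Upper bound on required threads: the sum of m_i over all plans,
    i.e., the worst case of one execution bucket per operator."""
    return sum(operator_counts)


# Hypothetical operator counts m_i for h = 3 plans (not taken from the source).
operator_counts = [5, 3, 4]
print(max_threads(operator_counts))  # 12
```

The bound grows with every additional deployed plan, which is why a limit that is independent of the number of plans becomes attractive.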
To overcome this problem for a high number of deployed plans, we define
an extended vectorization problem. The core idea is to restrict the maximum number
of threads to a user-defined parameter $K$. Then, we compute the fairest
distribution of all operators of the $h$ plans across the $K$ execution buckets according to the
current workload characteristics and execution statistics. First of all, we formally define
the extended cost-based vectorization problem for multiple plans.
Definition 4.4 (Cost-Based Multiple Plan Vectorization Problem (P-MPV)). Let $P$ with
$P_i \in \{P_1, \dots, P_h\}$ denote a set of $h$ plans. The P-MPV then describes the problem of finding
a restricted cost-optimal plan $P_i''$ with $k_i$ execution buckets for each $P_i \in P$ according
to the P-CPV. There, the constraint on the maximum overall number of execution buckets,
$\sum_{i=1}^{h} k_i \le K$, must hold.
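The constraint of Definition 4.4 can be checked for a candidate assignment of bucket counts with a short sketch. The function `is_feasible` is hypothetical. Beyond the constraint stated in the definition, it assumes that every plan needs at least one execution bucket and at most one bucket per operator, which the definition itself does not spell out.

```python
def is_feasible(bucket_counts, operator_counts, K):
    """Return True if the bucket counts k_i satisfy sum(k_i) <= K.

    Additional assumption (not part of Definition 4.4): 1 <= k_i <= m_i,
    i.e., each plan uses at least one bucket and at most one per operator.
    """
    if len(bucket_counts) != len(operator_counts):
        raise ValueError("need exactly one bucket count per plan")
    per_plan_ok = all(
        1 <= k <= m for k, m in zip(bucket_counts, operator_counts)
    )
    return per_plan_ok and sum(bucket_counts) <= K
```

A solver for the P-MPV would search among the feasible assignments for the one with the lowest overall cost; this sketch covers only the feasibility test, not the cost model.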
Obviously, when simply solving the standard cost-based optimization problem for each
single plan $P_i$, we might exceed the maximum number of execution buckets with
$\sum_{i=1}^{h} k_i > K$. The following example illustrates this problem.
Example 4.10 (Problem when Solving the P-MPV). Assume three plans $P_a$, $P_b$ and $P_c$
with different numbers of operators and monitored costs as shown in Figure 4.16. We
set the maximum total number of execution buckets to $K = 7$. Further, the P-CPV is
solved for each single plan using the heuristic computation approach. For this example,
we observe that we get $\sum_{i=1}^{h} k_i = 9$ execution buckets and hence, we exceed the maximum
constraint of $K = 7$.
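The arithmetic of this example can be sketched as follows. Only the total of 9 execution buckets and the limit of 7 come from the example; the per-plan split (4, 3, 2) is a hypothetical assumption, because the values of Figure 4.16 are not reproduced here. The function name `bucket_excess` is likewise illustrative.

```python
def bucket_excess(bucket_counts, K):
    """Number of execution buckets by which independently vectorized
    plans exceed the global limit K (0 if the limit is met)."""
    return max(0, sum(bucket_counts.values()) - K)


# Hypothetical per-plan results of solving the P-CPV in isolation;
# only their sum (9) is stated in the example.
bucket_counts = {"P_a": 4, "P_b": 3, "P_c": 2}
print(bucket_excess(bucket_counts, K=7))  # 2
```

Under these assumptions, two buckets would have to be saved across the three plans to satisfy the constraint.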
As a result, the P-MPV cannot be solved by applying the P-CPV to each single plan.
In contrast to restricting $k$ (see Subsection 4.3.3), the given maximum constraint $K$ is
only an upper bound, and therefore we have to consider more solution candidates. In
addition, we might not fully use the optimization potential if we simply use $K/h$ buckets