Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows
[Figure 4.16: Problem of Solving P-CPV for all Plans. Three example plans with operator costs W and bucket counts k: P’a with W(o1)=4, W(o2)=3, W(o3)=1 and k=2; P’b with W(o1)=3, W(o2)=2, W(o3)=5, W(o4)=4, W(o5)=3 and k=4; P’c with W(o1)=1, W(o2)=9, W(o3)=4, W(o4)=2 and k=3.]
for each plan and compute the restricted vectorized plan accordingly. The major challenge
of solving the P-MPV is to determine the best distribution of all operators of the h different
plans across the maximum number of K execution buckets such that we get the highest
overall performance and do not exceed the maximum constraint. In the next subsection,
we present a computation approach that addresses this challenge.
4.4.2 Computation Approach<br />
The core idea of computing the optimal operator distribution across buckets for multiple
plans is to use the costs of each plan in order to assign more execution buckets to more
cost-intensive plans. In detail, we use those costs to weight the maximum constraint K
of execution buckets and determine a local maximum constraint $K_i$ for each plan $P_i$. We
use the plan execution time $W(P_i)$ as well as the message arrival rate $R_i$ (which can be
monitored as the number of plan instances per time period).
In a first step, we determine the local maximum constraint $K_i$ for each plan. If all h plans
exhibited the same message arrival rate and execution time, we could compute it by
$K_i = \lfloor K/h \rfloor$. However, based on the monitored statistics, we determine the constraint by
$$K_i = 1 + \left\lfloor \frac{R_i \cdot W(P_i)}{\sum_{j=1}^{h} R_j \cdot W(P_j)} \cdot (K - h) \right\rfloor, \tag{4.19}$$
where $K_i$ is at least one execution bucket and at most K execution buckets. Due to the
lower bound of one bucket per plan, a constraint of K < h (fewer buckets than plans) is invalid.
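The weighting in Equation 4.19 can be illustrated with a minimal Python sketch. The function name `local_constraints`, its parameter names, and the example statistics are assumptions made for illustration and do not appear in the source; exact rational arithmetic is used only so that the floor is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def local_constraints(rates, costs, K):
    """Compute the local maximum constraint K_i of each plan per Eq. (4.19).

    rates[i] is the monitored message arrival rate R_i, costs[i] the plan
    execution time W(P_i), and K the global maximum number of buckets.
    """
    h = len(rates)
    if h == 0 or len(costs) != h:
        raise ValueError("rates and costs must be non-empty and equally long")
    if K < h:
        raise ValueError("K < h is invalid: each plan needs at least one bucket")
    # Exact rational arithmetic avoids floor() errors from float rounding.
    weights = [Fraction(r) * Fraction(w) for r, w in zip(rates, costs)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("the summed weights R_j * W(P_j) must be positive")
    return [1 + floor(wt / total * (K - h)) for wt in weights]


# Hypothetical statistics for three plans, not taken from the source.
example = local_constraints(rates=[1, 1, 1], costs=[8, 17, 16], K=9)
```

With these hypothetical inputs the call yields the local constraints [2, 3, 3]. Because of the floor, only eight of the nine buckets are assigned, so the local constraints never sum to more than K.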
In a second step, after we have determined the local maximum constraints, we use a
heuristic computation approach to determine the cost-based operator distribution for each
plan. If the local maximum constraint is not exceeded, we use the computed scheme. If the
number of used execution buckets $k_i$ is smaller than the local constraint, then in a third step
we redistribute the $K_i - k_i$ open execution buckets across the remaining plans
whose local constraint is exceeded.
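The second and third steps can be sketched as follows. This is not the algorithm from the source: the function `extend_constraints` and its parameters are hypothetical, the bucket counts `used` stand in for the result of an unrestricted vectorization that is not implemented here, and both the treatment of buckets left over by the floor and the proportional, capped redistribution rule are assumptions about how the open buckets might be shared.

```python
from fractions import Fraction
from math import floor


def extend_constraints(weights, local_max, used, K):
    """Return per-plan bucket limits after redistributing free buckets.

    weights[i]   : R_i * W(P_i), the monitored statistics of plan i
    local_max[i] : local maximum constraint K_i
    used[i]      : buckets k_i chosen by the unrestricted vectorization
    K            : global maximum number of execution buckets
    """
    limits = list(local_max)
    # Assumption: buckets left over by the floor in K_i start out as free.
    free = K - sum(local_max)
    exceeded = []
    for i, (k_i, K_i) in enumerate(zip(used, local_max)):
        if k_i <= K_i:
            limits[i] = k_i          # accept the computed scheme
            free += K_i - k_i        # release the open buckets
        else:
            exceeded.append(i)       # needs a restricted vectorization
    total = sum(Fraction(weights[i]) for i in exceeded)
    if total > 0:
        for i in exceeded:
            # Assumption: proportional share, rounded down and capped at k_i.
            share = floor(Fraction(weights[i]) / total * free)
            limits[i] = min(used[i], local_max[i] + share)
    return limits


# Hypothetical inputs, not taken from the source.
limits = extend_constraints(weights=[8, 17, 16], local_max=[2, 3, 3],
                            used=[2, 4, 3], K=9)
```

In this hypothetical run, the first and third plans stay within their local constraints, so the single free bucket goes to the second plan and the limits become [2, 4, 3]. Rounding each share down keeps the sum of all limits at or below K.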
In detail, Algorithm 4.4 illustrates our heuristic approach. First, we determine the local
maximum constraints and number of free buckets according to the statistics (lines 2-4).
Second, for each plan, we execute the heuristic cost-based plan vectorization (lines 5-10)
using the A-CPV. If the determined number of buckets is smaller than or equal to the
local maximum constraint, we accept this plan and add the remaining buckets to the free
buckets. Third, after all plans have been processed, we redistribute the remaining free
buckets according to the monitored statistics of the individual plans (lines 11-12) in an
upper-bounded fashion. Fourth, we use the heuristic cost-based plan vectorization but
restrict it to the extended maximum constraints (lines 13-19) using the A-RCPV. Here, we