Cost-Based Optimization of Integration Flows - Datenbanken ...
4.6 Experimental Evaluation
allowed). For the cost-based vectorization, we used the introduced heuristic computation approach with λ = 0, which changed the number of buckets (including additional VMTM operators) as follows: P1: from 10 to 8 buckets, P2: from 6 to 3 buckets, P5: from 9 to 6 buckets, and P7: from 18 to 3 buckets. Due to the absence of changing workload characteristics, only the first invocation of the optimizer requires noteworthy optimization time for merging buckets and flushing pipelines, while all subsequent re-optimization steps take much less time. Figure 4.19(b) shows this optimization time. Clearly, plan P7 required the highest optimization time but still took less than a second, which is negligible compared to the achieved total execution time reduction of over 40 min.
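The bucket-merging step can be illustrated with a minimal sketch. This is a hypothetical greedy formulation, not the thesis's actual algorithm: with λ = 0, only throughput matters, and the throughput of an execution pipeline is bounded by its most expensive bucket, so adjacent buckets may be merged as long as their combined cost does not exceed the current bottleneck cost.

```python
def merge_buckets(costs):
    """Greedily merge adjacent pipeline buckets (illustrative sketch).

    costs: per-bucket execution costs of the initial (fully vectorized) plan.
    Pipeline throughput is limited by the most expensive bucket, so merging
    cheap neighbors up to that bottleneck reduces the bucket count (and the
    number of queues/threads) without reducing throughput.
    """
    bottleneck = max(costs)
    merged = [costs[0]]
    for c in costs[1:]:
        if merged[-1] + c <= bottleneck:
            merged[-1] += c          # merge into the previous bucket
        else:
            merged.append(c)         # start a new bucket
    return merged

# Example: six buckets collapse to three, bottleneck cost unchanged.
print(merge_buckets([1, 1, 1, 5, 1, 1]))
```

Under this model, the reduction from, e.g., 6 to 3 buckets leaves the throughput-determining bottleneck untouched while freeing the resources of the merged buckets.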
Second, scalability experiments have shown that the absolute improvement increases with increasing data size, while the relative improvement depends on the plan: there are plans with constant relative improvement (e.g., P1) and plans where the relative improvement decreases with increasing data size (e.g., P7). Further, the relative improvement of both vectorization approaches increases with an increasing number of executed plan instances until the point of full pipeline utilization is reached; from there on, the relative improvement stays constant.
In conclusion, we achieve performance improvements in the form of increased message throughput. In addition, we observe that the absolute benefit increases with both the number of plan instances and the data size.
Performance and Throughput
In order to evaluate both vectorization approaches in more detail, we use a template plan that can be extended to an arbitrary number of operators. Essentially, we modeled a simple sequence of six operators, as shown in Figure 4.20.
[Figure: a sequence of six operators — Receive (o1) [service: s1, out: msg1], Assign (o2) [in: msg1, out: msg2], Invoke (o3) [service: s2, in: msg2, out: msg3], Translation (o4) [in: msg3, out: msg4], Assign (o5) [in: msg4, out: msg5], Invoke (o6) [service: s3, in: msg5] — connected to external system s1 via an inbound adapter and to external systems s2 and s3 via file outbound adapters (filename='store1/Test1.xml' and filename='store2/Test2.xml', respectively); the Translation operator uses mapping='maporders.xsl'; operators o2–o6 form the repeated pattern when varying the number of operators m.]
Figure 4.20: Evaluated Example Plan Pm
A message is received (Receive) and prepared for a writing interaction (Assign), which is then executed with the file outbound adapter (Invoke). Subsequently, the resulting message (containing Orders and Orderlines) is modified by a Translation operator, and finally the message is written to a specific directory (Assign, Invoke). We refer to this as m = 5 because the Receive operator is removed during vectorization. When scaling up to m = 35, we simply copy the last five operators and reconfigure them as a chain of m operators with direct data dependencies. All resulting Invoke operators refer to different directories. We ran a series of five experiments (each repeated 20 times) according to the already introduced scale factors. The results of these experiments are shown in Figure 4.21.
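The scaling scheme described above can be sketched as follows (the function name and the plain-string operator representation are illustrative assumptions, not part of the evaluation framework):

```python
# Five-operator pattern that remains after vectorization removes Receive.
BASE_PATTERN = ["Assign", "Invoke", "Translation", "Assign", "Invoke"]

def build_plan(m):
    """Build a chain of m operators by replicating the base pattern.

    Copies of the pattern are appended until the chain has m operators;
    consecutive operators carry direct data dependencies (each output
    message feeds the next operator's input).
    """
    ops = []
    while len(ops) < m:
        ops.extend(BASE_PATTERN)
    return ops[:m]

print(len(build_plan(35)))  # chain length for the largest configuration
```

In the actual experiments each copied Invoke is additionally reconfigured to write to a distinct directory, which this sketch omits.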