25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4.6 Experimental Evaluation

allowed). For the cost-based vectorization, we used the introduced heuristic computation approach with λ = 0, which changed the number of buckets (including additional VMTM operators) as follows: P_1 from 10 to 8 buckets, P_2 from 6 to 3 buckets, P_5 from 9 to 6 buckets, and P_7 from 18 to 3 buckets. Because the workload characteristics do not change, only the first invocation of the optimizer requires notable optimization time for merging buckets and flushing pipelines, while all subsequent re-optimization steps take much less time. Figure 4.19(b) shows this optimization time. Plan P_7 required the highest optimization time but still took less than a second, which is negligible compared to the achieved total execution time reduction of over 40 min.
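To make the reported bucket changes easier to compare across plans, the following minimal sketch computes the relative reduction per plan. Only the before/after bucket counts are taken from the text; the dictionary layout and the helper name `relative_reduction` are illustrative assumptions, not part of the evaluated system.

```python
# Bucket counts before and after cost-based vectorization (lambda = 0),
# copied from the text. The data layout is an illustrative assumption.
BUCKETS = {
    "P_1": (10, 8),
    "P_2": (6, 3),
    "P_5": (9, 6),
    "P_7": (18, 3),
}


def relative_reduction(before: int, after: int) -> float:
    """Return the fraction of buckets removed by merging."""
    return (before - after) / before


if __name__ == "__main__":
    for plan, (before, after) in BUCKETS.items():
        share = relative_reduction(before, after)
        print(f"{plan}: {before} -> {after} buckets ({share:.1%} fewer)")
```

Under these numbers, P_7 sees by far the largest relative reduction in buckets, which fits the observation that it also required the highest optimization time.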

Second, scalability experiments have shown that the absolute improvement increases with increasing data size, whereas the relative improvement depends on the plan used. There are plans with constant relative improvement (e.g., P_1) and plans where the relative improvement decreases with increasing data size (e.g., P_7). Further, the relative improvement of both vectorization approaches increases with the number of executed plan instances until the point of full pipeline utilization is reached. From there on, the relative improvement stays constant.

In conclusion, we achieve performance improvements in the form of increased message throughput. In addition, we observe that the absolute benefit increases with both the number of plan instances and the data size.

Performance and Throughput

In order to evaluate both vectorization approaches in more detail, we use a template plan that can be extended to an arbitrary number of operators. Essentially, we modeled a simple sequence of six operators as shown in Figure 4.20.

[Figure 4.20: Evaluated Example Plan P_m.
Operator sequence: Receive (o1) [service: s1, out: msg1], Assign (o2) [in: msg1, out: msg2], Invoke (o3) [service: s2, in: msg2, out: msg3], Translation (o4) [in: msg3, out: msg4], Assign (o5) [in: msg4, out: msg5], Invoke (o6) [service: s3, in: msg5].
External systems: s1 (inbound adapter), s2 and s3 (file outbound adapters).
Annotations: filename='store1/Test1.xml', mapping='maporders.xsl', filename='store2/Test2.xml'.
The pattern is repeated when varying the number of operators m.]

A message is received (Receive) and prepared for a writing interaction (Assign), which is then executed with the file outbound adapter (Invoke). Subsequently, the resulting message (containing Orders and Orderlines) is modified by a Translation operator, and finally the message is written to a specific directory (Assign, Invoke). We refer to this as m = 5 because the Receive operator is removed during vectorization. When scaling m up to m = 35, we simply copy the last five operators and reconfigure them as a chain of m operators with direct data dependencies. All of the resulting Invoke operators refer to different directories. We ran a series of five experiments (each repeated 20 times) according to the already introduced scale factors. The results of these experiments are shown in Figure 4.21.
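The construction of the template plan for a given m can be sketched as follows. This is a minimal illustration, not the actual plan generator used in the experiments: the `Operator` class, the `build_plan` function, and the directory naming scheme `store1`, `store2`, ... (extrapolated from the two file names in Figure 4.20) are assumptions. It also assumes that every operator except the last one emits a message, so that the copied operators form a chain with direct data dependencies.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The five operators that follow Receive in Figure 4.20.
PATTERN = ("Assign", "Invoke", "Translation", "Assign", "Invoke")


@dataclass(frozen=True)
class Operator:
    """Hypothetical plan operator; not the thesis' internal representation."""
    name: str
    kind: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    directory: Optional[str] = None  # only set for Invoke operators


def build_plan(m: int) -> List[Operator]:
    """Build Receive followed by a chain of m operators (m a multiple of 5)."""
    if m < 5 or m % 5 != 0:
        raise ValueError("m must be a positive multiple of 5")
    plan = [Operator("o1", "Receive", (), ("msg1",))]
    invoke_count = 0
    for i in range(m):
        idx = i + 2                      # o1 is the Receive operator
        kind = PATTERN[i % 5]
        # Assumption: only the final operator of the chain has no output.
        outputs = () if i == m - 1 else (f"msg{idx}",)
        directory = None
        if kind == "Invoke":
            invoke_count += 1
            # Assumed naming scheme so that each Invoke uses its own directory.
            directory = f"store{invoke_count}"
        plan.append(
            Operator(f"o{idx}", kind, (f"msg{idx - 1}",), outputs, directory)
        )
    return plan


if __name__ == "__main__":
    for op in build_plan(5):
        print(op)
```

For m = 5 this reproduces the operator sequence o1 to o6 of Figure 4.20; for larger m each operator consumes the message produced by its predecessor, which is the direct data dependency described in the text.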
