25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

5.1 Motivation and Problem Description

incoming message as shown in Figure 5.1(b). The Receive operator (o1) reads an order message from the queue and writes it to a local variable. Then, the Assign operator (o2) extracts the customer name of the received message via XPath and prepares a query with this parameter. Subsequently, the Invoke operator (o3) queries the external system s4 in order to load credit rating information for that customer. An isolated SQL query Qi is used per plan instance (that is, per message). The Join operator (o4) merges the result message with the received message, using the customer key as join predicate. Finally, the pair of Assign and Invoke operators (o5 and o6) sends the result to system s3. We see that multiple orders from one customer (CustA: m1, m3) cause the same query (Invoke operator o3) to be posed multiple times to the external system s4.
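The redundancy described above can be made concrete with a minimal Python sketch. The message list reuses the message ids and customer names that appear in the example; the exact form of the per-message query Qi is not given in this excerpt, so the equality predicate and the `?` placeholder are assumptions, as is the helper name `queries_per_instance`.

```python
from collections import Counter

# Messages from the running example as (message id, customer name) pairs.
messages = [("m1", "CustA"), ("m2", "CustB"), ("m3", "CustA"), ("m4", "CustC")]


def queries_per_instance(msgs):
    """Return the (sql, parameter) pair each plan instance would send to s4.

    Assumption: Qi is an equality lookup on the customer name; the source
    only states that one isolated query is issued per message.
    """
    sql = "SELECT * FROM s4.Credit WHERE Customer = ?"
    return [(sql, customer) for _, customer in msgs]


issued = queries_per_instance(messages)

# Queries that are sent to s4 more than once under instance-based execution.
repeats = {query: n for query, n in Counter(issued).items() if n > 1}
```

With these four messages, four queries are issued but only three are distinct; the CustA lookup is the one that is repeated.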

Due to the serialized execution of plan instances, the same work may be done multiple times, and this holds for all types of operators (interaction-oriented, data-flow-oriented, and control-flow-oriented operators). At this point, multi-flow optimization comes into play, where we consider optimizing the sequence of plan instances. Our core idea is to periodically collect incoming messages and to execute whole message batches with single plan instances. In the following, we give an overview of the naïve, time-based approach as well as the horizontal (value-based) message queue partitioning as batch creation strategies.

Naïve Time-Based Batch Creation

The underlying theme of the naïve (time-based) batching approach, as already proposed in variations for distributed queries [LZL07, LX09], scan sharing [QRR+08], operator scheduling strategies in DSMS [Sch07], and web service interactions [SMWM06, GYSD08b, GYSD08a], is to periodically collect messages (that would otherwise initiate plan instances pi) using a specific waiting time ∆tw and to merge those messages into message batches bi. We then execute a plan instance p′i of the modified (rewritten) plan P′ for the message batch bi. In the following, we revisit our example and illustrate this naïve (time-based) approach.
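As a rough illustration of time-based batch creation, the following Python sketch groups messages by fixed collection windows of length ∆tw. The arrival times, the function name `time_based_batches`, and the choice of fixed windows starting at time 0 are assumptions made for this sketch; the text only states that messages are collected periodically using a waiting time.

```python
from collections import defaultdict


def time_based_batches(arrivals, wait):
    """Group (arrival_time, message) pairs into batches, one per window.

    Window k covers the half-open interval [k * wait, (k + 1) * wait).
    Windows without any message produce no batch.
    """
    windows = defaultdict(list)
    for t, msg in sorted(arrivals, key=lambda pair: pair[0]):
        windows[t // wait].append(msg)
    return [windows[k] for k in sorted(windows)]


# Hypothetical arrival times (arbitrary time units) for messages m1..m4.
arrivals = [(2, "m1"), (7, "m2"), (11, "m3"), (18, "m4")]
batches = time_based_batches(arrivals, wait=10)
```

With a waiting time of 10 units, the sketch yields the batches [m1, m2] and [m3, m4]. Each batch would then be handed to a single plan instance of the rewritten plan instead of one instance per message.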

Example 5.2 (Batch-Orders Processing). Figure 5.2(b) shows the naïve approach, where we wait for incoming messages during a period of time ∆tw and execute the collected messages as a batch. For this purpose, P2 is rewritten to P′2 (see Figure 5.2(a)), where in this particular example only the prepared query has been modified.
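The query rewrite in this example can be sketched as follows: instead of one lookup per message, a single query with an IN list over the customers of the batch is prepared. This is only a sketch under assumptions; the helper name `batch_query`, the removal of duplicate customer names, and the `?` placeholder style are not taken from the source.

```python
def batch_query(customers):
    """Build one parameterized IN query for the distinct customers of a batch."""
    # Drop duplicate customer names while keeping first-seen order.
    distinct = list(dict.fromkeys(customers))
    if not distinct:
        raise ValueError("a batch must contain at least one message")
    placeholders = ", ".join("?" for _ in distinct)
    sql = f"SELECT * FROM s4.Credit WHERE Customer IN ({placeholders})"
    return sql, distinct


# Customer names of the two batches in the example: (m1, m2) and (m3, m4).
q1 = batch_query(["CustA", "CustB"])
q2 = batch_query(["CustA", "CustC"])
```

Each batch now causes one query to s4. Note that CustA still appears in both batch queries, because a purely time-based batch does not group messages by value.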

[Figure 5.2: Example Message Batch Plan Execution]

(a) Example Plan P′2: Receive (o1) [service: s5, out: msg1]; Assign (o2) [in: msg1, out: msg2]; Invoke (o3) [service: s4, in: msg2, out: msg3]; INNER Join (o4) [in: msg1, msg3, out: msg4]; Assign (o5) [in: msg4, out: msg5]; Invoke (o6) [service: s3, in: msg5]. Qi (rewritten): SELECT * FROM s4.Credit WHERE Customer IN()

(b) Message Batch Plan Execution of P′2: the batch message queue holds m1 ["CustA"], m2 ["CustB"], m3 ["CustA"], m4 ["CustC"], with m5 ["CustB"] and m6 ["CustC"] being enqueued. After each waiting time ∆tw, a batch is dequeued: b1 is executed by p′1 (o1 to o6) with Q′1: SELECT * FROM s4.Credit WHERE Customer IN("CustA", "CustB"); b2 is executed by p′2 (o1 to o6) with Q′2: SELECT * FROM s4.Credit WHERE Customer IN("CustA", "CustC").

In this example, the first batch contains two messages (m1 and m2) and the second also contains two messages (m3 and m4). In order to make use of batch processing, we extend
