25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


5.1 Motivation and Problem Description

original queries (Q100 + Q200) and the rewritten query with non-overlapping disjunctive predicate (Q100+200). In addition, for both types of disjunctive queries the integration platform must post-process the received data in order to ensure correctness by assigning only partial results to the individual messages. Second, the rewriting of queries might not be possible at all for certain service interfaces or custom applications such that a single query for each distinct message in the batch must be used. For example, Figure 5.3(c) shows a service interface with a single parameter, where the service implementation always uses the same query template such that the query cannot be rewritten by an external client. Thus, the possible throughput improvement strongly depends on the number of distinct items in the batch. We cannot precisely estimate this influence of rewritten queries due to missing knowledge about data properties of involved external systems [IHW04].

In conclusion, the naïve approach can hurt performance. Furthermore, it requires that queries to external systems can be rewritten according to the different items in the batch. This is not always possible when integrating arbitrary systems and applications.
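The rewriting and post-processing that the naïve approach requires can be illustrated with a minimal sketch. It assumes an in-memory SQLite table named `Credit` standing in for the external system, a hypothetical `Rating` column, and a hypothetical helper `naive_batch_lookup`; none of these names come from the original text. The `IN` list is used here as an equivalent form of a disjunction of equality predicates.

```python
import sqlite3
from collections import defaultdict


def naive_batch_lookup(conn, batch):
    """Answer a batch of (message_id, customer) pairs with one rewritten query.

    Hypothetical helper: table and column names are assumptions for illustration.
    """
    # One disjunct per DISTINCT customer; duplicates in the batch add no predicate.
    customers = sorted({customer for _, customer in batch})
    placeholders = ", ".join("?" for _ in customers)
    sql = f"SELECT Customer, Rating FROM Credit WHERE Customer IN ({placeholders})"

    # Post-processing, part 1: split the combined result by customer.
    rows_by_customer = defaultdict(list)
    for customer, rating in conn.execute(sql, customers):
        rows_by_customer[customer].append(rating)

    # Post-processing, part 2: hand each message only its partial result.
    return {
        msg_id: list(rows_by_customer.get(customer, []))
        for msg_id, customer in batch
    }


# Stand-in for the external system (assumed schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Credit (Customer TEXT, Rating INTEGER)")
conn.executemany(
    "INSERT INTO Credit VALUES (?, ?)",
    [("CustA", 700), ("CustB", 650), ("CustC", 590)],
)

batch = [("m1", "CustA"), ("m2", "CustB"), ("m3", "CustA")]
result = naive_batch_lookup(conn, batch)
print(result)
```

The sketch only works because the client is free to rewrite the query text. With a fixed query template behind a service interface, as in Figure 5.3(c), this rewriting step is not available.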

Batch Creation via Horizontal Queue Partitioning<br />

Due to Problem 5.4, we propose the concept of horizontal message queue partitioning¹¹ as a batch creation strategy. The basic idea is to horizontally partition the inbound message queues according to specific partitioning attributes ba. With such value-based partitioning, all messages of a batch exhibit the same attribute value according to the partitioning attribute. Thus, certain operators of the plan only need to access this attribute once for the whole partition rather than for each individual message. The core steps are (1) to derive the partitioning attributes from the integration flow, (2) to periodically collect messages during an automatically computed waiting time ∆tw, (3) to read the first partition from the queue and (4) to execute the messages of this partition as a batch with an instance p′ᵢ of a rewritten plan P′. Additionally, (5) we might need to ensure the serial order of messages at the outbound side. In order to illustrate the core idea, we revisit our example.
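Steps (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from this work: the class `PartitionedQueue`, the functions `execute_batch` and `credit_lookup`, and the choice to order partitions by the arrival of their first message are all hypothetical. The computation of the waiting time ∆tw and the outbound serialization of step (5) are left out.

```python
from collections import OrderedDict


class PartitionedQueue:
    """Inbound queue partitioned by the value of one partitioning attribute."""

    def __init__(self, partition_attr):
        self.partition_attr = partition_attr
        # Assumption: partitions are kept in order of their first message.
        self._partitions = OrderedDict()

    def enqueue(self, message):
        key = message[self.partition_attr]
        self._partitions.setdefault(key, []).append(message)

    def dequeue_batch(self):
        """Remove and return (attribute value, messages) of the first partition."""
        if not self._partitions:
            return None, []
        return self._partitions.popitem(last=False)


def execute_batch(key, batch, lookup):
    """Run one plan instance for the batch; the external lookup happens once."""
    shared = lookup(key)  # accessed once for the whole partition
    return [(msg["id"], shared) for msg in batch]


calls = []  # records how often the (stand-in) external system is queried


def credit_lookup(customer):
    # Stand-in for the external query; the values are made up.
    calls.append(customer)
    return {"CustA": 700, "CustB": 650, "CustC": 590}[customer]


queue = PartitionedQueue("customer")
arrivals = [("m1", "CustA"), ("m2", "CustB"), ("m3", "CustA"),
            ("m4", "CustC"), ("m5", "CustB"), ("m6", "CustC")]
for msg_id, customer in arrivals:
    queue.enqueue({"id": msg_id, "customer": customer})

key, batch = queue.dequeue_batch()
results = execute_batch(key, batch, credit_lookup)
print(key, results, calls)
```

Because all messages in a partition share the attribute value, the query text stays identical to the instance-based case; only the number of executions drops from one per message to one per partition.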

Example 5.3 (Partitioned Batch-Orders Processing). Figure 5.5 reconsiders the example for partitioned multi-flow execution. The rewritten plan P′₂ is equivalent to the instance-based plan (Figure 5.1(a)) because (1) external queries require no rewriting at all, and (2) plan rewriting is only required for multiple partitioning attributes.

[Figure 5.5: Partitioned Message Batch Execution P′₂. The figure shows a partitioned message queue with the partitions CustA (m1, m3), CustB (m2, m5) and CustC (m4, m6). After the waiting time ∆tw, batch b1 is dequeued and executed by plan instance p′1 (operators o1 to o6) with query Q′1: SELECT * FROM s4.Credit WHERE Customer="CustA"; batch b2 is dequeued and executed by plan instance p′2 with query Q′2: SELECT * FROM s4.Credit WHERE Customer="CustB".]

¹¹ Horizontal data partitioning [CNP82] is widely applied in DBMS and distributed systems. Typically, this is an issue of physical design [ANY04], where a table is partitioned by selection predicates (value).

