Cost-Based Optimization of Integration Flows - Datenbanken ...
5.1 Motivation and Problem Description
original queries (Q100 + Q200) and the rewritten query with non-overlapping disjunctive
predicate (Q100+200). In addition, for both types of disjunctive queries, the integration
platform must post-process the received data to ensure correctness, assigning
only the relevant partial results to each individual message. Second, rewriting queries might not
be possible at all for certain service interfaces or custom applications, so that a separate
query must be used for each distinct message in the batch. For example, Figure 5.3(c)
shows a service interface with a single parameter, where the service implementation always
uses the same query template, so the query cannot be rewritten by an external
client. Thus, the possible throughput improvement strongly depends on the number of distinct
items in the batch. We cannot precisely estimate this influence of rewritten queries
because we lack knowledge about the data properties of the involved external systems [IHW04].
In conclusion, the naïve approach can hurt performance. Furthermore, it requires that
queries to external systems can be rewritten according to the different items in the batch,
which is not always possible when integrating arbitrary systems and applications.
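The post-processing step of the naïve approach can be sketched in Python as follows. This is only an illustration under stated assumptions: the in-memory rows stand in for the external system, the column `CreditLimit` is invented for the example, and the function names `query_disjunctive` and `assign_partial_results` are hypothetical, not part of the original work.

```python
from collections import defaultdict

# Hypothetical stand-in for the external table; the CreditLimit column
# and its values are illustrative assumptions only.
CREDIT_ROWS = [
    {"Customer": "CustA", "CreditLimit": 100},
    {"Customer": "CustB", "CreditLimit": 200},
    {"Customer": "CustC", "CreditLimit": 300},
]


def query_disjunctive(customers):
    """Emulate ONE rewritten query with a disjunctive predicate,
    i.e. WHERE Customer = c1 OR Customer = c2 OR ..."""
    wanted = set(customers)
    return [row for row in CREDIT_ROWS if row["Customer"] in wanted]


def assign_partial_results(messages, rows):
    """Post-processing: give every message only the rows that match
    its own attribute value, not the whole combined result."""
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row["Customer"]].append(row)
    return {msg_id: by_customer.get(cust, []) for msg_id, cust in messages}


# A batch of three messages with two distinct customers.
batch = [("m1", "CustA"), ("m2", "CustB"), ("m3", "CustA")]
distinct_customers = sorted({cust for _, cust in batch})
combined_rows = query_disjunctive(distinct_customers)
results = assign_partial_results(batch, combined_rows)
```

The sketch makes the two costs visible: the combined result grows with the number of distinct items in the batch, and the platform has to split it again before each message can proceed.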
Batch Creation via Horizontal Queue Partitioning<br />
Due to Problem 5.4, we propose the concept of horizontal message queue partitioning¹¹ as
a batch creation strategy. The basic idea is to horizontally partition the inbound message
queues according to specific partitioning attributes ba. With such value-based partitioning,
all messages of a batch exhibit the same value for the partitioning
attribute. Thus, certain operators of the plan only need to access this attribute once for the
whole partition rather than for each individual message. The core steps are (1) to derive
the partitioning attributes from the integration flow, (2) to periodically collect messages
during an automatically computed waiting time ∆tw, (3) to read the first partition from
the queue, and (4) to execute the messages of this partition as a batch with an instance
p′_i of a rewritten plan P′. Additionally, (5) we might need to ensure the serial order of
messages at the outbound side. To illustrate the core idea, we revisit our example.
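Steps (3) and (4) can be sketched with a small Python model of a value-partitioned queue. This is a minimal sketch, not the implementation of the original work: the class `PartitionedQueue`, the helper `execute_batch`, the message layout, and the lookup function are all hypothetical, the partitioning attribute is given rather than derived from the flow, and the waiting time ∆tw is assumed to have already elapsed.

```python
from collections import OrderedDict, deque


class PartitionedQueue:
    """Hypothetical inbound queue, horizontally partitioned by the
    value of a single partitioning attribute."""

    def __init__(self, partition_attr):
        self.partition_attr = partition_attr
        # Insertion-ordered: the partition whose first message
        # arrived earliest is the first partition of the queue.
        self.partitions = OrderedDict()

    def enqueue(self, message):
        key = message[self.partition_attr]
        self.partitions.setdefault(key, deque()).append(message)

    def dequeue_batch(self):
        """Remove and return the first partition as (value, messages),
        or None if the queue is empty."""
        if not self.partitions:
            return None
        key, msgs = self.partitions.popitem(last=False)
        return key, list(msgs)


def execute_batch(key, messages, lookup):
    """Access the partitioning attribute once for the whole batch and
    share the looked-up result among all messages of the batch."""
    shared = lookup(key)
    return [(m["id"], shared) for m in messages]


queue = PartitionedQueue("customer")
arrivals = [("m1", "CustA"), ("m2", "CustB"), ("m3", "CustA"),
            ("m4", "CustC"), ("m5", "CustB"), ("m6", "CustC")]
for msg_id, customer in arrivals:
    queue.enqueue({"id": msg_id, "customer": customer})

lookups = []  # records how often the external lookup is invoked


def lookup_credit(customer):
    lookups.append(customer)
    return "credit-record-for-" + customer


executed = []
batch = queue.dequeue_batch()
while batch is not None:
    key, msgs = batch
    executed.append((key, execute_batch(key, msgs, lookup_credit)))
    batch = queue.dequeue_batch()

outbound_order = [msg_id for _, out in executed for msg_id, _ in out]
```

In this run, six messages trigger only three lookups, one per partition. The outbound order (m1, m3, m2, m5, m4, m6) differs from the arrival order, which is why step (5) may be needed.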
Example 5.3 (Partitioned Batch-Orders Processing). Figure 5.5 reconsiders the example
for partitioned multi-flow execution. The rewritten plan P′_2 is equivalent to the instance-based
plan (Figure 5.1(a)) because (1) external queries require no rewriting at all, and (2)
plan rewriting is only required for multiple partitioning attributes.
[Figure 5.5 (diagram): Incoming messages are enqueued into a partitioned message queue with the partitions CustA (m1, m3), CustB (m2, m5), and CustC (m4, m6). After the waiting time ∆tw, batch b1 is dequeued and executed by plan instance p′_1 (operators o1 to o6) with query Q′_1: SELECT * FROM s4.Credit WHERE Customer='CustA'. Batch b2 is then dequeued and executed by plan instance p′_2 (operators o1 to o6) with query Q′_2: SELECT * FROM s4.Credit WHERE Customer='CustB'.]
Figure 5.5: Partitioned Message Batch Execution P′_2
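The point that external queries require no rewriting can be illustrated with Python's built-in `sqlite3` module. The sketch below makes several assumptions: an in-memory SQLite table named `Credit` stands in for the external table `s4.Credit`, the column `CreditLimit` and its values are invented, and the functions `run_instance_based` and `run_partitioned` are hypothetical names. Both variants issue exactly the same parameterized query template; only the number of executions differs.

```python
import sqlite3

# The same template serves the instance-based and the partitioned plan.
QUERY_TEMPLATE = "SELECT * FROM Credit WHERE Customer = ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Credit (Customer TEXT, CreditLimit INTEGER)")
conn.executemany(
    "INSERT INTO Credit VALUES (?, ?)",
    [("CustA", 100), ("CustB", 200), ("CustC", 300)],  # illustrative data
)


def run_instance_based(messages):
    """Instance-based execution: one query per message."""
    results, query_count = {}, 0
    for msg_id, customer in messages:
        results[msg_id] = conn.execute(QUERY_TEMPLATE, (customer,)).fetchall()
        query_count += 1
    return results, query_count


def run_partitioned(batches):
    """Partitioned execution: one query per batch, with the result
    shared by all messages of the batch."""
    results, query_count = {}, 0
    for customer, msg_ids in batches:
        rows = conn.execute(QUERY_TEMPLATE, (customer,)).fetchall()
        query_count += 1
        for msg_id in msg_ids:
            results[msg_id] = rows
    return results, query_count


messages = [("m1", "CustA"), ("m3", "CustA"), ("m2", "CustB"), ("m5", "CustB")]
batches = [("CustA", ["m1", "m3"]), ("CustB", ["m2", "m5"])]

instance_results, instance_queries = run_instance_based(messages)
batch_results, batch_queries = run_partitioned(batches)
```

For the four messages of batches b1 and b2, both variants return identical per-message results, while the partitioned variant executes two queries instead of four.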
¹¹ Horizontal data partitioning [CNP82] is widely applied in DBMS and distributed systems. Typically,
this is an issue of physical design [ANY04], where a table is partitioned by selection predicates (value).