25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


5 Multi-Flow Optimization

cannot be processed as a stream such that these influences are additive components rather than being subsumed by the most time-consuming influence.

Problem 5.2 (Cache Coherency Problem). One solution to Problem 5.1 might be the caching of results of external queries. However, this fails due to the integration of heterogeneous and highly distributed systems and applications (loosely coupled without any notification mechanisms). In such distributed environments, caching is not applicable because the central integration platform cannot ensure that the cached data is consistent with the data in the source systems [LZL07]. A similar problem is also known for caching proxy servers, which might break client cache directives (RFC 3143) [CD01].

Due to this problem, other projects, such as MT-Cache, use currency bounds (the maximum time for which certain data objects may be cached) [GLRG04]. However, they can only ensure weak consistency, while for integration flows eventual consistency is needed, as described in Subsection 2.3.2. Caching (without semantic cache invalidation) cannot ensure two properties of eventual consistency [Vog08]: (1) read-your-writes (consistency between a writing Invoke and a subsequent reading Invoke of the same data object) and (2) session consistency (consistency between multiple reading Invoke operators of the same data object).
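The read-your-writes violation can be made concrete with a minimal sketch. It assumes a hypothetical `ExternalSystem` that the platform cannot observe and a hypothetical `CurrencyBoundCache` that serves entries until a maximum age expires; neither class is taken from MT-Cache or from the integration platform discussed here, and time is passed in explicitly to keep the example deterministic.

```python
class ExternalSystem:
    """Hypothetical source system; it sends no notifications on change."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class CurrencyBoundCache:
    """Hypothetical cache that serves an entry until its currency bound expires."""

    def __init__(self, source, bound):
        self.source = source
        self.bound = bound      # maximum age of a cached entry
        self.entries = {}       # key -> (value, time at which it was cached)

    def read(self, key, now):
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] < self.bound:
            return hit[0]       # still within the bound: serve the cached value
        value = self.source.read(key)
        self.entries[key] = (value, now)
        return value


source = ExternalSystem()
source.write("CustA", 100)
cache = CurrencyBoundCache(source, bound=10)

first_read = cache.read("CustA", now=0)    # cache miss: fetched from the source
source.write("CustA", 250)                 # writing Invoke goes directly to the source
stale_read = cache.read("CustA", now=5)    # within the bound: old value is served
fresh_read = cache.read("CustA", now=12)   # bound expired: value is refetched

print(first_read, stale_read, fresh_read)
```

The read at time 5 returns the value from before the write, so the subsequent reading access does not see the preceding write until the bound expires. This is the weak consistency that currency bounds provide.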

Problem 5.3 (Serialized External Behavior). Depending on the involved external systems, we need to ensure the serial order of messages (see Problem 2.2 Message Outrun). For example, this can be caused by referential integrity constraints within the target systems. Thus, we need to guarantee monotonic reads and writes for individual data objects.
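As an illustration of why serial order matters, the following sketch assumes a hypothetical `TargetSystem` that enforces one referential integrity constraint: an order may only reference a customer that already exists. The message format and the `deliver` helper are assumptions made for this example only.

```python
class TargetSystem:
    """Hypothetical target system with a referential integrity constraint."""

    def __init__(self):
        self.customers = set()
        self.orders = []

    def apply(self, message):
        kind, customer = message
        if kind == "customer":
            self.customers.add(customer)
        elif kind == "order":
            if customer not in self.customers:
                # The referenced customer has not arrived yet.
                raise ValueError(f"unknown customer {customer}")
            self.orders.append(customer)


def deliver(messages):
    """Apply messages in the given order and collect the rejected ones."""
    target = TargetSystem()
    rejected = []
    for message in messages:
        try:
            target.apply(message)
        except ValueError:
            rejected.append(message)
    return target, rejected


in_order = [("customer", "CustA"), ("order", "CustA")]
outrun = [("order", "CustA"), ("customer", "CustA")]  # second message overtook the first

_, rejected_in_order = deliver(in_order)
_, rejected_outrun = deliver(outrun)

print(rejected_in_order)
print(rejected_outrun)
```

When the order message outruns the customer message, the target system rejects it, although the same two messages are accepted when delivered in their original sequence.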

Given these problems, the optimization objective of throughput maximization has so far only been addressed by leveraging a higher degree of parallelism, such as (1) intra-operator, horizontal parallelism (data partitioning, see [BABO+09]), (2) inter-operator, horizontal parallelism (explicit parallel subflows, see Chapter 3, [LZ05, SMWM06]), and (3) inter-operator, vertical parallelism (pipelining of messages and message parts, see Chapter 4, [PVHL09a, PVHL09b]). Although these techniques can significantly increase the resource utilization and thus increase the throughput, they do not reduce the executed work. We use an example to illustrate the problem of expensive external system access and how multi-flow optimization addresses this problem.

Example 5.1 (Instance-Based Orders Processing). Assume our example plan P_2 (Figure 5.1(a)). The instance-based execution model initiates a new plan instance p_i for each

[Figure 5.1, panel contents. (a) Example Plan P_2: Receive (o1) [service: s5, out: msg1]; Assign (o2) [in: msg1, out: msg2]; Invoke (o3) [service: s4, in: msg2, out: msg3]; INNER Join (o4) [in: msg1, msg3, out: msg4]; Assign (o5) [in: msg4, out: msg5]; Invoke (o6) [service: s3, in: msg5]. (b) Instance-Based Plan Execution of P_2: messages m1 ["CustA"], m2 ["CustB"], m3 ["CustA"], m4 ["CustC"], m5 ["CustB"], m6 ["CustC"] are enqueued into a standard message queue; each dequeued message m_i is processed by a plan instance p_i (o1 o2 o3 o4 o5 o6) that issues the query Q_i: SELECT * FROM s4.Credit WHERE Customer=<value of m_i/Customer/Cname>. Shown are Q1 with Customer="CustA", Q2 with Customer="CustB", and Q3 with Customer="CustA".]

Figure 5.1: Example Instance-Based Plan Execution
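The redundancy visible in Figure 5.1(b) can be quantified with a small sketch. It replays the six messages of the figure and builds one query string per message, as an instance-based execution would. The helper `build_query` and the counting are illustrative assumptions; they only measure repeated external queries and are not the multi-flow optimization technique itself.

```python
from collections import Counter

# Customer names carried by the messages m1..m6 in Figure 5.1(b).
messages = ["CustA", "CustB", "CustA", "CustC", "CustB", "CustC"]


def build_query(customer):
    """Hypothetical helper: the query Q_i issued by Invoke (o3) for one message."""
    return f"SELECT * FROM s4.Credit WHERE Customer='{customer}'"


# Instance-based execution: one plan instance, and thus one query, per message.
issued = [build_query(customer) for customer in messages]

per_query = Counter(issued)
total_queries = len(issued)
distinct_queries = len(per_query)

print(total_queries, distinct_queries)
for query, count in per_query.items():
    print(count, query)
```

Under these assumptions, six external queries are issued although only three distinct queries exist, which matches the observation that Q1 and Q3 in the figure are identical.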
