Cost-Based Optimization of Integration Flows - Datenbanken ...
5 Multi-Flow Optimization
cannot be processed as a stream such that these influences are additive components rather
than being subsumed by the most time-consuming influence.
Problem 5.2 (Cache Coherency Problem). One solution to Problem 5.1 might be to cache
the results of external queries. However, this fails because integration flows connect
heterogeneous and highly distributed systems and applications that are loosely coupled
and offer no notification mechanisms. In such distributed environments, caching is not
applicable because the central integration platform cannot ensure that the cached data is
consistent with the data in the source systems [LZL07]. A similar problem is known for
caching proxy servers, which might break client cache directives (RFC 3143) [CD01].
Due to this problem, other projects, such as MT-Cache, use currency bounds (the maximum
time for which certain data objects may be cached) [GLRG04]. However, they can only ensure
weak consistency, while integration flows require eventual consistency, as described in
Subsection 2.3.2. Caching (without semantic cache invalidation) cannot ensure two properties
of eventual consistency [Vog08]: (1) read-your-writes (consistency between a writing Invoke
and a subsequent reading Invoke of the same data object) and (2) session consistency
(consistency between multiple reading Invoke operators of the same data object).
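The read-your-writes violation can be made concrete with a minimal sketch. It assumes a hypothetical `ExternalSystem` without change notifications and a hypothetical `CurrencyBoundCache` that serves entries for at most `bound` logical time units; neither class is taken from MT-Cache or from the integration platform discussed here, and time is passed in explicitly to keep the example deterministic.

```python
class ExternalSystem:
    """Hypothetical source system that offers no change notifications."""

    def __init__(self):
        self.data = {}
        self.reads = 0  # number of reads that actually reached the source

    def read(self, key):
        self.reads += 1
        return self.data.get(key)

    def write(self, key, value):
        self.data[key] = value


class CurrencyBoundCache:
    """Hypothetical cache that serves an entry for at most `bound` time units."""

    def __init__(self, source, bound):
        self.source = source
        self.bound = bound
        self.entries = {}  # key -> (value, time at which it was cached)

    def read(self, key, now):
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] < self.bound:
            return hit[0]  # served from the cache, possibly stale
        value = self.source.read(key)
        self.entries[key] = (value, now)
        return value


source = ExternalSystem()
source.write("CustA", 100)
cache = CurrencyBoundCache(source, bound=10)

first = cache.read("CustA", now=0)   # cache miss: fetched from the source
source.write("CustA", 250)           # a writing Invoke that bypasses the cache
stale = cache.read("CustA", now=5)   # within the bound: the old value is returned
fresh = cache.read("CustA", now=12)  # bound expired: the value is fetched again
```

In this sketch the read at time 5 returns the value from before the write, which is exactly the read-your-writes violation described above. The currency bound only limits how long the stale value can be observed.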
Problem 5.3 (Serialized External Behavior). Depending on the involved external systems,
we need to ensure the serial order of messages (see Problem 2.2 Message Outrun). For
example, this requirement can arise from referential integrity constraints within the
target systems. Thus, we need to guarantee monotonic reads and writes for individual
data objects.
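One way to read this requirement is as an ordering constraint per data object. The following sketch checks it under that assumption. The function `preserves_per_object_order` and the message tuples are hypothetical and only illustrate the property, not how the platform enforces it.

```python
from collections import defaultdict


def preserves_per_object_order(sent, delivered):
    """Return True if, for every data object, the messages are delivered
    in the same relative order in which they were sent."""

    def by_object(messages):
        groups = defaultdict(list)
        for msg_id, obj in messages:
            groups[obj].append(msg_id)
        return groups

    return by_object(sent) == by_object(delivered)


sent = [("m1", "CustA"), ("m2", "CustB"), ("m3", "CustA")]

# m2 overtakes m1, but they concern different data objects.
reordered_across_objects = [("m2", "CustB"), ("m1", "CustA"), ("m3", "CustA")]

# m3 overtakes m1 although both concern CustA (a message outrun).
outrun = [("m3", "CustA"), ("m2", "CustB"), ("m1", "CustA")]

ok = preserves_per_object_order(sent, reordered_across_objects)
violated = not preserves_per_object_order(sent, outrun)
```

Under this reading, reordering across different data objects is harmless, while reordering two messages for the same data object breaks monotonic writes.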
Given these problems, the optimization objective of throughput maximization has so far
only been addressed by leveraging a higher degree of parallelism, such as (1) intra-operator,
horizontal parallelism (data partitioning, see [BABO+09]), (2) inter-operator, horizontal
parallelism (explicit parallel subflows, see Chapter 3, [LZ05, SMWM06]), and (3) inter-operator,
vertical parallelism (pipelining of messages and message parts, see Chapter 4,
[PVHL09a, PVHL09b]). Although these techniques can significantly increase resource
utilization and thus throughput, they do not reduce the executed work. We
use an example to illustrate the problem of expensive external system access and how
multi-flow optimization addresses this problem.
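The observation that parallelism does not reduce the executed work can be sketched by counting external queries for the message sequence shown in Figure 5.1(b). The helper `instance_based_queries` is hypothetical; it only assumes that each plan instance issues one query against s4.Credit for the customer named in its message.

```python
# Message sequence from Figure 5.1(b): (message id, customer name).
messages = [
    ("m1", "CustA"),
    ("m2", "CustB"),
    ("m3", "CustA"),
    ("m4", "CustC"),
    ("m5", "CustB"),
    ("m6", "CustC"),
]


def instance_based_queries(msgs):
    """One external query per message, as in instance-based execution."""
    return [
        f"SELECT * FROM s4.Credit WHERE Customer='{customer}'"
        for _, customer in msgs
    ]


queries = instance_based_queries(messages)
total = len(queries)          # queries actually sent to the external system
distinct = len(set(queries))  # number of different query texts among them
redundant = total - distinct  # repeated queries for an already seen customer
```

Running the plan instances in parallel would shorten the elapsed time, but `total` stays the same. Half of the queries in this sequence repeat an earlier query text.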
Example 5.1 (Instance-Based Orders Processing). Assume our example plan P_2 (Figure
5.1(a)). The instance-based execution model initiates a new plan instance p_i for each
(a) Example Plan P_2
    Receive (o1) [service: s5, out: msg1]
    Assign (o2) [in: msg1, out: msg2]
    Invoke (o3) [service: s4, in: msg2, out: msg3]
    Join (o4), INNER [in: msg1, msg3, out: msg4]
    Assign (o5) [in: msg4, out: msg5]
    Invoke (o6) [service: s3, in: msg5]
    Qi: SELECT * FROM s4.Credit WHERE Customer= the value of m_i/Customer/Cname
(b) Instance-Based Plan Execution of P_2
    Standard Message Queue (enqueue order): m1 ["CustA"], m2 ["CustB"], m3 ["CustA"], m4 ["CustC"], m5 ["CustB"], m6 ["CustC"]
    dequeue m1, p1: o1 o2 o3 o4 o5 o6, Q1: SELECT * FROM s4.Credit WHERE Customer="CustA"
    dequeue m2, p2: o1 o2 o3 o4 o5 o6, Q2: SELECT * FROM s4.Credit WHERE Customer="CustB"
    dequeue m3, p3: o1 o2 o3 o4 o5 o6, Q3: SELECT * FROM s4.Credit WHERE Customer="CustA"
Figure 5.1: Example Instance-Based Plan Execution