Cost-Based Optimization of Integration Flows - Datenbanken ...
2 Preliminaries and Existing Techniques
tion model [OAS06], where compensation flows are modeled by the user (e.g., the compensation
of an INSERT would be a DELETE with the appropriate identifier). These compensations
are executed for successfully executed parts of an integration flow. As a result, the
compensated parts are rolled back (compensated) and completely re-executed after that.
With regard to arbitrary external systems and applications, there might exist operations
where no compensation exists at all. Second, there is the recovery-based transaction model
[BHLW08a, SWDC10] that tries to address the problem of missing compensations. Here,
REDO-images (in the sense of output messages of successfully executed operators) are
stored in order to resume integration flows after the last successful operator. In conclusion,
the problems of message lost and message double processing are typically addressed with
persistent message storage and a tailor-made recovery model. Thus, the contract of an integration
platform can be extended from store-and-forward to a form that guarantees that
each received message will be successfully delivered exactly once to the external systems.
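
The recovery-based model can be illustrated with a minimal sketch. It is not the implementation of [BHLW08a, SWDC10]; the function `run_with_redo`, the operator signature, and the in-memory dictionary `redo_store` (standing in for persistent message storage) are hypothetical names chosen for illustration. The sketch stores the output of each successfully executed operator and, on re-execution, resumes after the last operator for which such an output exists.

```python
from typing import Callable, Dict, List

# Assumption: an operator maps an input message to an output message.
Operator = Callable[[dict], dict]


def run_with_redo(msg_id: str, message: dict, operators: List[Operator],
                  redo_store: Dict[str, List[dict]]) -> dict:
    """Run operators in sequence, storing each output as a REDO image.

    If an earlier attempt for msg_id failed, execution resumes after the
    last operator whose output was stored, instead of starting over.
    """
    # redo_store is a plain dict here; a real platform would persist it.
    images = redo_store.setdefault(msg_id, [])
    # Input of the next operator: last REDO image, or the original message.
    current = images[-1] if images else message
    for op in operators[len(images):]:
        current = op(current)      # may raise if an external system fails
        images.append(current)     # store only after the operator succeeded
    return current
```

Calling `run_with_redo` a second time with the same `msg_id` after a failure skips the operators that already succeeded. The sketch does not handle a crash between an operator's side effect and the storing of its REDO image, which a real recovery model would have to address.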
Besides these data-related guarantees, temporal guarantees must also be ensured. From
the perspective of integration flow optimization, we would consider executing subsequent
plan instances in parallel. Unfortunately, the problem of message outrun would arise.
Problem 2.3 (Message Outrun). Assume two messages $m_1$ and $m_2$, where $m_1$ arrives
earlier at the integration platform than $m_2$, with $t_1 < t_2$. If we execute the two resulting
plan instances $p_1$ and $p_2$ in parallel, an outrun of messages in terms of changed sequential
order of messages at the outbound side might take place and the result of $p_2$ is sent to
the external system $s_1$ before the result of $p_1$. For example, if customer master data is
propagated to the external system $s_1$ with the customer's first order, a message outrun can
result in a referential integrity conflict within the target system $s_1$. Additional examples
from the area of financial messaging that also require serialization are financial statements
and stock exchange orders.
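
A small worked example may make the outrun concrete. The sketch below assumes, purely for illustration, that each plan instance starts at the arrival time of its message and runs for a fixed duration; the function `outbound_order` and the numeric values are hypothetical and not taken from the source.

```python
from typing import Dict, List


def outbound_order(arrivals: Dict[str, float],
                   durations: Dict[str, float]) -> List[str]:
    """Order in which results leave the platform under parallel execution.

    Assumption: a plan instance starts on arrival and finishes after its
    duration, so results are sent in order of arrival + duration.
    """
    finish = {m: arrivals[m] + durations[m] for m in arrivals}
    return sorted(finish, key=lambda m: finish[m])


# m1 arrives first (t_1 < t_2) but its plan instance p_1 takes longer.
arrivals = {"m1": 0.0, "m2": 1.0}
durations = {"m1": 5.0, "m2": 1.0}
order = outbound_order(arrivals, durations)
print(order)  # ['m2', 'm1']: the result of p_2 outruns the result of p_1
```

Here $p_1$ finishes at time 5.0 and $p_2$ at time 2.0, so the outbound order differs from the inbound order.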
To tackle this problem, typically, inbound message queues are used in combination with
single-threaded plan execution. This serialized execution of plan instances guarantees that
no message outrun can take place. This is comparable to snapshot isolation in DBMS
[LKPMJP05, CRF08]. Hence, internal out-of-order processing would be possible, because
we only need to ensure the serialized external behavior in the sense that the inbound order
is equivalent to the outbound order of messages. More formally, eventual consistency
[Vog08] with the property of monotonic writes (serialize the writes of the same plan), and
thus with the convergence property, must be guaranteed. In addition, monotonic reads
with regard to individual data objects must also be ensured.
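
The idea that internal processing may run out of order as long as the outbound order equals the inbound order can be sketched as follows. This is an illustrative assumption, not the mechanism of any particular integration platform: the names `execute_in_order`, `plan`, and `deliver` are hypothetical, and a thread pool from Python's standard library stands in for parallel plan execution.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def execute_in_order(messages: Iterable[str],
                     plan: Callable[[str], str],
                     deliver: Callable[[str], None],
                     workers: int = 4) -> None:
    """Run plan instances in parallel, but deliver results in inbound order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit in arrival order; instances may finish in any order.
        futures = [pool.submit(plan, m) for m in messages]
        # Wait on the futures in arrival order, so a fast later instance
        # cannot deliver before a slow earlier one (no message outrun).
        for future in futures:
            deliver(future.result())
```

Delivery happens in the calling thread, one result at a time, which serializes only the outbound side. Plan instances that themselves write to external systems while executing would need additional coordination that this sketch does not provide.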
The mentioned transactional properties have several implications for the cost-based
optimization of integration flows. First, when rewriting plans during optimization, we must
be aware of the problems of message lost, message double processing, and message outrun.
Second, the contract of an integration platform with any client application or system
is that each received message must be successfully delivered, in arrival-order (monotonic
writes), with monotonic reads from external systems, exactly once to the external systems.
2.4 Use Cases
From a business perspective, we distinguish between horizontal and vertical integration of
information systems [Sch01]. In this section, we illustrate an example scenario for both
use cases, including concrete integration flows that we will use as running examples and