25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


2 Preliminaries and Existing Techniques

tion model [OAS06], where compensation flows are modeled by the user (e.g., the compensation of an INSERT would be a DELETE with the appropriate identifier). These compensations are executed for the successfully executed parts of an integration flow. As a result, these parts are rolled back (compensated) and then completely re-executed. With regard to arbitrary external systems and applications, there might be operations for which no compensation exists at all. Second, there is the recovery-based transaction model [BHLW08a, SWDC10], which tries to address the problem of missing compensations. Here, REDO images (in the sense of output messages of successfully executed operators) are stored in order to resume integration flows after the last successful operator. In conclusion, the problems of message lost and message double processing are typically addressed with persistent message storage and a tailor-made recovery model. Thus, the contract of an integration platform can be extended from store-and-forward to a form that guarantees that each received message will be successfully delivered exactly once to the external systems.
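
The recovery-based model described above can be illustrated with a minimal sketch. It is not taken from [BHLW08a, SWDC10]; the names `RedoStore` and `run_plan` are hypothetical, the store is an in-memory stand-in for persistent storage, and a plan is assumed to be a simple linear sequence of operators. The sketch only shows the idea of resuming after the last successful operator by reusing its stored output message.

```python
from typing import Callable, Dict, List

# Assumption: an operator maps one input message to one output message.
Operator = Callable[[dict], dict]


class RedoStore:
    """Hypothetical store for REDO images (in-memory stand-in for persistence)."""

    def __init__(self) -> None:
        self._images: Dict[str, List[dict]] = {}

    def save(self, msg_id: str, op_index: int, output: dict) -> None:
        images = self._images.setdefault(msg_id, [])
        # Images are appended strictly in operator order.
        assert op_index == len(images)
        images.append(output)

    def load(self, msg_id: str) -> List[dict]:
        return list(self._images.get(msg_id, []))


def run_plan(msg_id: str, message: dict,
             operators: List[Operator], store: RedoStore) -> dict:
    """Execute a linear plan, skipping operators that already have a REDO image."""
    images = store.load(msg_id)
    # Resume with the output of the last successful operator, if any.
    current = images[-1] if images else message
    for i in range(len(images), len(operators)):
        current = operators[i](current)   # may raise on failure
        store.save(msg_id, i, current)    # record the REDO image
    return current
```

If an operator raises, calling `run_plan` again with the same `msg_id` re-executes only the failed operator and its successors. The sketch does not cover a crash between an operator's external side effect and the saving of its image; handling that case would need support from the external system or an idempotent operation.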

Besides these data-related guarantees, temporal guarantees must also be ensured. From the perspective of integration flow optimization, we would consider executing subsequent plan instances in parallel. Unfortunately, the problem of message outrun would then arise.

Problem 2.3 (Message Outrun). Assume two messages $m_1$ and $m_2$, where $m_1$ arrives earlier at the integration platform than $m_2$, with $t_1 < t_2$. If we execute the two resulting plan instances $p_1$ and $p_2$ in parallel, an outrun of messages in terms of a changed sequential order of messages at the outbound side might take place, such that the result of $p_2$ is sent to the external system $s_1$ before the result of $p_1$. For example, if customer master data is propagated to the external system $s_1$ with the customer’s first order, a message outrun can result in a referential integrity conflict within the target system $s_1$. Additional examples from the area of financial messaging that also require serialization are financial statements and stock exchange orders.
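
To make Problem 2.3 concrete, the following small sketch simulates fully parallel execution deterministically. It assumes, purely for illustration, that each plan instance starts at the arrival time of its message and finishes after a fixed processing time; the function names and the numbers in the usage note are hypothetical.

```python
from typing import List, Tuple

# (name, arrival_time, processing_time); all values are illustrative.
Message = Tuple[str, float, float]


def inbound_order(messages: List[Message]) -> List[str]:
    """Order in which messages arrive at the integration platform."""
    return [name for name, _, _ in sorted(messages, key=lambda m: m[1])]


def outbound_order(messages: List[Message]) -> List[str]:
    """Order in which results leave the platform under fully parallel execution.

    Assumption: a plan instance finishes at arrival_time + processing_time.
    """
    return [name for name, _, _ in sorted(messages, key=lambda m: m[1] + m[2])]


def has_outrun(messages: List[Message]) -> bool:
    """True if the outbound order differs from the inbound order."""
    return outbound_order(messages) != inbound_order(messages)
```

With `[("m1", 0.0, 5.0), ("m2", 1.0, 1.0)]`, the instance for `m1` (e.g., customer master data) finishes at time 5 and the instance for `m2` (e.g., the first order) finishes at time 2, so `m2` reaches the external system first and `has_outrun` reports an outrun.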

To tackle this problem, inbound message queues are typically used in combination with single-threaded plan execution. This serialized execution of plan instances guarantees that no message outrun can take place. This is comparable to snapshot isolation in DBMS [LKPMJP05, CRF08]. Hence, internal out-of-order processing would be possible, because we only need to ensure the serialized external behavior in the sense that the inbound order is equivalent to the outbound order of messages. More formally, eventual consistency [Vog08] with the property of monotonic writes (serialize the writes of the same plan), and thus with the convergence property, must be guaranteed. In addition, monotonic reads with regard to individual data objects must also be ensured.
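
The observation that only the external behavior must be serialized can be sketched as follows. The text states the requirement but does not prescribe a mechanism, so this is one possible illustration and not the author's method: the hypothetical `OutboundSequencer` assumes that each message receives a consecutive sequence number on arrival (starting at 0), lets plan instances complete in any order, and releases results to the outbound side only in inbound order.

```python
from typing import Dict, List


class OutboundSequencer:
    """Hypothetical buffer that restores inbound order at the outbound side."""

    def __init__(self) -> None:
        self._next_seq = 0                  # next sequence number to release
        self._pending: Dict[int, str] = {}  # finished results not yet released
        self.sent: List[str] = []           # stands in for the external system

    def complete(self, seq: int, result: str) -> None:
        """Register the result of a finished plan instance (any order)."""
        self._pending[seq] = result
        # Release the longest contiguous run of results starting at _next_seq.
        while self._next_seq in self._pending:
            self.sent.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
```

If the instance for the second message finishes first, its result is held back until the result of the first message has been sent. This keeps the outbound order equal to the inbound order, at the price of buffering results whose predecessors are still being processed.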

The mentioned transactional properties have several implications for the cost-based optimization of integration flows. First, when rewriting plans during optimization, we must be aware of the problems of message lost, message double processing, and message outrun. Second, the contract of an integration platform with any client application or system is that each received message must be successfully delivered to the external systems exactly once, in arrival order (monotonic writes), and with monotonic reads from external systems.

2.4 Use Cases

From a business perspective, we distinguish between horizontal and vertical integration of information systems [Sch01]. In this section, we illustrate an example scenario for both use cases, including concrete integration flows that we will use as running examples and
