25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4.2 Plan Vectorization

a subplan of our example plan P1 shown in Figure 4.6(a). The vectorized plan P1′ is the output of the A-PV and is shown in Figure 4.6(b). There, the Switch-specific rewriting technique has been applied: we created two pipeline branches (one for each switch path) and changed the data flow such that messages are passed through the Switch operator. To avoid message outrun, we inserted the XOR operator and the synchronization queue. Note that the full plan P1 requires additional Copy and And operators for the last Invoke operator because it depends directly on the output of the Translation operator. This serialization concept will be discussed separately.
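
One plausible reading of how the XOR operator and the synchronization queue prevent message outrun can be sketched as a small reorder buffer. The text does not specify the mechanism at this level of detail, so the class name XorOperator and the methods expect and on_result are hypothetical: the Switch is assumed to announce each message ID on the synchronization queue in arrival order, and the XOR operator is assumed to emit branch results only in that order.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class XorOperator:
    """Illustrative sketch only: merges two pipeline branches in arrival order.

    Assumption: the Switch operator pushes every message ID to the
    synchronization queue (modeled by expect) before routing the message.
    """

    def __init__(self) -> None:
        self._order: Deque[int] = deque()   # IDs in original arrival order
        self._done: Dict[int, str] = {}     # branch results waiting to be emitted

    def expect(self, msg_id: int) -> None:
        """Record an ID read from the synchronization queue."""
        self._order.append(msg_id)

    def on_result(self, msg_id: int, result: str) -> List[Tuple[int, str]]:
        """Accept a result from either branch; emit everything that is in order."""
        self._done[msg_id] = result
        emitted: List[Tuple[int, str]] = []
        # A result is held back while an earlier message is still in flight.
        while self._order and self._order[0] in self._done:
            head = self._order.popleft()
            emitted.append((head, self._done.pop(head)))
        return emitted
```

If message 2 takes the faster branch and finishes first, it is buffered until message 1 has been emitted, so the output order matches the input order.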

Rewriting Iteration operators. When rewriting Iteration operators, the main problem is again message outrun. We must ensure that all iteration loops for a message have been processed before the next message enters. Basically, a foreach Iteration is rewritten to a sequence of (1) one Split operator, (2) the operators of the Iteration body, and (3) one Setoperation (UNION ALL) operator. This strategy inherently leads to the highest degree of parallelism while requiring only moderate additional costs for splitting and merging. In contrast, iterations with while semantics are not vectorized (they remain in one single execution bucket) because we cannot guarantee semantic correctness.
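
The foreach rewrite into Split, Iteration body, and UNION ALL can be sketched as follows. This is a minimal illustration and not the actual operator implementation: a message is assumed to be a list of rows, the function names are hypothetical, and a thread pool stands in for the pipeline that executes the body operators in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

Row = dict
Message = List[Row]


def split(message: Message) -> List[Message]:
    """Split operator: one sub-message per iteration item (assumed layout)."""
    return [[row] for row in message]


def union_all(parts: Iterable[Message]) -> Message:
    """Setoperation (UNION ALL): concatenate the parts and keep duplicates."""
    merged: Message = []
    for part in parts:
        merged.extend(part)
    return merged


def run_foreach_vectorized(
    message: Message,
    body: Callable[[Message], Message],
    workers: int = 4,
) -> Message:
    """Apply the iteration body to all sub-messages in parallel, then merge.

    The function returns only after every sub-message has been processed,
    which mirrors the requirement that all iteration loops of a message
    finish before the next message enters.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the merge is deterministic.
        results = list(pool.map(body, split(message)))
    return union_all(results)
```

A while iteration has no such rewrite in this sketch, because the number of loops is not known before the body runs, so the sub-messages cannot be produced up front.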

Rewriting Savepoint operators. Within the instance-based model, we can use message-specific and context-specific savepoints. Due to the missing global context, we need to reduce the savepoint semantics to the storage of messages, where context information needs to be stored via specific messages. However, to ensure semantic correctness, we require two different rewriting methodologies. The message-specific savepoint is simply vectorized in the standard manner. In contrast, the context-specific savepoint, which stores all current messages at a certain plan position, must be rewritten in a more complex way: we need to insert one savepoint into each parallel data flow branch with respect to the operator position in the instance-based case.

Rewriting Invoke operators. In order to realize the serialization of external behavior (a precondition for transparency), we must ensure that explicitly modeled sequences of writing interactions (Invoke operators) are serialized (see Rule 3 of Definition 3.1). Hence, we use the And operator for synchronization purposes. If (1) two Invoke operators have a temporal dependency within P, (2) they perform a writing interaction to the same external system, and (3) they are included in different pipelines in P′, we insert an And operator right before the second Invoke operator as well as a synchronization queue between the first Invoke operator and the And operator. The And operator reads from the synchronization queue and from the original queue and synchronizes the external behavior by deferring each message until its source message ID is available in the synchronization queue. We use an example to illustrate this concept.

Example 4.4 (Serialization of External Behavior). Assume a dependency graph DG(P8) of a subplan of our example plan P8 (see Figure 4.7(a)) to be part of a data-driven integration flow. If we vectorize this subplan to P8′ (see Figure 4.7(b)) with two pipeline branches, we need to ensure the serialized external behavior. We inserted an And operator, and the first Invoke sends synchronizing source message IDs to this operator using the introduced synchronization queue. Only if the Assign as well as the first Invoke have been processed successfully is the payload message of the right pipeline branch forwarded to the second Invoke.
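
The deferral behavior of the And operator described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the actual operator: the class name AndOperator and the methods on_sync and on_payload are hypothetical, message IDs are assumed to be integers, and payload messages are assumed to be forwarded in FIFO order, so a deferred message also holds back the messages queued behind it.

```python
from collections import deque
from typing import Deque, List, Set, Tuple


class AndOperator:
    """Illustrative sketch only: gates the second Invoke on the first one."""

    def __init__(self) -> None:
        # Source message IDs received from the first Invoke (synchronization queue).
        self._available_ids: Set[int] = set()
        # Payload messages from the original queue that are still deferred.
        self._deferred: Deque[Tuple[int, str]] = deque()

    def on_sync(self, msg_id: int) -> List[Tuple[int, str]]:
        """The first Invoke finished for msg_id; forward what is now unblocked."""
        self._available_ids.add(msg_id)
        return self._drain()

    def on_payload(self, msg_id: int, payload: str) -> List[Tuple[int, str]]:
        """A payload message arrived on the original queue."""
        self._deferred.append((msg_id, payload))
        return self._drain()

    def _drain(self) -> List[Tuple[int, str]]:
        forwarded: List[Tuple[int, str]] = []
        # Forward from the head while its source message ID is available.
        while self._deferred and self._deferred[0][0] in self._available_ids:
            msg_id, payload = self._deferred.popleft()
            self._available_ids.discard(msg_id)
            forwarded.append((msg_id, payload))
        return forwarded
```

In terms of Example 4.4, on_sync models the ID sent by the first Invoke, and the returned list contains the payload messages that may now proceed to the second Invoke.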

In summary, when rewriting instance-based plans to vectorized plans, we guarantee semantic correctness for context-specific operators as well.
