Cost-Based Optimization of Integration Flows - Datenbanken ...
4.2 Plan Vectorization
a subplan of our example plan P1 shown in Figure 4.6(a). The vectorized plan P1′ is
the output of the A-PV and is shown in Figure 4.6(b). There, the Switch-specific
rewriting technique has been applied: we created two pipeline branches (one
for each switch path) and changed the data flow such that messages are passed through
the Switch operator. In order to avoid message outrun, we inserted the XOR operator
and the synchronization queue. Note that the full plan of P1 required additional Copy and
And operators for the last Invoke operator because it depends directly on the output of the
Translation operator. This serialization concept will be discussed separately.
Rewriting Iteration operators. When rewriting Iteration operators, the main problem
is again message outrun. We must ensure that all iteration loops for a message
have been processed before the next message enters. Basically, a foreach Iteration is
rewritten to a sequence of (1) one Split operator, (2) the operators of the Iteration body,
and (3) one Setoperation (UNION ALL) operator. This strategy inherently leads to
the highest degree of parallelism, while it requires only moderate additional costs for splitting
and merging. In contrast, iterations with while semantics are not vectorized
(they form a single execution bucket) because we cannot guarantee semantic correctness.
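
The foreach rewriting can be illustrated with a minimal Python sketch. It is not the implementation described here, only an illustration under stated assumptions: the dictionary-based message layout and the names `split`, `union_all`, and `vectorized_foreach` are hypothetical, and the stages run sequentially in one thread, whereas in a vectorized plan each stage would be a separate pipeline operator.

```python
from typing import Callable, List

Message = dict  # hypothetical message layout: {"id": ..., "items": [...]}


def split(msg: Message) -> List[dict]:
    """Split: emit one sub-message per item, tagged with the source message ID."""
    return [{"src_id": msg["id"], "seq": i, "item": item}
            for i, item in enumerate(msg["items"])]


def union_all(src_id, parts: List[dict]) -> Message:
    """Setoperation (UNION ALL): merge all sub-messages of one source message."""
    ordered = sorted(parts, key=lambda p: p["seq"])  # restore the item order
    return {"id": src_id, "items": [p["item"] for p in ordered]}


def vectorized_foreach(msg: Message, body: Callable) -> Message:
    """Split -> Iteration body -> UNION ALL for a single input message."""
    parts = [{**p, "item": body(p["item"])} for p in split(msg)]
    return union_all(msg["id"], parts)


result = vectorized_foreach({"id": 7, "items": [1, 2, 3]}, lambda x: x * 10)
```

Because every sub-message carries the source message ID, the merge step can tell when all parts of one message have arrived, which is what keeps a later message from overtaking an earlier one in this sketch.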
Rewriting Savepoint operators. Within the instance-based model, we can use message-specific
and context-specific savepoints. Due to the missing global context, we need to
reduce the savepoint semantics to the storage of messages, where context information needs
to be stored via specific messages. However, in order to ensure semantic correctness,
we require two different rewriting methodologies. The message-specific savepoint is simply
vectorized in the standard manner. In contrast, the context-specific savepoint, which
stores all current messages at a certain plan position, must be rewritten in a more complex
way. Here, we need to insert one savepoint into each parallel data flow branch with respect
to the operator position in the instance-based case.
Rewriting Invoke operators. In order to realize the serialization of external behavior
(a precondition for transparency), we must ensure that explicitly modeled sequences of writing
interactions (Invoke operators) are serialized (see Rule 3 of Definition 3.1). Hence,
we use the And operator for synchronization purposes. If (1) two Invoke operators have
a temporal dependency within P, (2) they perform a writing interaction to the same external
system, and (3) they are included in different pipelines in P′, we insert an And
operator right before the second Invoke operator as well as a synchronization queue between
the first Invoke operator and the And operator. The And operator reads from the
synchronization queue and from the original queue and synchronizes the external behavior
by deferring each message until its source message ID is available in the synchronization
queue. We use an example to illustrate this concept.
Example 4.4 (Serialization of External Behavior). Assume a dependency graph DG(P8)
of a subplan of our example plan P8 (see Figure 4.7(a)) to be part of a data-driven integration
flow. If we vectorize this subplan to P8′ (see Figure 4.7(b)) with two pipeline branches,
we need to ensure the serialized external behavior. We inserted an And operator, where the
first Invoke sends synchronizing source message IDs to this operator using the introduced
synchronization queue. Only if the Assign as well as the first Invoke have
been processed successfully is the payload message of the right pipeline branch forwarded
to the second Invoke.
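
The deferral behavior of the And operator can be sketched in Python as follows. This is a simplified illustration, not the actual operator: the class name `AndOperator`, the methods `on_sync` and `on_payload`, and the `src_id` field are hypothetical, and the sketch assumes that payload messages are released in the order in which they arrived on the original queue.

```python
from collections import deque
from typing import Deque, List, Set


class AndOperator:
    """Defers payload messages until their source message ID has arrived
    on the synchronization queue (hypothetical, single-threaded sketch)."""

    def __init__(self) -> None:
        self.ready_ids: Set[int] = set()    # IDs received via the synchronization queue
        self.deferred: Deque[dict] = deque()  # payload messages from the original queue

    def on_sync(self, src_id: int) -> List[dict]:
        """Called when the first Invoke has finished processing message src_id."""
        self.ready_ids.add(src_id)
        return self._release()

    def on_payload(self, msg: dict) -> List[dict]:
        """Called when a payload message arrives on the original queue."""
        self.deferred.append(msg)
        return self._release()

    def _release(self) -> List[dict]:
        """Forward deferred messages, in arrival order, whose ID is available."""
        out = []
        while self.deferred and self.deferred[0]["src_id"] in self.ready_ids:
            msg = self.deferred.popleft()
            self.ready_ids.discard(msg["src_id"])
            out.append(msg)
        return out


op = AndOperator()
first = op.on_payload({"src_id": 1, "body": "a"})  # deferred: no ID for message 1 yet
second = op.on_sync(1)                             # ID available: message 1 is forwarded
```

In this sketch, a payload message reaches the second Invoke only after the first Invoke has reported the same source message ID, which mirrors the condition stated in Example 4.4.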
In summary, when rewriting instance-based plans to vectorized plans, we guarantee<br />
semantic correctness for context-specific operators as well.<br />