Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows

Over time, and hence with an increasing number of plan instances $n$, the performance
improvement in total execution time grows linearly. We use the term performance
in the sense of high throughput and low execution time of a finite message sequence
$M'$. For such a finite sequence, whose incoming order must be preserved,
(1) the execution time $W(P, M')$, (2) the message throughput $|M'|/\Delta t$, and (3)
the latency time $T_L(M')$ are correlated. According to Little's Law [Lit61], the rationale
for this is the waiting time within the system, because instances of one plan must not
be executed in parallel. In more detail, decreasing the total execution time of the message
subsequence decreases the waiting time, increases the message throughput, and thus finally
decreases the total latency time of this message sequence:
$$W(P, M') \propto \frac{1}{|M'| / \Delta t} \propto T_L(M'). \tag{4.3}$$
However, the latency of single messages can be higher for vectorized plans compared to
instance-based plans.
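
The correlation in Equation (4.3) can be illustrated with a minimal sketch. It is not taken from the source: it assumes that all messages of the sequence arrive at time 0, that they are processed strictly one after another in arrival order, and that each message has a fixed, hypothetical processing cost. The function name `simulate_serial` is likewise an assumption made for illustration.

```python
from typing import List, Tuple


def simulate_serial(costs: List[float]) -> Tuple[float, float, List[float]]:
    """Process messages one at a time, in arrival order.

    Assumption: every message arrives at time 0, so the latency of a
    message equals its completion time (waiting time + own cost).
    Returns (total execution time, throughput, per-message latencies).
    """
    clock = 0.0
    latencies = []
    for cost in costs:
        clock += cost            # later messages wait for all earlier ones
        latencies.append(clock)
    throughput = len(costs) / clock
    return clock, throughput, latencies


# Hypothetical per-message costs: a slower and a faster plan, four messages each.
slow_total, slow_tp, slow_lat = simulate_serial([2.0] * 4)
fast_total, fast_tp, fast_lat = simulate_serial([1.0] * 4)

print(slow_total, slow_tp, slow_lat)
print(fast_total, fast_tp, fast_lat)
```

Under these assumptions, halving the per-message cost halves the total execution time, doubles the throughput, and halves the completion time of the last message, which is the direction of the relationship stated in Equation (4.3). The sketch does not model the queueing overhead of vectorized plans, so it says nothing about the latency of single messages.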

Message and Flow Meta Model

To allow for transparent vectorization as an internal optimization technique, the
control-flow semantics must be preserved when vectorizing a plan. Therefore, we extended
the Message Transformation Model (MTM) (Subsection 2.3.1) to make it applicable
to vectorized plans; we refer to the extended model as VMTM.
In the VMTM, we extend the message meta model from a triple to a quadruple
$m_i = (t_i, d_i, a_i, c_i)$, where the context information $c_i$ denotes an additional map of atomic
name-value attribute pairs with $c_{ij} = (n_j, v_j)$. This extension is necessary because
multiple messages are processed within one single standing (vectorized) plan instead of
independent plan instances. Thus, instance-related context information such as local variables
(e.g., counters or extracted attribute values) must be stored within the messages.
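
As a rough illustration of the extended message meta model, the following sketch represents a message as a quadruple with a per-message context map. The class name `VMessage`, the helper `increment_counter`, and the interpretation of the fields `t`, `d`, and `a` (defined in Subsection 2.3.1, which is not reproduced here) are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class VMessage:
    """Sketch of a VMTM message m = (t, d, a, c)."""
    t: str                                           # first component (assumed: message type)
    d: Any                                           # second component (assumed: message data)
    a: Dict[str, Any] = field(default_factory=dict)  # third component (assumed: attributes)
    c: Dict[str, Any] = field(default_factory=dict)  # context: name -> atomic value


def increment_counter(msg: VMessage, name: str = "counter") -> VMessage:
    """Hypothetical operator step: the counter is stored in the message
    context instead of in a local variable of a plan instance."""
    msg.c[name] = msg.c.get(name, 0) + 1
    return msg


# Two messages pass through the same standing plan; their counters stay independent.
m1 = VMessage(t="order", d="payload-1")
m2 = VMessage(t="order", d="payload-2")
increment_counter(m1)
increment_counter(m1)
increment_counter(m2)

print(m1.c, m2.c)
```

Because the counter travels with each message, one shared operator can serve many messages without mixing up their instance-related state.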
In contrast to the MTM flow meta model, in the VMTM the flow relations between
operators $o_i$ do not specify the control flow (temporal dependencies) but the explicit data
flow in the form of message streams. Additionally, the Fork operator is removed because,
in the vectorized case, operators are inherently executed in parallel. Finally, we introduce
the additional operators And and Xor for synchronization of operators in order to preserve

Table 4.1: Additional Operators of the VMTM

| Name | Description | Input | Output | Complex |
|------|-------------|-------|--------|---------|
| And | Reads a synchronization ID and a single message and forwards the read message. | (2,2) | (1,1) | No |
| Xor | Reads a synchronization ID and/or multiple messages and outputs all messages whose synchronization IDs have already been seen. Thus, this operator has an intra-operator state of IDs and messages. | (1,*) | (0,*) | No |
| Copy | Gets a single message, then copies it n − 1 times and puts those messages into the n output queues. | (1,1) | (2,*) | No |
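
The operator descriptions in Table 4.1 can be sketched as follows. This is a simplified, single-threaded illustration and not the implementation described in the source: the names `copy_op`, `and_op`, and `XorOp`, the use of `deque` objects as queues, and the way a message is matched to its synchronization ID are all assumptions.

```python
import copy
from collections import deque


def copy_op(message, out_queues):
    """Copy: one input message, n >= 2 output queues.
    The original goes to the first queue, n - 1 deep copies to the others."""
    out_queues[0].append(message)
    for queue in out_queues[1:]:
        queue.append(copy.deepcopy(message))


def and_op(sync_ids, messages, out_queue):
    """And: consume one synchronization ID and one message, forward the message.
    Assumption: IDs and messages are paired by arrival order."""
    if sync_ids and messages:
        sync_ids.popleft()
        out_queue.append(messages.popleft())
        return True
    return False  # at least one of the two inputs is still missing


class XorOp:
    """Xor: keeps an intra-operator state of seen IDs and waiting messages."""

    def __init__(self):
        self.seen_ids = set()
        self.pending = {}  # synchronization ID -> messages waiting for it

    def on_sync_id(self, sync_id):
        """Record the ID and release every message that was waiting for it."""
        self.seen_ids.add(sync_id)
        return self.pending.pop(sync_id, [])

    def on_message(self, sync_id, message):
        """Forward the message if its ID was already seen, else buffer it."""
        if sync_id in self.seen_ids:
            return [message]
        self.pending.setdefault(sync_id, []).append(message)
        return []


# Usage: copy one message into three queues, then synchronize with And and Xor.
q1, q2, q3 = deque(), deque(), deque()
msg = {"id": 7, "body": ["x"]}
copy_op(msg, [q1, q2, q3])

out = deque()
forwarded = and_op(deque([7]), q1, out)

xor = XorOp()
early = xor.on_message(7, q2.popleft())     # buffered, ID 7 not seen yet
released = xor.on_sync_id(7)                # releases the buffered message
direct = xor.on_message(7, q3.popleft())    # forwarded immediately
```

Deep copies are used in `copy_op` so that downstream operators cannot modify each other's messages; whether the original system copies deeply is not stated in the table.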