25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

4 Vectorizing Integration Flows

Over time, and hence with an increasing number of plan instances $n$, the performance improvement in total execution time grows linearly. We use the term performance in the sense of high throughput and low execution time of a finite message sequence $M'$. For this case of a finite sequence whose incoming order must be preserved, (1) the execution time $W(P, M')$, (2) the message throughput $|M'|/\Delta t$, and (3) the latency time $T_L(M')$ are correlated. According to Little's Law [Lit61], the rationale for this is the waiting time within the system, because instances of one plan must not be executed in parallel. In more detail, decreasing the total execution time of the message subsequence decreases the waiting time, increases the message throughput, and thus finally decreases the total latency time of this message sequence:

\[ W(P, M') \propto \frac{1}{|M'| / \Delta t} \propto T_L(M'). \tag{4.3} \]

However, the latency of single messages can be higher for vectorized plans than for instance-based plans.
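The correlation stated in Equation 4.3 can be illustrated with a small numeric sketch. It is not taken from the source: the function name `serial_metrics` is hypothetical, all messages are assumed to be available at time 0, and the total latency is assumed to be the sum of the per-message completion times. The sketch models only serialized, order-preserving execution and does not model vectorization itself.

```python
def serial_metrics(service_times):
    """Metrics for a finite message sequence processed strictly one at a time.

    Assumptions (not from the source): every message is available at time 0,
    and each message waits until all earlier messages have finished.
    """
    clock = 0.0
    total_latency = 0.0
    for s in service_times:
        clock += s              # completion time of this message
        total_latency += clock  # waiting time plus own execution time
    total_time = clock
    throughput = len(service_times) / total_time
    return total_time, throughput, total_latency


# Halving the per-message execution time halves the total execution time,
# doubles the throughput, and halves the total latency.
slow = serial_metrics([2.0] * 4)
fast = serial_metrics([1.0] * 4)
print(slow)
print(fast)
```

Because messages queue behind one another, every reduction in execution time also shortens the waiting time of all later messages, which is why the three measures move together in this simplified setting.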

Message and Flow Meta Model<br />

To allow for transparent vectorization as an internal optimization technique, the control-flow semantics must be preserved when vectorizing a plan. Therefore, we extended the Message Transformation Model (MTM) (Subsection 2.3.1) to make it applicable to vectorized plans; we refer to the extended model as VMTM.

In the VMTM, we extend the message meta model from a triple to a quadruple with $m_i = (t_i, d_i, a_i, c_i)$, where the context information $c$ denotes an additional map of atomic name-value attribute pairs with $c_{ij} = (n_j, v_j)$. This extension is necessary because multiple messages are processed within one single standing (vectorized) plan instead of independent plan instances. Thus, instance-related context information such as local variables (e.g., counters or extracted attribute values) must be stored within the messages.
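A minimal sketch of the message quadruple, assuming a Python representation, may make the role of the context map clearer. The class name `VMessage` and the operator `count_step` are hypothetical. The components `t`, `d`, and `a` are defined in Subsection 2.3.1 and are treated here as opaque placeholders; only the context map `c` follows directly from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class VMessage:
    """Sketch of the VMTM message quadruple m = (t, d, a, c)."""
    t: Any                                           # placeholder, see Subsection 2.3.1
    d: Any                                           # placeholder, see Subsection 2.3.1
    a: Dict[str, Any] = field(default_factory=dict)  # placeholder, see Subsection 2.3.1
    c: Dict[str, Any] = field(default_factory=dict)  # context: name -> value


def count_step(msg: VMessage) -> VMessage:
    """Hypothetical standing operator that keeps its counter in the message."""
    msg.c["counter"] = msg.c.get("counter", 0) + 1
    return msg


# Two messages pass through the same standing operator, yet each carries
# its own counter because the state lives in the message context.
m1 = VMessage(t="order", d="<order/>")
m2 = VMessage(t="order", d="<order/>")
count_step(m1)
count_step(m1)
count_step(m2)
print(m1.c, m2.c)
```

Keeping the counter in `c` rather than in the operator means that the operator itself stays free of instance-related state, which is the point of the extension.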

In contrast to the MTM flow meta model, the flow relations between operators $o_i$ in the VMTM do not specify the control flow (temporal dependencies) but the explicit data flow in the form of message streams. Additionally, the Fork operator is removed because, in the vectorized case, operators are inherently executed in parallel. Finally, we introduce the additional operators And and Xor for synchronization of operators in order to preserve

Table 4.1: Additional Operators of the VMTM

| Name | Description | Input | Output | Complex |
|------|-------------|-------|--------|---------|
| And | Reads a synchronization ID and a single message and forwards the read message. | (2,2) | (1,1) | No |
| Xor | Reads a synchronization ID and/or multiple messages and outputs all messages whose synchronization IDs have already been seen. Thus, this operator has an intra-operator state of IDs and messages. | (1,*) | (0,*) | No |
| Copy | Gets a single message, then copies it n − 1 times and puts those messages into the n output queues. | (1,1) | (2,*) | No |
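The descriptions of Copy and Xor in Table 4.1 can be read as the following sketch. It is one possible interpretation rather than the implementation behind the table: the names `copy_op` and `XorOp`, the use of deep copies, and the way synchronization IDs are passed alongside messages are all assumptions.

```python
import copy
from collections import defaultdict, deque


def copy_op(message, output_queues):
    """Copy: put the message and n - 1 deep copies into the n output queues."""
    output_queues[0].append(message)
    for q in output_queues[1:]:
        q.append(copy.deepcopy(message))


class XorOp:
    """Xor: forward messages whose synchronization ID has already been seen."""

    def __init__(self):
        self.seen_ids = set()             # intra-operator state: IDs
        self.pending = defaultdict(list)  # intra-operator state: messages

    def on_sync_id(self, sync_id):
        """Record an ID and release any messages buffered for it."""
        self.seen_ids.add(sync_id)
        return self.pending.pop(sync_id, [])

    def on_message(self, sync_id, message):
        """Forward the message if its ID was seen, otherwise buffer it."""
        if sync_id in self.seen_ids:
            return [message]
        self.pending[sync_id].append(message)
        return []


# Usage: one message fanned out to three queues.
queues = [deque() for _ in range(3)]
copy_op({"id": 1}, queues)
```

In this reading, Xor buffers a message until the matching synchronization ID arrives, which corresponds to the intra-operator state of IDs and messages mentioned in the table.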
