25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4 Vectorizing Integration Flows

Figure 4.24: Influence of λ with Different Numbers of Operators and Data Sizes. Panels: (a) Number of Operators m, (b) Input Data Size d.

detail, we see the near-optimal solution with λ = 0. Typically, when increasing the number of execution buckets from this point, the elapsed time increases. However, there are cases such as the plan with m = 5, where we observe that the instance-similar execution (with one bucket for all operators) performs better. The difference from traditional instance-based execution is caused by (1) reused operator instances (e.g., pre-parsed XSLT stylesheets of Translation operators), and (2) pipelined inbound processing. Further, for m = 20, we see the optimal total execution time at λ = 10. Note that even in the case of k = 1 (a single execution bucket), we reach better performance than in the instance-based case because there are no synchronous (blocking) calls through the whole engine but only within the single plan. In addition, for small numbers of processed messages n, the instance-based execution model performs worse due to the overhead of just-in-time compilation of generated plans. Furthermore, we observe that the higher the number of operators m, the higher the influence of the parameter λ. In summary, we typically find a very good solution with λ = 0, but when required, this parameter can be used to easily adjust the degree of parallelism.
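To make the role of λ concrete, the following is a minimal sketch and not the algorithm used in the experiments. It assumes that the operators form a linear sequence, that each operator has a known scalar cost, that a bucket's cost is the sum of its operators' costs, and that no bucket may exceed the maximum single-operator cost plus λ. The function name `assign_buckets` and the cost values are hypothetical.

```python
from typing import List, Sequence


def assign_buckets(costs: Sequence[float], lam: float = 0.0) -> List[List[int]]:
    """Greedily group consecutive operators into execution buckets.

    Assumption (not taken from the text): a bucket's cost is the sum of
    its operators' costs, and no bucket may exceed max(costs) + lam.
    Returns a list of buckets, each a list of operator indexes.
    """
    if not costs:
        return []
    limit = max(costs) + lam
    buckets: List[List[int]] = [[0]]
    current = costs[0]  # summed cost of the bucket being filled
    for i in range(1, len(costs)):
        if current + costs[i] <= limit:
            # Operator i still fits into the current bucket.
            buckets[-1].append(i)
            current += costs[i]
        else:
            # Bound would be exceeded: start a new bucket with operator i.
            buckets.append([i])
            current = costs[i]
    return buckets


costs = [2.0, 1.0, 4.0, 1.0, 1.0]  # hypothetical operator costs
print(assign_buckets(costs, lam=0.0))  # grouping under the tightest bound
print(assign_buckets(costs, lam=5.0))  # larger lambda allows fewer buckets
```

Under these assumptions, λ = 0 yields three buckets for the sample costs, while λ = 5 lets all five operators share one bucket, which corresponds to the instance-similar case of k = 1.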

Furthermore, in the second sub-experiment, we used a single plan with m = 20, fixed t = 0 and q = 50, and executed n = 250 messages with different values of λ and different data sizes d ∈ {1, 4, 7} (in 100 kB). Figure 4.24(b) shows the results with regard to the execution time as well as the number of execution buckets (annotated at the top of each point) when varying λ. In general, we see similar behavior as in Figure 4.24(a) (for m = 20). The different numbers of execution buckets for d = 1 and λ ∈ (0, 10) are caused by dynamically monitored operator costs, which varied slightly. The major difference from the previous sub-experiment is that the data size significantly increases the execution time of single operators. As a result, we observe that we require higher values of λ to reduce the number of execution buckets. Hence, λ should be configured with context knowledge about current workload characteristics. We could overcome this workload dependency with a relative value of λ according to the maximum operator costs. However, with λ as an absolute value, we can explicitly determine the maximum work-cycle increase of the data flow graph.
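The trade-off between an absolute and a relative λ can be illustrated with a small sketch. It assumes that the bound on a bucket's cost is the maximum operator cost plus λ, and that a relative variant would scale that maximum by a factor instead. The names `absolute_limit` and `relative_limit`, the factor `rho`, and all cost values are hypothetical and are not taken from the experiments.

```python
from typing import Sequence


def absolute_limit(costs: Sequence[float], lam: float) -> float:
    """Bucket-cost bound with an absolute lambda (assumed form)."""
    return max(costs) + lam


def relative_limit(costs: Sequence[float], rho: float) -> float:
    """Bucket-cost bound with a hypothetical relative factor rho."""
    return max(costs) * (1.0 + rho)


small = [1.0, 2.0, 4.0]           # hypothetical operator costs, small messages
large = [7.0 * c for c in small]  # same plan, assumed 7x costlier messages

for name, costs in (("small", small), ("large", large)):
    # Increase of the bound over the most expensive single operator.
    abs_increase = absolute_limit(costs, 2.0) - max(costs)
    rel_increase = relative_limit(costs, 0.5) - max(costs)
    print(name, abs_increase, rel_increase)
```

In this sketch the absolute variant keeps the permitted increase at 2.0 for both workloads, so the maximum work-cycle increase is known in advance, whereas the relative variant grows from 2.0 to 14.0 as operator costs grow, which is what would make it adapt to the data size.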

In conclusion, there are several parameters with significant influence on the total execution time and on the behavior of cost-based vectorization. As a general heuristic, one should use a maximum cost increase of λ = 0. This simplest configuration typically results in near-optimal throughput. However, if more context knowledge about the workload is available, the described parameters can be used as tuning knobs.
