Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

… our transformation-based optimization algorithm, approaches for reducing the search space and for adjusting the sensitivity of workload adaptation, as well as a lightweight concept for handling conditional probabilities and correlation. Furthermore, we explained selected optimization techniques that are specific to integration flows because they exploit the data flow and the control flow in combination. Our evaluation shows significant performance improvements with moderate overhead for periodical re-optimization. In conclusion, our cost-based optimization approach can be seamlessly integrated into the major products in the area of integration platforms.

Because it relies on observing many independent instances of an integration flow, periodical cost-based re-optimization is tailor-made for integration flows. Specifically, its advantages are (1) asynchronous optimization, independent of the execution of individual instances, (2) the fact that all subsequent instances, rather than only the current query, benefit from re-optimization, and (3) plan changes between instances without the need for state migration. This general optimization framework can serve as a foundation for further rewriting techniques and optimization approaches.

Apart from these advantages, the optimization framework presented so far still has several shortcomings. First, we considered only the objective of minimizing the average plan execution time. This objective is not always suitable because, in high-load scenarios, the major objective is often throughput maximization, while moderate latency is acceptable. Therefore, in the following, we present two integration-flow-specific optimization techniques that can significantly increase message throughput: the cost-based vectorization (a control-flow-oriented optimization technique) in Chapter 4 and the multi-flow optimization (a data-flow-oriented optimization technique) in Chapter 5.

Second, the periodical re-optimization algorithm itself has several drawbacks. (1) Statistics are gathered generically for all operators, so statistics are maintained that the optimizer might never use. While this overhead was negligible for the evaluated workload aggregation methods, it might cause performance issues with more complex forecast models. (2) Re-optimization is triggered periodically, but a new plan is found only if the workload characteristics have changed; otherwise, we trigger many unnecessary invocations of the optimizer, each evaluating the complete search space, which can have notable performance implications depending on the optimization techniques used. (3) Conversely, when the workload does change, it takes a while until re-optimization is triggered; during this adaptation delay, we use a suboptimal plan and miss optimization opportunities. (4) Finally, the optimization period ∆t strongly influences both optimization and execution times; hence, choosing it requires awareness of changing workloads. We address these four drawbacks with the concept of on-demand re-optimization presented in Chapter 6. Nevertheless, periodical re-optimization already provides a reasonable optimization framework that includes many fundamental concepts; it therefore serves as the conceptual basis of this thesis.
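To make the periodical scheme and drawbacks (2) and (3) concrete, the following minimal Python sketch assumes a plan that consists only of commutative, independent filter operators, so that operator order is the only optimization decision. The class `PeriodicReoptimizer`, the exponential-moving-average weight `alpha`, and the rank-ordering rule are illustrative assumptions, not the optimizer described in this thesis. The optimizer runs at every multiple of ∆t, and a new plan affects only instances started afterwards.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class OpStats:
    cost: float         # aggregated cost per message (hypothetical time units)
    selectivity: float  # aggregated fraction of messages that pass the operator


@dataclass
class PeriodicReoptimizer:
    """Hypothetical sketch: periodical re-optimization of a filter-only plan."""
    delta_t: int                                  # optimization period ∆t
    alpha: float = 0.5                            # EMA weight for workload aggregation (assumed)
    stats: Dict[str, OpStats] = field(default_factory=dict)
    plan: Tuple[str, ...] = ()                    # operator order used by *new* instances
    next_trigger: int = 0                         # initialized in __post_init__
    invocations: int = 0                          # counts every optimizer run, useful or not

    def __post_init__(self) -> None:
        self.next_trigger = self.delta_t

    def record(self, op: str, cost: float, selectivity: float) -> None:
        """Aggregate monitored statistics with an exponential moving average."""
        s = self.stats.get(op)
        if s is None:
            self.stats[op] = OpStats(cost, selectivity)
        else:
            s.cost = self.alpha * cost + (1 - self.alpha) * s.cost
            s.selectivity = self.alpha * selectivity + (1 - self.alpha) * s.selectivity

    def _rank(self, op: str) -> float:
        s = self.stats[op]
        # Classic rank ordering for independent, commutative filters:
        # ascending cost / (1 - selectivity) minimizes expected cost.
        return s.cost / max(1e-9, 1.0 - s.selectivity)

    def tick(self, now: int) -> bool:
        """Run the optimizer at every ∆t boundary, whether or not the workload
        changed. Returns True if the plan changed."""
        changed = False
        while now >= self.next_trigger:
            self.invocations += 1
            new_plan = tuple(sorted(self.stats, key=self._rank))
            changed = changed or new_plan != self.plan
            self.plan = new_plan  # inter-instance plan change: no state migration
            self.next_trigger += self.delta_t
        return changed

    def start_instance(self) -> Tuple[str, ...]:
        """Each instance binds to the plan that is current when it starts."""
        return self.plan
```

For example, with ∆t = 10, filters `f1` (cost 1, selectivity 0.9) and `f2` (cost 2, selectivity 0.2) yield the plan `("f2", "f1")` at t = 10. If `f1` becomes much more selective at t = 11, the reordered plan `("f1", "f2")` becomes active only at t = 20 (adaptation delay), an instance started before that keeps its old plan, and the invocation at t = 30 finds no better plan (an unnecessary optimizer run).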

4 Vectorizing Integration Flows

Building on the general cost-based optimization framework, this chapter presents the vectorization of integration flows [BHP+09a, BHP+09b, BHP+11], a control-flow-oriented optimization technique tailor-made for integration flows. This technique tackles the problem of low CPU utilization caused by the instance-based plan execution of integration flows. The core idea is to transparently rewrite instance-based plans into vectorized plans with pipelined execution characteristics in order to exploit pipeline parallelism across multiple plan instances. This increases message throughput while still ensuring the required transactional properties. We call this concept vectorization because a vector of messages is processed at a time. To enable vectorization, we first describe the necessary extensions of the flow meta model as well as the rule-based plan vectorization, which ensures semantic correctness; that is, the rewriting algorithm preserves the serialized external behavior. Furthermore, we present the cost-based vectorization, which computes the optimal grouping of operators into multithreaded execution buckets in order to achieve the optimal degree of pipeline parallelism and hence maximize message throughput. For this computation, we present exhaustive, heuristic, and constrained approaches. In addition, we discuss the cost-based vectorization for multiple deployed plans and sketch how this rather complex optimization technique is embedded within our periodical re-optimization framework. Finally, the experimental evaluation shows that vectorization achieves significant throughput improvements with a moderate increase in latency for individual messages. The cost-based vectorization further increases this improvement and ensures the robustness of vectorization.
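The following Python sketch illustrates both ideas under simplifying assumptions: a plan is a linear sequence of operators with known per-message costs, each execution bucket runs in its own thread, and buckets are connected by FIFO queues. The function names `optimal_buckets` and `run_vectorized`, the parameters `max_buckets` and `overhead`, and the cost model (throughput bounded by the slowest bucket, each bucket paying a fixed per-message overhead) are hypothetical; the exhaustive enumeration only mirrors the general idea of an exhaustive computation, and heuristic or constrained variants are not shown.

```python
from itertools import combinations
from queue import Queue
from threading import Thread
from typing import Any, Callable, List, Optional, Sequence, Tuple

Operator = Callable[[Any], Any]
Bucket = Tuple[int, int]   # operator index range [start, end)
_END = object()            # end-of-stream marker forwarded through all queues


def optimal_buckets(costs: Sequence[float], max_buckets: int,
                    overhead: float = 0.0) -> List[Bucket]:
    """Exhaustively enumerate contiguous groupings of an operator sequence.

    Assumed cost model: a bucket's per-message time is the sum of its operator
    costs plus a fixed queue/thread overhead; in steady state, pipeline
    throughput is bounded by the slowest bucket. Ties prefer fewer buckets.
    """
    m = len(costs)
    best: List[Bucket] = []
    best_key: Optional[Tuple[float, int]] = None
    for k in range(1, min(max_buckets, m) + 1):
        # choose k-1 cut points among the m-1 gaps between operators
        for cuts in combinations(range(1, m), k - 1):
            bounds = (0, *cuts, m)
            buckets = [(bounds[i], bounds[i + 1]) for i in range(k)]
            bottleneck = max(sum(costs[s:e]) + overhead for s, e in buckets)
            key = (bottleneck, k)
            if best_key is None or key < best_key:
                best, best_key = buckets, key
    return best


def run_vectorized(ops: Sequence[Operator], buckets: Sequence[Bucket],
                   messages: Sequence[Any]) -> List[Any]:
    """Execute each bucket in its own thread, connected by FIFO queues.

    Every bucket is single-threaded and queues preserve order, so messages
    leave the pipeline in arrival order (serialized external behavior), while
    different buckets work on different messages concurrently.
    """
    queues: List[Queue] = [Queue() for _ in range(len(buckets) + 1)]

    def worker(i: int, start: int, end: int) -> None:
        inbox, outbox = queues[i], queues[i + 1]
        while True:
            msg = inbox.get()
            if msg is _END:
                outbox.put(_END)  # propagate end-of-stream downstream
                return
            for op in ops[start:end]:
                msg = op(msg)
            outbox.put(msg)

    threads = [Thread(target=worker, args=(i, s, e))
               for i, (s, e) in enumerate(buckets)]
    for t in threads:
        t.start()
    for msg in messages:
        queues[0].put(msg)
    queues[0].put(_END)
    results = []
    while (msg := queues[-1].get()) is not _END:
        results.append(msg)
    for t in threads:
        t.join()
    return results
```

For example, with operator costs `[5, 1, 1, 5]`, an overhead of 1, and up to four buckets, the enumeration selects the three buckets `[(0, 1), (1, 3), (3, 4)]`, because a fourth bucket would not lower the bottleneck of 6. In CPython, such threads pay off mainly when operators wait on external systems, since the global interpreter lock limits CPU-bound parallelism.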

4.1 Motivation and Problem Description

In scenarios with a high load of plan instances, the major optimization objective is often throughput maximization, while moderate latency is acceptable [UGA+09]. Unfortunately, despite the optimization techniques for parallelizing subflows, instance-based plans of integration flows typically do not achieve high CPU utilization.

Problem 4.1 (Low CPU Utilization). The low CPU utilization is mainly caused by (1) significant waiting times for external systems (for example, the plan instance is blocked while executing external queries), (2) the trend towards multi- and many-core architectures, which stands in contrast to the single-threaded execution of instance-based integration flows, and (3) the I/O bottleneck caused by the need for message persistence to enable recoverability of plan instances.

Problem 4.1, combined with the existence of many independent plan instances, opens up optimization opportunities with regard to message throughput, which we can exploit by increasing the degree of parallelism. Essentially, we could leverage four different types of parallelism to overcome this problem, where we additionally use the classification of [Gra90] into horizontal parallelism (parallel processing of data partitions) and vertical parallelism (pipelining):