Presburger Arithmetic and Its Use in Verification
3.2. SOME PITFALLS OF FUNCTIONAL PARALLELISM ON THE MULTICORE ARCHITECTURE
a GC thread for each core, so scalability of garbage collection is better [19]. Coming back to the two examples introduced in Chapter 2, one of the reasons for the linear speedup of the π calculation (see Table 2.1) is that there is no further significant memory allocation except the input array. By contrast, the sublinear speedup of the MergeSort algorithm (see Figure 2.3) can be explained by the many rounds of garbage collection that occur when intermediate arrays are discarded.
False Cache-line Sharing
When a CPU loads a memory location into cache, it also loads nearby memory locations into the same cache line, so that access to this memory cell and its neighbours is faster. In the context of multithreading, different threads writing to the same cache line may invalidate all CPUs' caches and significantly damage performance. In the functional-programming setting, false cache-line sharing is less critical because each value is often written only once, when it is initialized. However, the fact that consecutive memory allocations make independent data fall into the same cache line also causes problems. Some workarounds are padding data which are accessed concurrently, or allocating memory locally in threads.
We illustrate the problem by a small experiment as follows: an array has a size equal to the number of cores, and each array element is updated 10000000 times [25]. Because the array is small, all its elements tend to fall into the same cache line; the many concurrent updates therefore invalidate that cache line over and over and badly influence performance. The code fragment below shows concurrent updates on the same cache line:
let par1() =
    let cores = System.Environment.ProcessorCount
    let counts = Array.zeroCreate cores
    Parallel.For(0, cores, fun i ->
        for j = 1 to 10000000 do
            counts.[i] <- counts.[i] + 1) |> ignore
The measurement of sequential and parallel versions on the 8-core machine is shown as follows:

> Real: 00:00:00.647, CPU: 00:00:00.670, GC gen0: 0, gen1: 0, gen2: 0 // sequential
> Real: 00:00:00.769, CPU: 00:00:11.310, GC gen0: 0, gen1: 0, gen2: 0 // parallel
The parallel variant is even slower than the sequential one. Note the gap between CPU time and real time in the parallel run: the cores burn roughly 11 s of CPU time to deliver 0.77 s of wall-clock time, mostly invalidating each other's cache lines. We can fix the problem by padding the array with garbage data; this approach is 17× faster than the naive sequential one:
let par1Fix1() =
    let cores = System.Environment.ProcessorCount
    let padding = 128 / sizeof<int>
    let counts = Array.zeroCreate ((1 + cores) * padding)
    Parallel.For(0, cores, fun i ->
        let paddedI = (1 + i) * padding
        for j = 1 to 10000000 do
            counts.[paddedI] <- counts.[paddedI] + 1) |> ignore
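The second workaround mentioned above, allocating memory locally in threads, can be sketched as follows. Each thread accumulates its count in a local mutable variable, which lives in the thread's own stack frame (or a register), and writes to the shared array only once at the end, so no cache line is contended during the hot loop. The name par1Fix2 is a hypothetical label, not taken from the original text:

```fsharp
open System.Threading.Tasks

// Sketch of the thread-local-accumulation workaround (assumed name par1Fix2).
let par1Fix2() =
    let cores = System.Environment.ProcessorCount
    let counts = Array.zeroCreate cores
    Parallel.For(0, cores, fun i ->
        // The accumulator is private to this thread, so the shared
        // cache line holding 'counts' is written only once per thread.
        let mutable localCount = 0
        for j = 1 to 10000000 do
            localCount <- localCount + 1
        counts.[i] <- localCount) |> ignore
```

Compared with padding, this version wastes no memory and keeps the array layout unchanged, at the cost of restructuring the loop body; it is applicable whenever the per-element updates can be accumulated privately before a single final write.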