Algorithms and Data Structures for External Memory
However, not all memory references are created equal. Large address spaces span multiple levels of the memory hierarchy, and accessing the data in the lowest levels of memory is orders of magnitude faster than accessing the data at the higher levels. For example, loading a register can take a fraction of a nanosecond (10⁻⁹ seconds), and accessing internal memory takes several nanoseconds, but the latency of accessing data on a disk is multiple milliseconds (10⁻³ seconds), which is about one million times slower! In applications that process massive amounts of data, the Input/Output communication (or simply I/O) between levels of memory is often the bottleneck.
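The million-fold gap follows directly from the quoted latencies; a quick back-of-the-envelope check, using illustrative numbers consistent with the text (real hardware varies):

```python
# Latencies in seconds; illustrative values matching the orders of
# magnitude quoted above, not measurements of any particular machine.
register_access = 1e-10   # a fraction of a nanosecond
memory_access = 5e-9      # several nanoseconds
disk_access = 5e-3        # multiple milliseconds

# Disk latency vs. internal memory: roughly a factor of one million.
ratio = disk_access / memory_access
print(f"disk is about {ratio:,.0f}x slower than internal memory")
```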
Many computer programs exhibit some degree of locality in their pattern of memory references: certain data are referenced repeatedly for a while, and then the program shifts attention to other sets of data. Modern operating systems take advantage of such access patterns by tracking the program's so-called "working set", a vague notion that roughly corresponds to the recently referenced data items [139]. If the working set is small, it can be cached in high-speed memory so that access to it is fast. Caching and prefetching heuristics have been developed to reduce the number of occurrences of a "fault," in which the referenced data item is not in the cache and must be retrieved by an I/O from a higher level of memory. For example, in a page fault, an I/O is needed to retrieve a disk page from disk and bring it into internal memory.
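The fault-reducing effect of caching a small working set can be sketched with a minimal LRU cache (a common eviction heuristic; the page-reference traces here are made up for illustration, not taken from the text):

```python
from collections import OrderedDict

def count_faults(trace, capacity):
    """Count faults for a page-reference trace under LRU eviction."""
    cache = OrderedDict()  # keys ordered from least- to most-recently used
    faults = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)        # hit: mark as most recently used
        else:
            faults += 1                    # fault: an I/O must fetch the page
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return faults

# A trace whose working set {0, 1, 2} fits in the cache faults only
# on the three cold misses; shrink the cache by one page and every
# single reference faults.
local_trace = [0, 1, 2] * 100
print(count_faults(local_trace, capacity=3))  # 3
print(count_faults(local_trace, capacity=2))  # 300
```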
Caching and prefetching methods are typically designed to be general-purpose, and thus they cannot be expected to take full advantage of the locality present in every computation. Some computations themselves are inherently nonlocal, and even with omniscient cache management decisions they are doomed to perform large amounts of I/O and suffer poor performance. Substantial gains in performance may be possible by incorporating locality directly into the algorithm design and by explicit management of the contents of each level of the memory hierarchy, thereby bypassing the virtual memory system.
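In its simplest form, explicit management means transferring data in full blocks under the algorithm's own control rather than leaving placement to the paging system. A minimal sketch of a block-wise file scan, where the 4 KiB block size is an illustrative assumption (EM analysis charges one unit of cost per block transfer):

```python
import os
import tempfile

BLOCK_SIZE = 4096  # bytes per block transfer; illustrative, not prescriptive

def scan_in_blocks(path, block_size=BLOCK_SIZE):
    """Scan a file block by block, counting block I/Os performed.

    The total cost is ceil(file_size / block_size) block transfers,
    which is how an EM algorithm would account for a sequential scan.
    """
    ios = 0
    with open(path, "rb") as f:
        while f.read(block_size):
            ios += 1
    return ios

# Usage: a 10 KiB file costs ceil(10240 / 4096) = 3 block I/Os.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10240)
print(scan_in_blocks(tmp.name))  # 3
os.unlink(tmp.name)
```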
We refer to algorithms and data structures that explicitly manage data placement and movement as external memory (or EM) algorithms and data structures. Some authors use the terms I/O algorithms or out-of-core algorithms. We concentrate in this manuscript on the I/O