Algorithms and Data Structures for External Memory

More documents

Recommendations

Info

12.3 Bootstrapping for 2-D Diagonal Cornerand Stabbing Queries 107 measured. Query performance was then tested as before. The results in Table 12.1 indicate that the buffer R*-tree and the Hilbert R-tree achieve a similar degree of packing, but the buffer R*-tree provides better update and query performance. 12.3 Bootstrapping for 2-D Diagonal Corner and Stabbing Queries An obvious paradigm for developing an efficient dynamic EM data structure, given an existing data structure that works well when the problem fits into internal memory, is to “externalize” the internal memory data structure. If the internal memory data structure uses a binary tree, then a multiway tree such as a B-tree must be used instead. However, when searching a B-tree, it can be difficult to report all the items in the answer to the query in an output-sensitive manner. For example, in certain searching applications, each of the Θ(B) subtrees of a given node in a B-tree may contribute one item to the query answer, and as a result each subtree may need to be explored (costing several I/Os) just to report a single item of the answer. Fortunately, we can sometimes achieve output-sensitive reporting by augmenting the data structure with a set of filtering substructures, each of which is a data structure for a smaller version of the same problem. We refer to this approach, which we explain shortly in more detail, as the bootstrapping paradigm. Each substructure typically needs to store only O(B 2 ) items and to answer queries in O(log B B 2 + Z ′ /B) = O � ⌈Z ′ /B⌉ � I/Os, where Z ′ is the number of items reported. A substructure can even be static if it can be constructed in O(B) I/Os, since we can keep updates in a separate buffer and do a global rebuilding in O(B) I/Os whenever there are Θ(B) updates. Such a rebuilding costs O(1) I/Os (amortized) per update. We can often remove the amortization and make it worst-case using the weight-balanced B-trees of Section 11.2 as the underlying B-tree structure. Arge and Vitter [58] first uncovered the bootstrapping paradigm while designing an optimal dynamic EM data structure for diagonal corner two-sided 2-D queries (see Figure 12.1(a)) that meets all three design criteria listed in Chapter 12. Diagonal corner two-sided
108 Spatial Data Structures and Range Search queries are equivalent to stabbing queries, which have the following form: “Given a set of one-dimensional intervals, report all the intervals ‘stabbed’ by the query value x.” (That is, report all intervals that contain x.) A diagonal corner query x on a set of 2-D points � (a1,b2), (a2,b2), ... � is equivalent to a stabbing query x on the set of closed intervals � [a1,b2], [a2,b2], ... � . The EM data structure for stabbing queries is a multiway version of the well-known interval tree data structure [144, 145] for internal memory, which supports stabbing queries in O(logN + Z) CPU time and updates in O(logN) CPU time and uses O(N) space. We can externalize it by using a weight-balanced B-tree as the underlying base tree, where the nodes have degree Θ( √ B ). Each node in the base tree corresponds in a natural way to a one-dimensional range of xvalues; its Θ( √ B ) children correspond to subranges called slabs, and the Θ( √ B 2 )=Θ(B) contiguous sets of slabs are called multislabs, asin Section 8.1 for a similar batched problem. Each interval in the problem instance is stored in the lowest node v in the base tree whose range completely contains the interval. The interval is decomposed by v’s Θ( √ B ) slabs into at most three pieces: the middle piece that completely spans one or more slabs of v, the left end piece that partially protrudes into a slab of v, and the right end piece that partially protrudes into another slab of v, as shown in Figure 12.3. The three pieces slab 1 slab 2 slab 3 slab 4 slab 5 slab 6 slab 7 one-dimensional list of one-dimensional list of left end pieces ending in slab 2 right end pieces ending in slab 6 slab 8 Fig. 12.3 Internal node v of the EM priority search tree, for B = 64 with √ B = 8 slabs. Node v is the lowest node in the tree completely containing the indicated interval. The middle piece of the interval is stored in the multislab list corresponding to slabs 3–5. (The multislab lists are not pictured.) The left and right end pieces of the interval are stored in the left-ordered list of slab 2 and the right-ordered list of slab 6, respectively.
Page 1 and 2:
TCSv2n4.qxd 4/24/2008 11:56 AM Page
Page 4 and 5:
Algorithms and Data Structures for
Page 6 and 7:
Foundations and Trends R○ in Theo
Page 8 and 9:
Foundations and Trends R○ in Theo
Page 10 and 11:
Preface I first became fascinated a
Page 12:
Preface xi I would like to thank my
Page 15 and 16:
xiv Contents 6 Lower Bounds on I/O
Page 18 and 19:
1 Introduction The world is drownin
Page 20 and 21:
However, not all memory references
Page 22 and 23:
1.1 Overview 5 Our general goal is
Page 24:
Table 1.1 Paradigms for I/O efficie
Page 27 and 28:
10 Parallel Disk Model (PDM) spindl
Page 29 and 30:
12 Parallel Disk Model (PDM) D = nu
Page 31 and 32:
14 Parallel Disk Model (PDM) The pr
Page 33 and 34:
16 Parallel Disk Model (PDM) 2.3 Re
Page 35 and 36:
18 Parallel Disk Model (PDM) archit
Page 38 and 39:
3 Fundamental I/O Operations and Bo
Page 40:
As Table 3.1 indicates, the multipl
Page 43 and 44:
26 Exploiting Locality and Load Bal
Page 45 and 46:
28 Exploiting Locality and Load Bal
Page 47 and 48:
30 External Sorting and Related Pro
Page 49 and 50:
Page 51 and 52:
Page 53 and 54:
Page 55 and 56:
Page 57 and 58:
Page 59 and 60:
Page 61 and 62:
Page 63 and 64:
Page 65 and 66:
Page 67 and 68:
Page 69 and 70:
Page 71 and 72:
Page 74 and 75: 6 Lower Bounds on I/O In this chapt
Page 76 and 77: 6.1 Permuting 59 in backwards order
Page 78 and 79: 6.2 Lower Bounds for Sorting and Ot
Page 80: 6.2 Lower Bounds for Sorting and Ot
Page 83 and 84: 66 Matrix and Grid Computations the
Page 86 and 87: 8 Batched Problems in Computational
Page 88 and 89: 8.1 Distribution Sweep 71 Goodrich
Page 90 and 91: 8.1 Distribution Sweep 73 the curre
Page 92 and 93: Time (seconds) Time (seconds) 7000
Page 94 and 95: 9 Batched Problems on Graphs The pr
Page 96 and 97: Table 9.2 Best known I/O bounds for
Page 98 and 99: 9.2 Special Cases 81 that contains
Page 100 and 101: 10 External Hashing for Online Dict
Page 102 and 103: 2 000 000 001 010 011 100 101 110 1
Page 104 and 105: 10.2 Directoryless Methods 87 a tot
Page 106 and 107: 11 Multiway Tree Data Structures In
Page 108 and 109: 11.1 B-trees and Variants 91 “sha
Page 110 and 111: 11.3 Parent Pointers and Level-Bala
Page 112 and 113: 11.4 Buffer Trees 95 I/Os by follow
Page 114: 11.4 Buffer Trees 97 between update
Page 117 and 118: 100 Spatial Data Structures and Ran
Page 123: 106 Spatial Data Structures and Ran
Page 136 and 137: 13 Dynamic and Kinetic Data Structu
Page 138 and 139: 13.2 Continuously Moving Items 121
Page 140 and 141: 14 String Processing In this chapte
Page 142 and 143: 5 a b a 3 c a 0 b a 4 b 6 6 a b a b
Page 144 and 145: 14.3 Suffix Trees and Suffix Arrays
Page 146 and 147: 15 Compressed Data Structures The f
Page 148 and 149: 15.1 Data Representations and Compr
Page 150 and 151: 15.2 External Memory Compressed Dat
Page 152 and 153: 15.2 External Memory Compressed Dat
Page 154: 15.2 External Memory Compressed Dat
Page 157 and 158: 140 Dynamic Memory Allocation For s
Page 159 and 160: 142 External Memory Programming Env
Page 161 and 162: 144 External Memory Programming Env
Page 163 and 164: 146 Conclusions intriguing connecti
Page 165 and 166: 148 Notations and Acronyms lowercas
Page 167 and 168: 150 Notations and Acronyms � �
Page 169 and 170: 152 References Proceedings of the A
Page 171 and 172: 154 References [41] L. Arge, K. H.
Page 173 and 174: 156 References Computer Science, pp
Page 175 and 176:
158 References ACM Conference on In
Page 177 and 178:
160 References [132] F. K. H. A. De
Page 179 and 180:
162 References [165] R. W. Floyd,
Page 181 and 182:
164 References [195] M. R. Henzinge
Page 183 and 184:
166 References [228] K. Küspert,
Page 185 and 186:
168 References [261] D. R. Morrison
Page 187 and 188:
170 References [293] J. T. Robinson
Page 189 and 190:
172 References [325] “TerraServer
Page 191:
174 References [356] O. Wolfson, P.
show all

Algorithms and Data Structures for External Memory

Create successful ePaper yourself

Delete template?

Save as template?