Algorithms and Data Structures for External Memory
8.1 Distribution Sweep<br />
the current sweep line; vertical segments that are found to be no longer<br />
active are deleted from the slabs.) The remaining two end portions of h<br />
(which “stick out” past a slab boundary) are passed recursively to the<br />
next level of recursion, along with the vertical segments. The downward<br />
sweep then continues. After an initial one-time sorting (to order<br />
the segments with respect to the y-dimension), the sweep at each of the<br />
$O(\log_m n)$ levels of recursion requires $O(n)$ I/Os, yielding the desired<br />
bound (8.1). Some timing experiments on distribution sweeping appear<br />
in [104]. Arge et al. [48] develop a unified approach to distribution sweep<br />
in higher dimensions.<br />
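The cost accounting in the paragraph above can be written out as a short sum. This is only a sketch: it assumes the usual external-memory notation $n = N/B$ and $m = M/B$, and it assumes that the initial sort costs $O(n \log_m n)$ I/Os; neither is restated in this excerpt. I/Os spent writing out the reported intersections are not counted here.

```latex
\underbrace{O(n \log_m n)}_{\text{initial sort by } y}
\;+\;
\underbrace{O(\log_m n)}_{\text{levels of recursion}}
\cdot
\underbrace{O(n)}_{\text{I/Os per level}}
\;=\; O(n \log_m n)
```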
A central operation in spatial databases is spatial join. A common<br />
preprocessing step is to find the pairwise intersections of the bounding<br />
boxes of the objects involved in the spatial join. The problem of<br />
intersecting orthogonal rectangles can be solved by combining the previous<br />
sweep line algorithm for orthogonal segments with one for range<br />
searching. Arge et al. [48] take a more unified approach using distribution<br />
sweep, which is extendible to higher dimensions: The active<br />
objects that are stored in the data structure in this case are rectangles,<br />
not vertical segments. The authors choose the branching factor to be<br />
$\Theta(\sqrt{m})$. Each rectangle is associated with the largest contiguous range<br />
of vertical slabs that it spans. Each of the possible $\Theta\bigl(\binom{\sqrt{m}}{2}\bigr) = \Theta(m)$<br />
contiguous ranges of slabs is called a multislab. The reason why the<br />
authors choose the branching factor to be $\Theta(\sqrt{m})$ rather than $\Theta(m)$<br />
is so that the number of multislabs is $\Theta(m)$, and thus there is room in<br />
internal memory for a buffer for each multislab. The height of the tree<br />
remains $O(\log_m n)$, since $\log_{\sqrt{m}} n = 2 \log_m n$.<br />
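The counting argument for multislabs can be checked with a small sketch. The function names and the concrete value of `m` below are hypothetical, chosen only for illustration. The sketch counts every contiguous range of slabs, including ranges of a single slab, so with k slabs it gets k(k+1)/2 ranges; that differs from the binomial in the text only by lower-order terms and is still Θ(m) when k is about √m.

```python
import math

def multislabs(k):
    """All contiguous ranges (i, j), i <= j, over slabs numbered 0..k-1."""
    return [(i, j) for i in range(k) for j in range(i, k)]

def sqrt_branching(m):
    # Hypothetical concrete choice; the text only requires Theta(sqrt(m)).
    return max(2, math.isqrt(m))

m = 1024                        # illustrative number of blocks that fit in memory
k = sqrt_branching(m)           # 32 slabs per node
print(k, len(multislabs(k)))    # 32 528: fewer multislabs than m, so one buffer each fits
print(m * (m + 1) // 2)         # 524800: number of ranges if the branching factor were m
```

With a branching factor of m, the number of contiguous ranges grows quadratically in m, so there would be no room for one buffer block per multislab.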
The algorithm proceeds by sweeping a horizontal line from top to<br />
bottom to process the N rectangles. When the sweep line first encounters<br />
a rectangle R, we consider the multislab lists for all the multislabs<br />
that R intersects. We report all the active rectangles in those multislab<br />
lists, since they are guaranteed to intersect R. (Rectangles no longer<br />
active are discarded from the lists.) We then extract the left and right<br />
end portions of R that partially “stick out” past slab boundaries, and<br />
we pass them down to process in the next lower level of recursion. We<br />
insert the remaining portion of R, which spans complete slabs, into the<br />
list for the appropriate multislab. The downward sweep then continues.
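
The per-rectangle step described above can be sketched in memory as follows. This is a simplified illustration, not the authors' implementation: the names `split_rectangle` and `process_rectangle`, the tuple layout `(id, x1, x2, y_lo, y_hi)`, and the use of plain Python lists in place of blocked multislab buffers are all assumptions. The sketch covers a single level of recursion and leaves out the recursive processing of the end portions and all I/O concerns.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def split_rectangle(x1, x2, boundaries):
    """Split the x-interval [x1, x2] against sorted slab boundaries.

    Slab s is [boundaries[s], boundaries[s + 1]]. Returns (multislab, ends):
    multislab is the range (i, j) of completely spanned slabs or None, and
    ends lists the (slab, lo, hi) pieces that stick out past a boundary.
    Assumes boundaries[0] <= x1 <= x2 <= boundaries[-1].
    """
    lo = bisect_left(boundaries, x1)        # first boundary at or right of x1
    hi = bisect_right(boundaries, x2) - 1   # last boundary at or left of x2
    if hi < lo:                             # lies strictly inside one slab
        return None, [(hi, x1, x2)]
    multislab = (lo, hi - 1) if hi > lo else None
    ends = []
    if x1 < boundaries[lo]:
        ends.append((lo - 1, x1, boundaries[lo]))
    if x2 > boundaries[hi]:
        ends.append((hi, boundaries[hi], x2))
    return multislab, ends

def process_rectangle(rect, boundaries, lists):
    """Handle one rectangle when the downward sweep reaches its top edge."""
    rid, x1, x2, y_lo, y_hi = rect
    multislab, ends = split_rectangle(x1, x2, boundaries)
    touched = [s for s, _, _ in ends]
    if multislab is not None:
        touched += range(multislab[0], multislab[1] + 1)
    s_lo, s_hi = min(touched), max(touched)
    reported = []
    for (i, j), active in lists.items():
        if i <= s_hi and j >= s_lo:         # multislab shares a slab with R
            # Discard rectangles whose bottom edge is above the sweep line.
            active[:] = [r for r in active if r[1] <= y_hi]
            reported += [(other, rid) for other, _ in active]
    if multislab is not None:
        lists[multislab].append((rid, y_lo))
    return reported, ends                   # ends go to the next level down

boundaries = [0, 10, 20, 30, 40]
lists = defaultdict(list)
rects = [("A", 10, 30, 5, 9), ("B", 5, 15, 2, 7), ("C", 22, 28, 0, 3)]
for rect in sorted(rects, key=lambda r: -r[4]):   # sweep from top to bottom
    print(rect[0], process_rectangle(rect, boundaries, lists))
```

In this small run, A spans slabs 1 and 2 and is stored in the list for multislab (1, 2). B overlaps slab 1 while A is still active, so the pair (A, B) is reported. By the time the sweep reaches C, A has ended and is discarded from its list.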