17.01.2013

Algorithms and Data Structures for External Memory


8.1 Distribution Sweep

the current sweep line; vertical segments that are found to be no longer active are deleted from the slabs.) The remaining two end portions of h (which “stick out” past a slab boundary) are passed recursively to the next level of recursion, along with the vertical segments. The downward sweep then continues. After an initial one-time sorting (to order the segments with respect to the y-dimension), the sweep at each of the $O(\log_m n)$ levels of recursion requires $O(n)$ I/Os, yielding the desired bound (8.1). Some timing experiments on distribution sweeping appear in [104]. Arge et al. [48] develop a unified approach to distribution sweep in higher dimensions.
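
The level-by-level cost argument above can be made concrete with a little arithmetic. The sketch below assumes that n and m denote the problem size and the internal memory size measured in blocks; the function names are hypothetical, constant factors are ignored, and the initial sort and the output-reporting cost are left out.

```python
def recursion_levels(n_blocks, m_blocks):
    """Count how often n_blocks must be divided by m_blocks to reach one block.

    This is an integer stand-in for ceil(log_m n); it avoids floating-point logs.
    """
    if m_blocks < 2:
        raise ValueError("m_blocks must be at least 2")
    levels = 0
    size = n_blocks
    while size > 1:
        size = -(-size // m_blocks)  # ceiling division
        levels += 1
    return levels


def sweep_io_estimate(n_blocks, m_blocks):
    """Rough I/O count: one O(n) pass per level of recursion (constants dropped)."""
    return n_blocks * recursion_levels(n_blocks, m_blocks)


# Illustrative numbers only: a million blocks of data, a thousand blocks of memory.
levels = recursion_levels(1_000_000, 1_000)
total = sweep_io_estimate(1_000_000, 1_000)
```

With these illustrative numbers the recursion has two levels, so the estimate is about two linear passes over the data.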

A central operation in spatial databases is spatial join. A common preprocessing step is to find the pairwise intersections of the bounding boxes of the objects involved in the spatial join. The problem of intersecting orthogonal rectangles can be solved by combining the previous sweep line algorithm for orthogonal segments with one for range searching. Arge et al. [48] take a more unified approach using distribution sweep, which is extendible to higher dimensions: The active objects that are stored in the data structure in this case are rectangles, not vertical segments. The authors choose the branching factor to be $\Theta(\sqrt{m})$. Each rectangle is associated with the largest contiguous range of vertical slabs that it spans. Each of the possible $\Theta\bigl(\binom{\sqrt{m}}{2}\bigr) = \Theta(m)$ contiguous ranges of slabs is called a multislab. The reason why the authors choose the branching factor to be $\Theta(\sqrt{m})$ rather than $\Theta(m)$ is so that the number of multislabs is $\Theta(m)$, and thus there is room in internal memory for a buffer for each multislab. The height of the tree remains $O(\log_m n)$.
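
To see why the branching factor matters, it may help to count multislabs directly. The following sketch enumerates every contiguous range of slabs for a given fan-out and checks whether one buffer per multislab would fit in memory. The function names are hypothetical, each buffer is assumed to occupy exactly one block, and single-slab ranges are counted as multislabs here, which changes the count only by a lower-order term.

```python
from math import isqrt


def multislabs(num_slabs):
    """Return every contiguous range of slabs as an inclusive (first, last) pair."""
    return [(i, j) for i in range(num_slabs) for j in range(i, num_slabs)]


def multislab_count(num_slabs):
    """Closed form for len(multislabs(num_slabs)): k(k+1)/2 ranges for k slabs."""
    return num_slabs * (num_slabs + 1) // 2


def buffers_fit(m_blocks, fanout):
    """True if one single-block buffer per multislab fits in m_blocks of memory."""
    return multislab_count(fanout) <= m_blocks


m = 64                      # illustrative memory size in blocks
sqrt_fanout = isqrt(m)      # fan-out of about sqrt(m): 8 slabs
fits_with_sqrt = buffers_fit(m, sqrt_fanout)   # 36 multislabs against 64 blocks
fits_with_full = buffers_fit(m, m)             # 2080 multislabs against 64 blocks
```

With a fan-out of about $\sqrt{m}$ the buffers fit, whereas a fan-out of $m$ would need quadratically many buffers.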

The algorithm proceeds by sweeping a horizontal line from top to bottom to process the N rectangles. When the sweep line first encounters a rectangle R, we consider the multislab lists for all the multislabs that R intersects. We report all the active rectangles in those multislab lists, since they are guaranteed to intersect R. (Rectangles no longer active are discarded from the lists.) We then extract the left and right end portions of R that partially “stick out” past slab boundaries, and we pass them down to process in the next lower level of recursion. We insert the remaining portion of R, which spans complete slabs, into the list for the appropriate multislab. The downward sweep then continues.
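
The steps just described can be sketched for a single level of the recursion. This is an in-memory illustration under stated assumptions, not the external-memory implementation of [48]: the tuple layout `(id, x1, x2, y_top, y_bottom)`, the function name `sweep_one_level`, and the use of plain Python lists in place of blocked multislab buffers are all hypothetical. It reports only the pairs that this step detects at one level; the end pieces it returns would still have to be processed recursively.

```python
from bisect import bisect_left, bisect_right


def sweep_one_level(rects, boundaries):
    """Sweep top to bottom over rects = [(id, x1, x2, y_top, y_bottom), ...].

    boundaries is the sorted list of slab edges; slab s covers
    [boundaries[s], boundaries[s + 1]]. Rectangles are assumed to lie
    within [boundaries[0], boundaries[-1]].
    """
    num_slabs = len(boundaries) - 1
    multislab_lists = {}                              # (first, last) -> rectangles
    reported = []                                     # (earlier_id, new_id) pairs
    passed_down = {s: [] for s in range(num_slabs)}   # slab -> end pieces

    # Process rectangles in the order the sweep line meets their top edges.
    for rid, x1, x2, y_top, y_bottom in sorted(rects, key=lambda r: -r[3]):
        # Report active rectangles in every multislab list that R overlaps.
        for (a, b), lst in multislab_lists.items():
            if x1 <= boundaries[b + 1] and x2 >= boundaries[a]:
                # Discard rectangles whose bottom edge lies above the sweep line.
                lst[:] = [r for r in lst if r[4] <= y_top]
                reported.extend((r[0], rid) for r in lst)

        # Largest contiguous range of slabs that R spans completely.
        first = bisect_left(boundaries, x1)       # first slab edge at or right of x1
        last = bisect_right(boundaries, x2) - 2   # last slab whose right edge <= x2
        spans = first <= last
        if spans:
            middle = (rid, boundaries[first], boundaries[last + 1], y_top, y_bottom)
            multislab_lists.setdefault((first, last), []).append(middle)

        # Pieces in slabs that R does not span completely go to the next level.
        for s in range(num_slabs):
            if spans and first <= s <= last:
                continue
            lo, hi = max(x1, boundaries[s]), min(x2, boundaries[s + 1])
            if lo < hi:
                passed_down[s].append((rid, lo, hi, y_top, y_bottom))

    return reported, passed_down, multislab_lists


# Illustrative data: four slabs and four rectangles.
boundaries = [0, 10, 20, 30, 40]
rects = [
    ("A", 5, 35, 100, 50),
    ("B", 12, 18, 80, 60),
    ("C", 0, 40, 40, 10),
    ("D", 32, 38, 30, 20),
]
reported, passed_down, lists = sweep_one_level(rects, boundaries)
# reported == [("A", "B"), ("C", "D")]
```

In this example A spans slabs 1 and 2, so its middle portion goes into the list for multislab (1, 2) and its two end pieces are passed down to slabs 0 and 3. B then overlaps that multislab while A is still active, so the pair is reported. A is discarded once the sweep line has passed its bottom edge.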
