15.01.2013 Views

U. Glaeser

R-trees

The R-tree of Guttman [104] and its many variants are a practical multidimensional generalization of
the B-tree for storing a variety of geometric objects, such as points, segments, polygons, and polyhedra,
using linear disk space. Internal nodes have degree Θ(B) (except possibly the root), and leaves store Θ(B)
items. Each node in the tree has associated with it a bounding box (or bounding polygon) of all the items
in its subtree. A big difference between R-trees and B-trees is that in R-trees the bounding boxes of sibling
nodes are allowed to overlap. If an R-tree is being used for point location, for example, a point may lie
within the bounding boxes of several children of the current node in the search. In that case, the search
must proceed to all such children.
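The effect of overlapping bounding boxes on a search can be illustrated with a minimal in-memory sketch. The `Node` class, the `(xmin, ymin, xmax, ymax)` box layout, and the function names are assumptions made for illustration only; a real R-tree would store Θ(B) entries per disk block rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box layout: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


@dataclass
class Node:
    box: Box                                                    # bounding box of the whole subtree
    children: List["Node"] = field(default_factory=list)        # empty for leaves
    items: List[Tuple[str, Box]] = field(default_factory=list)  # (name, box), leaves only


def contains(box: Box, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside the closed box."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def point_query(node: Node, x: float, y: float) -> List[str]:
    """Report the names of all stored boxes that contain (x, y)."""
    if not contains(node.box, x, y):
        return []                      # prune: nothing below can contain the point
    if not node.children:              # leaf: test the stored items directly
        return [name for name, box in node.items if contains(box, x, y)]
    hits: List[str] = []
    for child in node.children:        # sibling boxes may overlap, so visit
        hits.extend(point_query(child, x, y))  # every child, not just one
    return hits


# Two leaves whose bounding boxes overlap in the square [4, 6] x [4, 6].
leaf_a = Node(box=(0, 0, 6, 6), items=[("r1", (0, 0, 3, 3)), ("r2", (2, 2, 6, 6))])
leaf_b = Node(box=(4, 4, 10, 10), items=[("r3", (4, 4, 7, 7)), ("r4", (8, 8, 10, 10))])
root = Node(box=(0, 0, 10, 10), children=[leaf_a, leaf_b])

result = point_query(root, 5, 5)       # the point lies in both leaf boxes
```

The query point (5, 5) falls inside the bounding boxes of both leaves, so both subtrees are searched; in a B-tree, by contrast, a search follows a single root-to-leaf path.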

In the dynamic setting, several popular heuristics are used to determine how to insert new items into an
R-tree and how to rebalance it; see [10,91,99] for a survey. The R∗-tree variant of Beckmann et al. [42]
seems to give the best overall query performance. New R-tree partitioning methods by de Berg et al. [68]
and Agarwal et al. [9] provide some provable bounds on overlap and query performance.

In the static setting, in which there are no updates, constructing the R∗-tree by repeated insertions, one
by one, is extremely slow. A faster alternative to the dynamic R-tree construction algorithms mentioned
above is to bulk-load the R-tree in a bottom-up fashion [1,116,159]. The quality of the bottom-up R-tree
in terms of query performance is generally not as good as that of an R∗-tree, especially for
higher-dimensional data [45,118].
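One way to picture bottom-up bulk loading is the following in-memory sketch, which sorts points along a Hilbert curve and packs consecutive runs into nodes, level by level. It is only an illustration under stated assumptions: the choice of Hilbert ordering, the dictionary node layout, and the names `hilbert_index`, `bulk_load`, `fanout`, and `order` are all hypothetical, and a real external-memory version would use an external sort and write one node per disk block.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def bounding_box(boxes: List[Box]) -> Box:
    """Smallest box enclosing all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def bulk_load(points: List[Point], fanout: int, order: int) -> Dict:
    """Build the tree bottom-up: sort once, then pack `fanout` entries per node."""
    if not points or fanout < 2:
        raise ValueError("need at least one point and fanout >= 2")
    ordered = sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))
    # Leaf level: consecutive runs of the sorted points.
    level = []
    for i in range(0, len(ordered), fanout):
        chunk = ordered[i:i + fanout]
        level.append({"box": bounding_box([(x, y, x, y) for x, y in chunk]),
                      "points": chunk})
    # Upper levels: pack consecutive nodes until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append({"box": bounding_box([g["box"] for g in group]),
                            "children": group})
        level = parents
    return level[0]


grid = [(x, y) for x in range(4) for y in range(4)]
root = bulk_load(grid, fanout=4, order=2)
```

Because every item is placed exactly once after a single sort, no node is ever split or reinserted, which is where the speed advantage over one-by-one insertion comes from.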

In order to get the best of both worlds (the query performance of R∗-trees and the bulk construction
efficiency of Hilbert R-trees), Arge et al. [22] and van den Bercken et al. [193] independently devised
fast bulk loading methods based upon buffer trees that do top-down construction in $O(n \log_m n)$ I/Os,
which matches the performance of the bottom-up methods within a constant factor. The former method
is especially efficient and supports dynamic batched updates and queries.

Specialized Structures for 2-D Orthogonal Range Search<br />

Diagonal corner 2-sided queries (see Figure 32.4(a)) are equivalent to stabbing queries, which have the
following form: “Given a set of 1-D intervals, report all the intervals ‘stabbed’ by the query value x.”
(That is, report all intervals that contain x.) A diagonal corner query x on a set of 2-D points
$\{(a_1, b_1), (a_2, b_2), \ldots\}$ is equivalent to a stabbing query x on the set of closed intervals
$\{[a_1, b_1], [a_2, b_2], \ldots\}$. Arge and Vitter [28,199] introduced a new paradigm we call
bootstrapping to support such queries in optimal I/O bounds and space: The data structure uses $O(n)$
disk blocks, queries use $O(\log_B N + z)$ I/Os, and updates take $O(\log_B N)$ I/Os. In another example
of bootstrapping, Arge et al. [25] achieve the same bounds for 3-sided orthogonal 2-D range searching
(see Figure 32.4(c)).
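The equivalence between the two query types can be checked with a brute-force sketch. The function names are hypothetical, and the linear scans below only demonstrate that both formulations report the same answers; they say nothing about how the I/O-optimal structure of [28,199] works.

```python
from typing import List, Tuple

Pair = Tuple[float, float]


def stabbing_query(intervals: List[Pair], x: float) -> List[Pair]:
    """Report every closed interval [a, b] that contains x."""
    return [(a, b) for a, b in intervals if a <= x <= b]


def diagonal_corner_query(points: List[Pair], x: float) -> List[Pair]:
    """2-sided range query whose corner (x, x) lies on the diagonal:
    report every point (a, b) with a <= x and b >= x."""
    return [(a, b) for a, b in points if a <= x and b >= x]


# The same pairs read once as intervals [a, b] and once as points (a, b).
data = [(1, 4), (2, 3), (5, 8), (6, 6)]

stabbed = stabbing_query(data, 3)
cornered = diagonal_corner_query(data, 3)
```

Each interval [a, b] is mapped to the point (a, b), which always lies on or above the diagonal because a ≤ b; the two conditions a ≤ x and x ≤ b are then exactly the two sides of the range query.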

The dynamic data structure for 3-sided range searching can be generalized using the filtering technique
of Chazelle [51] to handle general 4-sided queries with optimal I/O query bound $O(\log_B N + z)$ and
optimal disk space usage $O(n (\log n) / \log(\log_B N + 1))$ [25]. The update bound becomes
$O((\log_B N)(\log n) / \log(\log_B N + 1))$, which may not be optimal.

Other Types of Range Search<br />

For other types of range searching, such as in higher dimensions and for nonorthogonal queries, different
filtering techniques are needed. So far, relatively little work has been done, and many open problems
remain.

Vengroff and Vitter [196] develop the first theoretically near-optimal EM data structure for static 3-D
orthogonal range searching. They create a hierarchical partitioning in which all the points that dominate
a query point are densely contained in a set of blocks. Compression techniques are needed to minimize disk
storage. With some recent modifications [204], $(3 + k)$-sided 3-D range queries, where k of the dimensions
($0 \le k \le 3$) have finite ranges, can be done in $O(\log_B N + z)$ I/Os, which is optimal, and the space
usage is $O(n (\log n)^{k+1} / (\log(\log_B N + 1))^{k})$. The result also provides optimal
$O(\log N + Z)$-time query performance for 3-sided 3-D queries in the (internal memory) RAM model, but
using $O(N \log N)$ space.
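To make the dependence on k concrete, the space bound above can be instantiated at its two extremes. The shorthand $S(k)$ is introduced here only for illustration and does not appear in the original text.

```latex
S(k) = O\!\left( \frac{n (\log n)^{k+1}}{\bigl(\log(\log_B N + 1)\bigr)^{k}} \right),
\qquad 0 \le k \le 3

% k = 0: 3-sided queries, no dimension has a finite range
S(0) = O(n \log n)

% k = 3: fully bounded 6-sided queries
S(3) = O\!\left( \frac{n (\log n)^{4}}{\bigl(\log(\log_B N + 1)\bigr)^{3}} \right)
```

Each additional finite range multiplies the space by a factor of $(\log n) / \log(\log_B N + 1)$, while the query bound stays at $O(\log_B N + z)$ I/Os.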

© 2002 by CRC Press LLC
