
Algorithms and Data Structures for External Memory


104 Spatial Data Structures and Range Search

The rebalancing heuristics perform well in many practical scenarios, especially in low dimensions, but they result in poor worst-case query bounds. An interesting open problem is whether nontrivial query bounds can be proven for the “typical-case” behavior of R-trees for problems such as range searching and point location. Similar questions apply to the methods discussed in Section 12.1. New R-tree partitioning methods by de Berg et al. [128], Agarwal et al. [17], and Arge et al. [38] provide some provable bounds on overlap and query performance.

In the static setting, in which there are no updates, constructing the R*-tree by repeated insertions, one by one, is extremely slow. A faster alternative to the dynamic R-tree construction algorithms mentioned above is to bulk-load the R-tree in a bottom-up fashion [1, 206, 276]. Such methods use some heuristic for grouping the items into leaf nodes of the R-tree, and then recursively build the nonleaf nodes from bottom to top. As an example, in the so-called Hilbert R-tree of Kamel and Faloutsos [206], each item is labeled with the position of its centroid on the Peano-Hilbert space-filling curve, and a B+-tree is built upon the totally ordered labels in a bottom-up manner. Bulk loading a Hilbert R-tree is therefore easy to do once the centroid points are presorted. These static construction algorithms are very different in spirit from the dynamic insertion methods: The dynamic methods explicitly try to reduce the coverage, overlap, or perimeter of the bounding boxes of the R-tree nodes, and as a result, they usually achieve good query performance. The static construction methods do not consider the bounding box information at all. Instead, the hope is that the improved storage utilization (up to 100%) of these packing methods compensates for a higher degree of node overlap. A dynamic insertion method related to [206] was presented in [207]. The quality of the Hilbert R-tree in terms of query performance is generally not as good as that of an R*-tree, especially for higher-dimensional data [84, 208].
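The bottom-up packing idea behind the Hilbert R-tree can be sketched in a few lines. This is a minimal in-memory Python illustration under stated assumptions, not Kamel and Faloutsos's implementation: the helper names `hilbert_index`, `hilbert_bulk_load`, and `range_query`, the restriction to 2-D axis-aligned rectangles, and the grid resolution parameter `order` are all hypothetical choices made for the example, and Python's `sorted` stands in for the external sort that would presort the centroid labels on disk. Every node except possibly the last one at each level is filled to exactly `fanout` entries, which is the near-100% storage utilization mentioned above; note that the packing never inspects bounding-box overlap.

```python
from dataclasses import dataclass

# Assumption: items are 2-D axis-aligned rectangles (xmin, ymin, xmax, ymax).
Rect = tuple[float, float, float, float]


def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance along the Hilbert curve of grid cell (x, y), 0 <= x, y < 2**order."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Skip the cells of the quadrants the curve visits before this one.
        d += s * s * ((3 * rx) ^ ry)
        # Reflect/rotate so the sub-square is traversed in standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def union(rects: list[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    children: list  # data rectangles if is_leaf, otherwise child Nodes
    is_leaf: bool


def hilbert_bulk_load(items: list[Rect], fanout: int, order: int = 16) -> Node:
    """Hypothetical helper: pack non-empty items bottom-up in Hilbert order of centroids."""
    box = union(items)
    side = (1 << order) - 1
    w = (box[2] - box[0]) or 1.0  # avoid dividing by zero for degenerate extents
    h = (box[3] - box[1]) or 1.0

    def label(r: Rect) -> int:
        # Snap the centroid onto a 2**order x 2**order grid spanning the data.
        cx = int(((r[0] + r[2]) / 2 - box[0]) / w * side)
        cy = int(((r[1] + r[3]) / 2 - box[1]) / h * side)
        return hilbert_index(order, cx, cy)

    # In external memory this presorting step would be an external merge sort.
    ordered = sorted(items, key=label)
    # Leaves: consecutive runs of `fanout` items; only the last may be underfull.
    level = [Node(union(ordered[i:i + fanout]), ordered[i:i + fanout], True)
             for i in range(0, len(ordered), fanout)]
    # Nonleaf levels: group consecutive nodes the same way until one root remains.
    while len(level) > 1:
        level = [Node(union([c.mbr for c in level[i:i + fanout]]),
                      level[i:i + fanout], False)
                 for i in range(0, len(level), fanout)]
    return level[0]


def range_query(node: Node, q: Rect) -> list[Rect]:
    """Report every stored rectangle that intersects the query rectangle q."""
    if not intersects(node.mbr, q):
        return []
    if node.is_leaf:
        return [r for r in node.children if intersects(r, q)]
    return [r for c in node.children for r in range_query(c, q)]
```

For instance, loading 500 rectangles with `fanout=8` produces 63 leaves (62 full, one holding the remaining 4), and `range_query` on the result returns the same rectangles as a brute-force scan. Because the grouping depends only on curve order, nothing in the packing limits how much sibling bounding boxes overlap, which is the trade-off against the R*-tree described above.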

In order to get the best of both worlds (the query performance of R*-trees and the bulk construction efficiency of Hilbert R-trees), Arge et al. [41] and van den Bercken et al. [333] independently devised fast bulk loading methods based upon buffer trees that do top-down construction in $O(n \log_m n)$ I/Os, which matches the performance of
