Algorithms and Data Structures for External Memory
Spatial Data Structures and Range Search
The rebalancing heuristics perform well in many practical scenarios,
especially in low dimensions, but they result in poor worst-case
query bounds. An interesting open problem is whether nontrivial query
bounds can be proven for the “typical-case” behavior of R-trees for
problems such as range searching and point location. Similar questions
apply to the methods discussed in Section 12.1. New R-tree partitioning
methods by de Berg et al. [128], Agarwal et al. [17], and Arge et al. [38]
provide some provable bounds on overlap and query performance.

In the static setting, in which there are no updates, constructing the
R*-tree by repeated insertions, one by one, is extremely slow. A faster
alternative to the dynamic R-tree construction algorithms mentioned
above is to bulk-load the R-tree in a bottom-up fashion [1, 206, 276].
Such methods use some heuristic for grouping the items into leaf nodes
of the R-tree, and then recursively build the nonleaf nodes from bottom
to top. As an example, in the so-called Hilbert R-tree of Kamel
and Faloutsos [206], each item is labeled with the position of its centroid
on the Peano-Hilbert space-filling curve, and a B+-tree is built
upon the totally ordered labels in a bottom-up manner. Bulk loading
a Hilbert R-tree is therefore easy to do once the centroid points
are presorted. These static construction methods are very
different in spirit from the dynamic insertion methods: The dynamic
methods explicitly try to reduce the coverage, overlap, or perimeter of
the bounding boxes of the R-tree nodes, and as a result, they usually
achieve good query performance. The static construction methods do
not consider the bounding box information at all. Instead, the hope
is that the improved storage utilization (up to 100%) of these packing
methods compensates for a higher degree of node overlap. A dynamic
insertion method related to [206] was presented in [207]. The quality
of the Hilbert R-tree in terms of query performance is generally
not as good as that of an R*-tree, especially for higher-dimensional
data [84, 208].
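
The bottom-up packing described above can be illustrated with a minimal in-memory sketch. It is not the construction of [206] itself: the names `hilbert_index`, `Node`, `mbr_union`, and `bulk_load` are hypothetical, the data are assumed to be two-dimensional rectangles with nonnegative integer coordinates below 2^order, and an ordinary in-memory sort stands in for the external sort that a disk-based implementation would use. The Hilbert position is computed with the standard bitwise rotate-and-flip conversion.

```python
from dataclasses import dataclass

def hilbert_index(order, x, y):
    """Position of grid cell (x, y) on a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d

@dataclass
class Node:
    mbr: tuple       # bounding box (xmin, ymin, xmax, ymax)
    children: list   # rectangles in a leaf, Node objects otherwise
    is_leaf: bool

def mbr_union(boxes):
    """Smallest axis-parallel box enclosing all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def bulk_load(rects, capacity, order=16):
    """Pack rectangles into a tree bottom-up, in Hilbert order of their centroids.

    Assumption: in-memory illustration only; every node except possibly the
    last one on each level is filled to capacity.
    """
    if not rects:
        raise ValueError("need at least one rectangle")
    if capacity < 2:
        raise ValueError("capacity must be at least 2")

    def label(r):
        # Centroid rounded down to the integer grid (an assumption of this sketch).
        return hilbert_index(order, (r[0] + r[2]) // 2, (r[1] + r[3]) // 2)

    items = sorted(rects, key=label)
    # Leaf level: consecutive runs of `capacity` items in Hilbert order.
    level = [Node(mbr_union(items[i:i + capacity]), items[i:i + capacity], True)
             for i in range(0, len(items), capacity)]
    # Build each nonleaf level from the level below until one root remains.
    while len(level) > 1:
        groups = [level[i:i + capacity] for i in range(0, len(level), capacity)]
        level = [Node(mbr_union([c.mbr for c in g]), g, False) for g in groups]
    return level[0]
```

For instance, calling `bulk_load` with `capacity=4` and `order=2` on the 16 degenerate rectangles `(x, y, x, y)` of a 4 x 4 grid packs each quadrant of the grid into its own full leaf, because the curve visits the cells of a quadrant consecutively. Note that the packing never inspects the bounding boxes when grouping; they are computed only after the groups are fixed, which is exactly the contrast with the dynamic insertion heuristics drawn above.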

To get the best of both worlds (the query performance
of R*-trees and the bulk construction efficiency of Hilbert R-trees),
Arge et al. [41] and van den Bercken et al. [333] independently devised
fast bulk loading methods based upon buffer trees that do top-down
construction in O(n log_m n) I/Os, which matches the performance of