

Temporal and Spatial Databases

Chapter 10: Spatial Indexing

J. Gamper

◮ Spatial indexes

◮ 1-D embedding of grid approximation

◮ Spatial index structures for points

◮ Spatial index structures for rectangles

◮ Spatial join

Literature

◮ R.H. Güting: An introduction to spatial database systems. VLDB Journal 3:357–399 (1994)

◮ R.H. Güting: Spatial database systems. Tutorial notes.

◮ Some slides are adapted from the slides by Jörg Sanders (Univ. of Alberta).

TSDB 2012/13 J. Gamper 1/27


Spatial Indexing/1

◮ Conventional index structures such as B-trees are not designed to support spatial queries

◮ They group objects along only one dimension

◮ They do not preserve spatial proximity

◮ Example: NN query. The nearest neighbor of Q is typically not the nearest neighbor in any single dimension.
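The nearest-neighbor remark can be checked on a tiny example. The following is a minimal sketch with made-up coordinates (the point names A, B, C and the query point q are assumptions for illustration only):

```python
import math

# Hypothetical query point and data points.
q = (0.0, 0.0)
points = {"A": (1.0, 9.0), "B": (9.0, 1.0), "C": (3.0, 3.0)}

# Nearest point when only the x coordinate is considered.
nearest_x = min(points, key=lambda n: abs(points[n][0] - q[0]))
# Nearest point when only the y coordinate is considered.
nearest_y = min(points, key=lambda n: abs(points[n][1] - q[1]))
# Nearest point under the 2-D Euclidean distance.
nearest_2d = min(points, key=lambda n: math.dist(points[n], q))

print(nearest_x, nearest_y, nearest_2d)  # A B C
```

An index sorted on x alone would lead to A, one sorted on y alone to B, while the true nearest neighbor is C.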



Spatial Indexing/2

◮ Spatial index structures try to preserve spatial proximity

◮ Group objects that are close to each other in space on the same data page

◮ Problem: the number of bytes needed to store extended spatial objects (lines, polygons) varies

◮ Solution:

◮ Store approximations of spatial objects in the index structure, typically axis-parallel minimum bounding rectangles (MBR)

◮ The exact object representation (ER) is stored separately; the index holds pointers to the ER



Spatial Indexing/3

◮ A fundamental idea of spatial indexing is the use of approximations

◮ Two types of approximations

◮ Continuous approximation, e.g., a bounding box

◮ Grid approximation

◮ The use of approximations leads to a filter and refine strategy for query processing.



Spatial Indexing/4

Filter and refine strategy

1. Filter step:

◮ Use the index to find all approximations that satisfy the query

◮ Some objects already satisfy the query based on the approximation, others have to be checked in the refine step

◮ Returns a set of candidate objects, which is a superset of the objects fulfilling the predicate

2. Refine step:

◮ Load the exact object representations for the candidates

◮ Test whether the candidates satisfy the query
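The two steps can be sketched for a point query. This is an illustration only: the `Circle` class stands in for an arbitrary exact object representation, and a linear scan over the MBRs stands in for the index lookup; both are assumptions, not part of the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Circle:
    """Hypothetical exact object representation (ER)."""
    cx: float
    cy: float
    r: float

    def mbr(self):
        # Axis-parallel minimum bounding rectangle (xmin, ymin, xmax, ymax).
        return (self.cx - self.r, self.cy - self.r,
                self.cx + self.r, self.cy + self.r)

    def contains(self, x, y):
        # Exact geometric test, used only in the refine step.
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.r ** 2

def in_rect(rect, x, y):
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax

def point_query(objects, x, y):
    # Filter step: test the approximations only (an index would do this).
    candidates = [o for o in objects if in_rect(o.mbr(), x, y)]
    # Refine step: test the exact representation of each candidate.
    answers = [o for o in candidates if o.contains(x, y)]
    return candidates, answers

objects = [Circle(0, 0, 1), Circle(5, 5, 1)]
candidates, answers = point_query(objects, 0.9, 0.9)
print(len(candidates), len(answers))  # 1 0
```

The point (0.9, 0.9) lies inside the MBR of the first circle but outside the circle itself, so it survives the filter step and is rejected in the refine step.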



Spatial Indexing/5

◮ Mainly used to support spatial selection

◮ but also supports other operations, e.g., spatial join or finding the closest object

◮ A spatial index organizes space and the objects in it in some way so that only parts of the space and a subset of the objects need to be considered to answer a query

◮ Two main approaches:

◮ Map spatial objects to a 1-D space and utilize standard indexing techniques, e.g., Z-order + B-tree

◮ Dedicated spatial index data structures

◮ Data organizing, e.g., R-tree

◮ Space organizing, e.g., Quad-tree



Spatial Indexing/6

◮ Most spatial data structures are designed to either store points (for point values) or rectangles (for line and region values)

◮ Operations on those structures: insert, delete, check membership

◮ Typical query types

◮ for points:

◮ Range query: all points within a query rectangle

◮ Nearest neighbor: point closest to a query point

◮ Distance scan: enumerate points in increasing distance from a query point

◮ for rectangles:

◮ Intersection query

◮ Containment query



1-D Embedding of Grid Approximation/1

◮ Basic idea of 1-D embedding of grid approximation

1. The data space is partitioned into rectangular cells (a grid)

2. Find a linear order for the cells of the grid such that cells close together in space are also close to each other in the linear order; assign a number to each cell

◮ The order should maintain locality/proximity

◮ The order should be easy to compute

◮ Space-filling curves are used for that

3. Define this order recursively for a grid that is obtained by a hierarchical subdivision of space

4. Objects are approximated by cells

5. Store the cell numbers for objects in a conventional index structure with respect to the linear order



1-D Embedding of Grid Approximation/2

[Figure: Example of space-filling curves]



1-D Embedding of Grid Approximation/3

◮ Z-order is the most popular such order (Morton 1966; Orenstein and Manola 1988)

◮ Also termed Morton order or bit-interleaving

◮ Each cell at each level of the hierarchy has an associated bit string whose length corresponds to the level to which the cell belongs.

◮ e.g., the top-right cell in the left diagram has bit string 11; in the right diagram, cell 1110 is shown.

◮ The bit string 1110 is obtained by choosing 11 at the top level, and then 10 within the top-level quadrant.

◮ The order which is imposed on all cells of a hierarchical subdivision is given by the lexicographical order of the bit strings.
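Bit-interleaving can be sketched as follows. The function name `z_value` and the coordinate convention are assumptions chosen to match the two examples above: x counts columns from the left, y counts rows from the bottom, and at each level the x bit is written before the y bit.

```python
def z_value(x, y, level):
    """Bit string of cell (x, y) in a 2**level by 2**level grid.

    Assumed convention: x grows to the right, y grows upwards,
    and each level contributes the x bit followed by the y bit.
    """
    bits = []
    for i in range(level - 1, -1, -1):    # most significant bit first
        bits.append(str((x >> i) & 1))    # x bit of this level
        bits.append(str((y >> i) & 1))    # y bit of this level
    return "".join(bits)

print(z_value(1, 1, 1))  # 11   (top-right cell of the 2 x 2 grid)
print(z_value(3, 2, 2))  # 1110 (quadrant 11, then sub-cell 10)

# Sorting the cells of a grid by their bit strings yields the Z-order.
cells = [(x, y) for x in range(2) for y in range(2)]
print(sorted(cells, key=lambda c: z_value(c[0], c[1], 1)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With the opposite convention (y bit first) the curve is mirrored along the diagonal; the prefix and ordering properties are unaffected.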



1-D Embedding of Grid Approximation/4

◮ Any shape (approximated as a set of cells) over the grid can now be decomposed into a minimal number of cells at different levels (always using the highest possible level).

◮ It can therefore be represented by a set of bit strings, called z-elements

◮ For a spatial object, the corresponding set of z-elements builds a set of spatial keys

◮ Spatial index: Put z-elements as spatial keys in lexicographical order into a B-tree.

◮ Due to the proximity-preserving property, various types of queries can be answered relatively efficiently, e.g., a containment or range query with rectangle r

◮ Determine the z-elements of r

◮ For each z-element z, scan the part of the leaf sequence of the B-tree having z as prefix.

◮ Check these candidates for actual containment.
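The decomposition and the prefix scan can be sketched as follows. Everything here is illustrative: `z_elements` and `prefix_scan` are hypothetical helper names, a sorted Python list stands in for the leaf sequence of the B-tree, and the bit convention (x bit before y bit, origin at the bottom left) is an assumption.

```python
from bisect import bisect_left

def z_elements(x0, y0, x1, y1, depth, prefix="", cx=0, cy=0):
    """Decompose the cell range [x0..x1] x [y0..y1] (inclusive) of a
    2**depth by 2**depth grid into maximal quadrants (z-elements)."""
    size = 1 << depth  # side length of the current quadrant in cells
    if x1 < cx or x0 >= cx + size or y1 < cy or y0 >= cy + size:
        return []          # quadrant and rectangle are disjoint
    if x0 <= cx and cx + size - 1 <= x1 and y0 <= cy and cy + size - 1 <= y1:
        return [prefix]    # quadrant fully covered: one z-element
    half = size // 2
    out = []
    for bx in (0, 1):      # x bit first, then y bit (assumed convention)
        for by in (0, 1):
            out += z_elements(x0, y0, x1, y1, depth - 1,
                              prefix + str(bx) + str(by),
                              cx + bx * half, cy + by * half)
    return out

def prefix_scan(index, z):
    """Entries (key, object_id) of a sorted list whose key has z as prefix."""
    i = bisect_left(index, (z,))   # first entry with key >= z
    hits = []
    while i < len(index) and index[i][0].startswith(z):
        hits.append(index[i])
        i += 1
    return hits

# Hypothetical index content: (z-element, object id), sorted lexicographically.
index = sorted([("00", "a"), ("0010", "b"), ("10", "c"), ("1011", "d"), ("11", "e")])

for z in z_elements(2, 0, 3, 1, depth=2):    # query rectangle = quadrant 10
    print(z, prefix_scan(index, z))          # 10 [('10', 'c'), ('1011', 'd')]
```

Keys with prefix z form one contiguous run in lexicographical order, which is why a single range scan per z-element suffices. For intersection-style queries, keys that are proper prefixes of z (enclosing cells) would also have to be looked up; the sketch omits that and still requires the refine check on the candidates.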



1-D Embedding of Grid Approximation/5

Example: Mapping the 1-D embedding to a B+-tree

◮ Key values (c, l) in the nodes give the cell number in decimal representation and the level.



Spatial Index Structures

◮ A (dedicated) spatial index structure organizes objects into buckets

◮ Each bucket has an associated bucket region, a part of space containing all objects stored in that bucket.

◮ For point data structures, the regions are disjoint

◮ the space is partitioned and each point belongs to precisely one bucket

◮ e.g., a kd-tree partitioning of 2-D space where each bucket can hold up to 3 points

◮ For rectangle data structures, the bucket regions may overlap



Spatial Index Structures for Points/1

◮ Spatial index structures for points

◮ Data structures for representing points in k dimensions (multi-attribute) have a long tradition, e.g., a tuple t = (x1, ..., xk)

◮ Can be used to store geometrical points

◮ GRID index: Spatial index structure for points (Nievergelt, Hinterberger, and Sevcik 84)

◮ The following example partitions the data space into cells by an irregular grid

◮ The directory is a k-dimensional array whose entries are logical pointers to buckets.

◮ All points in a cell are stored in the bucket pointed to by the corresponding directory entry.

◮ The scales are small and are kept in main memory; the directory is on disk.
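The interplay of scales, directory and buckets can be sketched in two dimensions. This is a simplified illustration, not the published grid file algorithm: the class name `GridFile` is an assumption, the scales are fixed up front, every cell gets its own bucket, and bucket splitting and merging are left out.

```python
from bisect import bisect_right

class GridFile:
    def __init__(self, x_scale, y_scale):
        # Scales: sorted split positions per dimension (kept in memory).
        self.x_scale, self.y_scale = x_scale, y_scale
        nx, ny = len(x_scale) + 1, len(y_scale) + 1
        # Directory: 2-D array of bucket ids. Here every cell has its own
        # bucket; in general several cells may share one bucket.
        self.directory = [[i * ny + j for j in range(ny)] for i in range(nx)]
        self.buckets = {b: [] for row in self.directory for b in row}

    def cell(self, x, y):
        # The scales map a coordinate pair to a directory position.
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def insert(self, x, y):
        i, j = self.cell(x, y)
        self.buckets[self.directory[i][j]].append((x, y))

    def range_query(self, x0, y0, x1, y1):
        i0, j0 = self.cell(x0, y0)
        i1, j1 = self.cell(x1, y1)
        seen, out = set(), []
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                b = self.directory[i][j]
                if b in seen:          # shared bucket already read
                    continue
                seen.add(b)
                out += [p for p in self.buckets[b]
                        if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]
        return out

g = GridFile(x_scale=[10, 20], y_scale=[5])   # irregular 3 x 2 grid
for p in [(1, 1), (12, 7), (25, 3), (15, 2)]:
    g.insert(*p)
print(sorted(g.range_query(11, 0, 30, 4)))    # [(15, 2), (25, 3)]
```

The range query touches only the directory cells overlapped by the query rectangle, here two of the six cells.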



Spatial Index Structures for Points/2

◮ kd-tree (Bentley 75)

◮ Binary tree where each internal node contains a key drawn from one of the k dimensions

◮ The key in the root node (level 0) divides the data space with respect to dimension 0, the keys in its children (level 1) divide the two subspaces with respect to dimension 1, and so forth, up to dimension k−1, after which cycling through the dimensions restarts.

◮ Leaves contain the points to be stored

◮ KDB-tree (Robinson 81): introduces buckets, paginates the binary tree, all leaves at the same level (like a B-tree)

◮ LSD-tree (Henrich et al. 89): abandons strict cycling through dimensions; a clever paging algorithm keeps the external path length balanced even for very unbalanced binary trees.
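The kd-tree variant described above (split keys in internal nodes, points in the leaves) can be sketched as follows. The dictionary node layout, the median split and the `bucket_size` parameter are assumptions made for this illustration.

```python
def build(points, depth=0, k=2, bucket_size=1):
    """Build a kd-tree: internal nodes hold a split key, leaves hold points."""
    if len(points) <= bucket_size:
        return {"leaf": list(points)}
    dim = depth % k                       # cycle through the dimensions
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2                   # median split (an assumption)
    return {
        "dim": dim,
        "key": pts[mid][dim],             # left: values <= key, right: >= key
        "left": build(pts[:mid], depth + 1, k, bucket_size),
        "right": build(pts[mid:], depth + 1, k, bucket_size),
    }

def range_query(node, lo, hi, out=None):
    """Collect all points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if out is None:
        out = []
    if "leaf" in node:
        out += [p for p in node["leaf"]
                if all(l <= c <= h for l, c, h in zip(lo, p, hi))]
        return out
    d, key = node["dim"], node["key"]
    if lo[d] <= key:                      # query range reaches the left subspace
        range_query(node["left"], lo, hi, out)
    if hi[d] >= key:                      # query range reaches the right subspace
        range_query(node["right"], lo, hi, out)
    return out

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(tree["dim"], tree["key"])                       # 0 7
print(sorted(range_query(tree, (3, 0), (8, 5))))      # [(5, 4), (7, 2), (8, 1)]
```

Subtrees whose subspace does not meet the query rectangle are skipped, which is where the saving over a full scan comes from.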



Spatial Index Structures for Points/3

◮ Quad-tree

◮ Class of spatial index structures which divide the data space recursively into 4 quadrants (NW, NE, SW, SE)



Spatial Index Structures for Points/4

◮ Quad-tree (contd.)

◮ Different quad-tree variants exist for processing points, lines, and polygons (i.e., different node types, construction and query algorithms)

◮ Frequently used in commercial GIS, especially for compressing, storing and manipulating raster images
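One member of the quad-tree family, a bucket quad-tree for points, can be sketched as follows. The class layout and the `capacity` parameter are assumptions for illustration; the sketch assumes distinct points, since more than `capacity` identical points would split forever.

```python
class QuadTree:
    def __init__(self, x, y, size, capacity=2):
        # (x, y) is the lower-left corner of the square region, size its side.
        self.x, self.y, self.size, self.capacity = x, y, size, capacity
        self.points = []
        self.children = None      # dict with keys "NW", "NE", "SW", "SE"

    def _quadrant(self, px, py):
        h = self.size / 2
        return ("N" if py >= self.y + h else "S") + ("E" if px >= self.x + h else "W")

    def _split(self):
        h = self.size / 2
        self.children = {
            "SW": QuadTree(self.x, self.y, h, self.capacity),
            "SE": QuadTree(self.x + h, self.y, h, self.capacity),
            "NW": QuadTree(self.x, self.y + h, h, self.capacity),
            "NE": QuadTree(self.x + h, self.y + h, h, self.capacity),
        }
        pts, self.points = self.points, []
        for p in pts:             # redistribute the points to the quadrants
            self.children[self._quadrant(*p)].insert(*p)

    def insert(self, px, py):
        if self.children is not None:
            self.children[self._quadrant(px, py)].insert(px, py)
            return
        self.points.append((px, py))
        if len(self.points) > self.capacity:
            self._split()

    def locate(self, px, py):
        """Return the quadrant path to the leaf responsible for (px, py)
        together with the points stored in that leaf."""
        node, path = self, []
        while node.children is not None:
            name = node._quadrant(px, py)
            path.append(name)
            node = node.children[name]
        return path, node.points

qt = QuadTree(0, 0, 8, capacity=2)
for p in [(1, 1), (7, 7), (6, 6), (5, 7)]:
    qt.insert(*p)
print(qt.locate(5, 7))   # (['NE', 'NW'], [(5, 7)])
print(qt.locate(1, 1))   # (['SW'], [(1, 1)])
```

The tree is deeper where points are dense (the NE quadrant here) and shallow elsewhere, which reflects the space-organizing character of the structure.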



Spatial Index Structures for Rectangles/1

◮ Spatial index structures for rectangles

◮ Unlike points, rectangles do not fall into a unique cell of a partition and might intersect partition boundaries

◮ Three main approaches:

◮ Transformation approach

◮ Overlapping bucket regions

◮ Clipping



Spatial Index Structures for Rectangles/2

◮ Transformation approach

◮ k-dimensional rectangles are transformed into 2k-dimensional points, and a point data structure is used.

◮ Rectangle (xl, xr, yb, yt) can be viewed as a point in 4-D space

◮ Example: Interval i = (i1, i2) is mapped into a point (x, y) = (i1, i2) in 2-D space

◮ An intersection query with an interval q = (q1, q2) translates to a condition: Find all points (x′, y′) s.t. x′ < q2 and y′ > q1.

◮ All intervals intersecting q are in the shaded area
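The interval example can be written down directly. A minimal sketch, assuming the mapping x = i1 (start) and y = i2 (end) and the strict inequalities stated above, so that intervals which merely touch q are not reported; the function names are made up for illustration.

```python
def to_point(interval):
    """Map the interval (i1, i2) to the 2-D point (x, y) = (i1, i2)."""
    i1, i2 = interval
    return (i1, i2)

def intersecting(points, q):
    """Points (x, y) with x < q2 and y > q1, i.e. intervals intersecting q."""
    q1, q2 = q
    return [(x, y) for (x, y) in points if x < q2 and y > q1]

def rect_intersects(r, q):
    """Same condition applied per axis to 4-D points (xl, xr, yb, yt)."""
    return r[0] < q[1] and r[1] > q[0] and r[2] < q[3] and r[3] > q[2]

intervals = [(1, 3), (2, 6), (7, 9), (4, 5)]
points = [to_point(i) for i in intervals]
print(intersecting(points, (3, 5)))                    # [(2, 6), (4, 5)]
print(rect_intersects((0, 4, 0, 4), (3, 8, 3, 8)))     # True
```

In the transformed space the query is a range query with two open sides, which a point data structure can answer; the list comprehension here only stands in for that structure.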



Spatial Index Structures for Rectangles/3

◮ Overlapping bucket regions

◮ Partitioning of space is abandoned and bucket regions may overlap, e.g., R-tree (Guttman 84)

◮ Advantage: A spatial object (or key) is in a single bucket

◮ Disadvantage: Multiple search paths due to overlapping bucket regions
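The advantage and the disadvantage both show up in a small hand-built example. This is not an R-tree implementation (no insertion or node splitting); the tuple-based node layout and the data are assumptions used only to illustrate searching with overlapping bucket regions.

```python
def intersects(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax); closed-boundary overlap test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# A node is (mbr, payload): payload is a list of child nodes for directory
# and leaf pages, or an object id (str) for a data entry.
leaf1 = ((0, 0, 6, 6), [((0, 0, 2, 2), "a"), ((4, 4, 6, 6), "b")])
leaf2 = ((5, 5, 10, 10), [((5, 5, 7, 7), "c"), ((8, 8, 10, 10), "d")])
root = ((0, 0, 10, 10), [leaf1, leaf2])   # regions of leaf1 and leaf2 overlap

def search(node, query, visited):
    mbr, payload = node
    if not intersects(mbr, query):
        return []
    if isinstance(payload, str):          # data entry: report the object
        return [payload]
    visited.append(mbr)                   # remember which pages were read
    hits = []
    for child in payload:
        hits += search(child, query, visited)
    return hits

visited = []
print(search(root, (5.5, 5.5, 5.8, 5.8), visited))   # ['b', 'c']
print(len(visited))                                   # 3
```

Each object is stored exactly once, but the small query window lies in the overlap of both leaf regions, so both leaves have to be read.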



Spatial Index Structures for Rectangles/4

◮ Clipping

◮ Bucket regions are disjoint, but data rectangles are cut into several pieces (if necessary), e.g., R+-tree (Sellis, Roussopoulos and Faloutsos 87)

◮ Advantage: Less branching in search

◮ Disadvantage: Multiple entries for a single spatial object
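The effect of clipping can be illustrated with two fixed, disjoint bucket regions. This is only a sketch of the clipping idea, not of the R+-tree itself; the region names and helper functions are assumptions.

```python
def clip(rect, region):
    """Intersection of two rectangles (xmin, ymin, xmax, ymax), or None
    if they share no area."""
    x0, y0 = max(rect[0], region[0]), max(rect[1], region[1])
    x1, y1 = min(rect[2], region[2]), min(rect[3], region[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

# Two disjoint bucket regions covering the data space [0, 10] x [0, 10].
regions = {"left": (0, 0, 5, 10), "right": (5, 0, 10, 10)}

def insert_clipped(buckets, obj_id, rect):
    for name, region in regions.items():
        piece = clip(rect, region)
        if piece is not None:
            # One entry per piece: an object may appear in several buckets.
            buckets.setdefault(name, []).append((obj_id, piece))

buckets = {}
insert_clipped(buckets, "r1", (3, 2, 7, 4))   # crosses the boundary x = 5
insert_clipped(buckets, "r2", (1, 1, 2, 2))   # lies entirely in "left"
print(buckets["left"])    # [('r1', (3, 2, 5, 4)), ('r2', (1, 1, 2, 2))]
print(buckets["right"])   # [('r1', (5, 2, 7, 4))]
```

A point query needs to look into exactly one bucket, but r1 now has two entries, so query results may have to be de-duplicated.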



Basic Spatial Queries

◮ Containment query: Given a spatial object R, find all objects that completely contain R. If R is a point, this is a point query.

◮ Region query: Given a region R (polygon or circle), find all spatial objects that intersect R. If R is a rectangle, this is a window query.

◮ Enclosure query: Given a polygon region R, find all objects that are completely contained in R.

◮ k-nearest neighbor query: Given an object P, find the k objects that are closest to P (typically for points)
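The four query types differ only in the predicate that is evaluated. A minimal sketch on rectangles and points, using linear scans in place of an index; all function names and data are assumptions for illustration.

```python
import heapq
import math

def contains(a, b):
    """True if rectangle a = (xmin, ymin, xmax, ymax) completely contains b."""
    return a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def containment_query(objs, r):      # objects that completely contain r
    return [k for k, m in objs.items() if contains(m, r)]

def window_query(objs, r):           # objects that intersect the window r
    return [k for k, m in objs.items() if intersects(m, r)]

def enclosure_query(objs, r):        # objects completely contained in r
    return [k for k, m in objs.items() if contains(r, m)]

def knn_query(points, p, k):         # the k points closest to p
    return heapq.nsmallest(k, points, key=lambda n: math.dist(points[n], p))

objs = {"a": (0, 0, 10, 10), "b": (2, 2, 4, 4), "c": (6, 6, 12, 12)}
print(containment_query(objs, (3, 3, 3.5, 3.5)))   # ['a', 'b']
print(window_query(objs, (3, 3, 7, 7)))            # ['a', 'b', 'c']
print(enclosure_query(objs, (1, 1, 5, 5)))         # ['b']

points = {"p": (0, 0), "q": (3, 4), "s": (1, 1)}
print(knn_query(points, (0, 0), 2))                # ['p', 's']
```

Note that containment and enclosure use the same `contains` test with the roles of query and object swapped.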



Spatial Join/1

◮ Given two sets of spatial objects (typically minimum bounding rectangles) S1 = {R1, ..., Rm}, S2 = {R′1, ..., R′n}

◮ Determine for S1 and S2 all object pairs that are in a relationship described by a spatial predicate (typically intersection, but other predicates are also possible)
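The definition translates directly into a nested-loop evaluation over the Cartesian product. The sketch below serves as a baseline that states what a spatial join computes; it is not one of the efficient strategies, and the names and data are assumptions.

```python
from itertools import product

def intersects(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def nested_loop_join(s1, s2, predicate=intersects):
    """All pairs (id1, id2) whose rectangles satisfy the spatial predicate."""
    return [(i, j)
            for (i, r), (j, t) in product(s1.items(), s2.items())
            if predicate(r, t)]

s1 = {"R1": (0, 0, 2, 2), "R2": (5, 5, 8, 8)}
s2 = {"T1": (1, 1, 3, 3), "T2": (7, 7, 9, 9), "T3": (20, 20, 21, 21)}
print(nested_loop_join(s1, s2))   # [('R1', 'T1'), ('R2', 'T2')]
```

Every one of the m · n pairs is tested, which is the cost that filter + refine and spatial index structures aim to avoid.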



Spatial Join/2

◮ Very active research area in the last few years

◮ Traditional join methods such as hash join or sort/merge join are not applicable

◮ Filtering the Cartesian product is expensive

◮ Central ideas

◮ filter + refine

◮ use of spatial index structures

◮ Classification of strategies

◮ Grid approximation/bounding box

◮ None/one/both operands are represented in a spatial index structure



Spatial Join/3

◮ Grid approximations with an overlap predicate

◮ A parallel scan of two sets of z-elements corresponding to two sets of spatial objects is performed

◮ Similar to a merge join
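One way to realize such a parallel scan is sketched below. It relies on the fact that two z-elements overlap exactly when one bit string is a prefix of the other, and keeps, per input, a stack of z-elements that are prefixes of the current one. This is an illustrative sketch under these assumptions, not necessarily the algorithm shown on the original slide; it produces candidate pairs that still need the refine step.

```python
import heapq

def z_merge_join(a, b):
    """a, b: lists of (z_element, object_id). Returns the sorted list of
    candidate pairs (id_a, id_b) having at least one pair of overlapping
    z-elements."""
    # Each input is scanned in lexicographical order, as it would be read
    # from the leaf sequence of a B-tree; the tag 0/1 marks its origin.
    scan_a = sorted((z, 0, oid) for z, oid in a)
    scan_b = sorted((z, 1, oid) for z, oid in b)
    stacks = ([], [])        # "open" z-elements per input
    result = set()
    for z, side, oid in heapq.merge(scan_a, scan_b):
        for st in stacks:
            # Elements that are not a prefix of z cannot overlap z
            # or anything that follows z in the scan.
            while st and not z.startswith(st[-1][0]):
                st.pop()
        for _, other in stacks[1 - side]:
            # Every remaining element of the other input is a prefix of z.
            result.add((oid, other) if side == 0 else (other, oid))
        stacks[side].append((z, oid))
    return sorted(result)

a = [("00", "A1"), ("1110", "A2")]
b = [("0010", "B1"), ("11", "B2"), ("10", "B3")]
print(z_merge_join(a, b))   # [('A1', 'B1'), ('A2', 'B2')]
```

Both inputs are read once in key order, which is the similarity to a merge join; the stacks replace the equality test of an ordinary merge join by a prefix test.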



Spatial Join/4

◮ Bounding box approximation: For two sets of rectangles R and S, find all pairs (r, s), r ∈ R and s ∈ S, such that r intersects s:

◮ No spatial index on R and S: the bb join algorithm uses a computational geometry algorithm to detect rectangle intersections, similar to external merge sorting

◮ Spatial index on either R or S: the index join scans the non-indexed operand and, for each object, uses the bounding box of its SDT (spatial data type) attribute as a search argument on the indexed operand (only efficient if the non-indexed operand is not too big)

◮ Both R and S are indexed: synchronized traversal of both structures so that pairs of cells of their respective partitions covering the same part of space are encountered together.



Summary

◮ Spatial indexes are a crucial part of any database system that supports geographical information.

◮ Spatial indexing techniques are necessary to answer spatial queries efficiently.

◮ Techniques covered: mapping to a lower-dimensional space, grid file, kd-tree, family of R-tree indexes

