**BTrees** & **Bitmap** **Indexes**

14.7

DATABASE SYSTEMS – The Complete

Book

Presented By: Under the

supervision of:

Deepti Kundu Dr. T.Y.Lin

**Indexes** ►►

**Bitmap**

Definition

A bitmap index for a field F is a collection

of bit-vectors of length n, one for each

possible value that may appear in that field

F.[1]

What does that mean?

� Assume relation R

with

– 2 attributes A and B.

– Attribute A is of type

Integer and B is of

type String.

– 6 records, numbered 1

through 6 as shown.

1

2

3

4

5

6

A

30

30

40

50

40

30

B

foo

bar

baz

foo

bar

baz

Example Continued…

� A bitmap for attribute B is:

Value

foo

bar

baz

Vector

100100

010010

001001

1

2

3

4

5

6

A

30

30

40

50

40

30

B

foo

bar

baz

foo

bar

baz

Where do we reach?

� A bitmap index is a special kind of database index

that uses bitmaps.[2]

� **Bitmap** indexes have traditionally been

considered to work well for data such as gender,

which has a small number of distinct values, e.g.,

male and female, but many occurrences of those

values.[2]

A little more…

� A bitmap index for attribute A of relation R is:

– A collection of bit-vectors

– The number of bit-vectors = the number of distinct

values of A in R.

– The length of each bit-vector = the cardinality of R.

– The bit-vector for value v has 1 in position i, if the ith

record has v in attribute A, and it has 0 there if not.[3]

� Records are allocated permanent numbers.[3]

� There is a mapping between record numbers and record

addresses.[3]

Motivation for **Bitmap** **Indexes**

� Very efficient when used for partial match queries.[3]

� They offer the advantage of buckets [2]

– Where we find tuples with several specified attributes

without first retrieving all the record that matched in

each of the attributes.

� They can also help answer range queries [3]

Another Example

Multidimensional Array of multiple types

{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

5 = 100010

79 = 010100

4 = 001000

6 = 000001

d = 101100

t = 010010

a = 000001

Example Continued…

{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

Searching for items is easy, just AND together.

To search for (5,d)

5 = 100010

d = 101100

100010 AND 101100 = 100000

The location of the

record has been traced!

Compressed **Bitmap**s

� Assume:

– The number of records in R are n

– Attribute A has m distinct values in R

� The size of a bitmap index on attribute A is m*n.

� If m is large, then the number of 1’s will be around 1/m.

– Opportunity to encode

� A common encoding approach is called run-length

encoding.[1]

Run-length encoding

� Represents runs

– A run is a sequence of i 0’s followed by a 1, by some suitable binary

encoding of the integer i.

� A run of i 0’s followed by a 1 is encoded by:

– First computing how many bits are needed to represent i, Say k

– Then represent the run by k-1 1’s and a single 0 followed by k bits

which represent i in binary.

– The encoding for i = 1 is 01. k = 1

– The encoding for i = 0 is 00. k = 1

� We concatenate the codes for each run together, and the sequence of bits is

the encoding of the entire bit-vector

Understanding with an Example

� Let us decode the sequence 11101101001011

� Staring at the beginning (left most bit):

– First run: The first 0 is at position 4, so k = 4. The next 4 bits are

1101, so we know that the first integer is i = 13

– Second run: 001011

� k = 1

� i = 0

– Last run: 1011

� k = 1

� i = 3

� Our entire run length is thus 13,0,3, hence our bit-vector is:

0000000000000110001

Managing **Bitmap** **Indexes**

1) How do you find a specific bit-vector for a

value efficiently?

2) After selecting results that match, how do you

retrieve the results efficiently?

3) When data is changed, do you you alter bitmap

index?

1) Finding bit vectors

– Think of each bit-vector as a key to a value.[1]

– Any secondary storage technique will be

efficient in retrieving the values.[1]

– Create secondary key with the attribute value

as a search key [3]

�Btree

�Hash

2) Finding Records

� Create secondary key with the record number as a search

key [3]

� Or in other words,

– Once you learn that you need record k, you can create

a secondary index using the kth position as a search

key.[1]

3) Handling Modifications

Two things to remember:

Record numbers must remain fixed once assigned

Changes to data file require changes to bitmap ind

Deletion

– Tombstone replaces deleted record

– Corresponding bit is set to 0

Insertion

– Record assigned the next record number.

– A bit of value 0 or 1 is appended to each

bit vector

– If new record contains a new value of the

attribute, add one bit-vector.

Modification

– Change the bit corresponding to the old

value of the modified record to 0

– Change the bit corresponding to the new

value of the modified record to 1

– If the new value is a new value of A, then

insert a new bit-vector.

References

[1] Database Systems : The Complete Book - Hector Garcia-Molina, Jeffrey D. Ullman,

Jennifer D. Widom

[2] http://en.wikipedia.org/wiki/**Bitmap**_index#Example

[3] faculty.kfupm.edu.sa/ICS/adam/ICS541/L10-md-bitmap-indexing.ppt

[4]

http://csis.bits-pilani.ac.in/faculty/goel/Data%20Warehousing/Lecture%20Notes/Lecture%20%23

(- a good doc file to read the concepts of bitmap indexes)