11.07.2015

Data Structures and Algorithm Analysis - Computer Science at ...

302 Chap. 9 Searching

introduced in Section 4.4. However, each has different performance characteristics that make it the method of choice in particular circumstances.

The current chapter considers methods for searching data stored in lists. List in this context means any list implementation, including a linked list or an array. Most of these methods are appropriate for sequences (i.e., duplicate key values are allowed), although special techniques applicable to sets are discussed in Section 9.3. The techniques from the first three sections of this chapter are most appropriate for searching a collection of records stored in RAM. Section 9.4 discusses hashing, a technique for organizing data in an array such that the location of each record within the array is a function of its key value. Hashing is appropriate when records are stored either in RAM or on disk.

Chapter 10 discusses tree-based methods for organizing information on disk, including a commonly used file structure called the B-tree. Nearly all programs that must organize large collections of records stored on disk use some variant of either hashing or the B-tree. Hashing is practical for only certain access functions (exact-match queries) and is generally appropriate only when duplicate key values are not allowed. B-trees are the method of choice for dynamic disk-based applications whenever hashing is not appropriate.

9.1 Searching Unsorted and Sorted Arrays

The simplest form of search has already been presented in Example 3.1: the sequential search algorithm. Sequential search on an unsorted list requires Θ(n) time in the worst case.

How many comparisons does linear search do on average?
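To pin down what "number of comparisons" means here, the following is a minimal Python sketch of sequential search that also reports how many key comparisons it performed. It is not the Example 3.1 code (which is not reproduced in this excerpt); the name `sequential_search` and the `(position, comparisons)` return convention are illustrative assumptions. Because sequences may contain duplicate keys, it returns the first matching position.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def sequential_search(lst: Sequence[T], key: T) -> Tuple[int, int]:
    """Scan lst from position 0 and return (position, comparisons).

    position is the index of the first record equal to key, or -1 if
    key is absent; comparisons counts the key comparisons performed.
    """
    comparisons = 0
    for i, item in enumerate(lst):
        comparisons += 1  # one comparison against lst[i]
        if item == key:
            return i, comparisons  # found at position i after i + 1 comparisons
    return -1, comparisons  # absent: all n records were examined


if __name__ == "__main__":
    data = [7, 3, 9, 3, 5]
    print(sequential_search(data, 3))  # (1, 2): first of the duplicate 3s
    print(sequential_search(data, 4))  # (-1, 5): n comparisons when missing
```

A successful search for the key at position i costs i + 1 comparisons, and an unsuccessful search costs n; these are the per-event costs the average-case analysis assigns.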
A major consideration is whether K is in list L at all. We can simplify our analysis by ignoring everything about the input except the position of K if it is found in L. Thus, we have n + 1 distinct possible events: that K is in one of positions 0 to n − 1 in L (each position having its own probability), or that it is not in L at all. We can express the probability that K is not in L as

$$P(K \notin L) = 1 - \sum_{i=0}^{n-1} P(K = L[i])$$

where P(x) is the probability of event x.

Let $p_i$ be the probability that K is in position i of L (indexed from 0 to n − 1). For any position i in the list, we must look at i + 1 records to reach it, so we say that the cost when K is in position i is i + 1. When K is not in L, sequential search will require n comparisons. Let $p_n$ be the probability that K is not in L. Then the average cost T(n) will be
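Weighting each of the n + 1 events by its cost (i + 1 comparisons when K is at position i, n when K is absent) gives the expected cost as a sum:

$$T(n) = n\,p_n + \sum_{i=0}^{n-1} (i + 1)\,p_i$$

The sketch below evaluates this sum numerically. The function name `expected_cost` and the convention of passing only $p_0, \dots, p_{n-1}$, with $p_n$ inferred as the leftover probability $1 - \sum p_i$, are illustrative assumptions rather than anything prescribed by the text.

```python
from typing import Sequence


def expected_cost(p: Sequence[float]) -> float:
    """Return T(n) = n*p_n + sum_{i=0}^{n-1} (i + 1) * p_i.

    p[i] is the probability that K is at position i (0 <= i < n).
    p_n, the probability that K is not in L, is taken to be 1 - sum(p).
    """
    n = len(p)
    p_absent = 1.0 - sum(p)  # p_n = P(K not in L)
    if p_absent < -1e-9:
        raise ValueError("position probabilities sum to more than 1")
    p_absent = max(p_absent, 0.0)  # clamp tiny negative rounding error
    # Reaching position i costs i + 1 comparisons.
    found_cost = sum((i + 1) * p_i for i, p_i in enumerate(p))
    # An unsuccessful search examines all n records.
    return found_cost + n * p_absent


if __name__ == "__main__":
    # K always present, uniform over 4 positions: (n + 1) / 2 = 2.5
    print(expected_cost([0.25, 0.25, 0.25, 0.25]))
    # n = 3, K at position 0 half the time, otherwise absent: 0.5*1 + 0.5*3 = 2.0
    print(expected_cost([0.5, 0.0, 0.0]))
```

When K is always present and equally likely to be at any position, every $p_i = 1/n$ and $p_n = 0$, so the sum reduces to $(n + 1)/2$; the first call illustrates this for n = 4.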
