13.1 through 13.5, 13.10 and 13.11

13.8 Hashing Techniques

Other hashing functions can be used. One technique, called folding, involves applying an arithmetic function such as addition or a logical function such as exclusive or to different portions of the hash field value to calculate the hash address. Another technique involves picking some digits of the hash field value (for example, the third, fifth, and eighth digits) to form the hash address.[10] The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space (the number of possible values a hash field can take) is usually much larger than the address space (the number of available addresses for records). The hashing function maps the hash field space to the address space.

A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in some other position, since its hash address is occupied. The process of finding another position is called collision resolution. There are numerous methods for collision resolution, including the following:

- Open addressing. Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found. Algorithm 13.2(b) may be used for this purpose.
- Chaining. For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. Additionally, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records for each hash address is thus maintained, as shown in Figure 13.8(b).
- Multiple hashing. The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.

Each collision resolution method requires its own algorithms for insertion, retrieval, and deletion of records. The algorithms for chaining are the simplest. Deletion algorithms for open addressing are rather tricky. Data structures textbooks discuss internal hashing algorithms in more detail.

The goal of a good hashing function is to distribute the records uniformly over the address space so as to minimize collisions while not leaving many unused locations. Simulation and analysis studies have shown that it is usually best to keep a hash table between 70 and 90 percent full so that the number of collisions remains low and we do not waste too much space. Hence, if we expect to have r records to store in the table, we should choose M locations for the address space such that (r/M) is between 0.7 and 0.9. It may also be useful to choose a prime number for M, since it has been demonstrated that this distributes the hash addresses better over the

[10] A detailed discussion of hashing functions is outside the scope of our presentation.
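The ideas in this section can be illustrated with a small Python sketch. It is not the book's Algorithm 13.2(b), only one possible reading of the text under stated assumptions: the names fold_hash, OpenAddressTable, and choose_table_size are hypothetical; keys are non-negative integers; folding adds three-digit chunks of the key; open addressing wraps around at the end of the table; and M is taken as the first prime that keeps r/M at or below 0.9.

```python
import math
from typing import List, Optional, Tuple


def fold_hash(key: int, m: int, chunk_digits: int = 3) -> int:
    """Folding (assumed variant): add fixed-size digit chunks, then reduce mod m."""
    digits = str(abs(key))
    total = 0
    for i in range(0, len(digits), chunk_digits):
        total += int(digits[i:i + chunk_digits])
    return total % m


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small table sizes used here."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def choose_table_size(r: int, max_load: float = 0.9) -> int:
    """Smallest prime M with r/M <= max_load.

    For very small r the next prime may push r/M below 0.7;
    this sketch does not try to correct for that.
    """
    m = max(2, math.ceil(r / max_load))
    while not is_prime(m):
        m += 1
    return m


class OpenAddressTable:
    """Hash table with open addressing: probe subsequent positions in order."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: List[Optional[Tuple[int, str]]] = [None] * m

    def insert(self, key: int, value: str) -> int:
        """Store (key, value) and return the position actually used."""
        start = fold_hash(key, self.m)
        for step in range(self.m):
            pos = (start + step) % self.m  # wrap around at the end of the table
            slot = self.slots[pos]
            if slot is None or slot[0] == key:
                self.slots[pos] = (key, value)
                return pos
        raise OverflowError("hash table is full")

    def get(self, key: int) -> Optional[str]:
        """Follow the same probe sequence; an empty slot ends the search."""
        start = fold_hash(key, self.m)
        for step in range(self.m):
            pos = (start + step) % self.m
            slot = self.slots[pos]
            if slot is None:
                return None
            if slot[0] == key:
                return slot[1]
        return None
```

With r = 20 expected records, choose_table_size(20) returns 23, giving a load of about 0.87. In a table of that size, the keys 123456789 (123 + 456 + 789 = 1368, and 1368 mod 23 = 11) and 11 both hash to address 11, so inserting them in that order places the second record at position 12. Deletion is deliberately left out: simply emptying a slot would cut the probe sequence for records stored after it, which is one reason deletion under open addressing is tricky.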
