Chapter 13 Disk Storage, Basic File Structures, and Hashing

[Figure 13.11: Structure of the extendible hashing scheme. A directory with entries 000 through 111 points to the data file buckets, each labeled with its local depth. The buckets hold the records whose hash values start with 000, 001, 01, 10, 110, and 111, respectively.]

function h(K) = K mod M; this hash function is called the initial hash function h_i. Overflow handling for collisions is still needed and can be done by maintaining individual overflow chains for each bucket. However, when a collision leads to an overflow record in any file bucket, the first bucket in the file, bucket 0, is split into two buckets: the original bucket 0 and a new bucket M at the end of the file. The records originally in bucket 0 are distributed between the two buckets based on a different hashing function h_{i+1}(K) = K mod 2M. A key property of the two hash functions h_i and h_{i+1} is that any records that hashed to bucket 0 based on h_i will hash to either bucket 0 or bucket M based on h_{i+1}; this is necessary for linear hashing to work.

As further collisions lead to overflow records, additional buckets are split in the linear order 1, 2, 3, .... If enough overflows occur, all the original file buckets 0, 1, ..., M - 1 will have been split, so the file now has 2M instead of M buckets, and all buckets use the hash function h_{i+1}. Hence, the records in overflow are eventually redistributed into regular buckets, using the function h_{i+1} via a delayed split of their buckets. There is no directory; only a value n, which is initially set to 0 and is incremented by 1 whenever a split occurs, is needed to determine which buckets have been split. To retrieve a record with hash key value K, first apply the function h_i to K; if h_i(K) < n, then apply the function h_{i+1} on K because the bucket is already split. Initially, n = 0, indicating that the function h_i applies to all buckets; n grows linearly as buckets are split.

When n = M after being incremented, this signifies that all the original buckets have been split and the hash function h_{i+1} applies to all records in the file. At this point, n is reset to 0 (zero), and any new collisions that cause overflow lead to the use of a new hashing function h_{i+2}(K) = K mod 4M. In general, a sequence of hashing functions h_{i+j}(K) = K mod (2^j M) is used, where j = 0, 1, 2, ...; a new hashing function h_{i+j+1} is needed whenever all the buckets 0, 1, ..., (2^j M) - 1 have been split and n is reset to 0.
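The splitting and addressing rules described above can be illustrated with a small in-memory sketch. This is not the textbook's algorithm as printed; it is a minimal Python model under stated assumptions: the class name `LinearHashFile`, its method names, and the `bucket_capacity` parameter are hypothetical, each bucket is a plain list, and any entries beyond the capacity stand in for that bucket's overflow chain. The attribute `j` selects the current function h_{i+j}(K) = K mod (2^j M), and `n` counts the buckets already split in the current round.

```python
class LinearHashFile:
    """Toy in-memory model of a linear hash file (illustrative names only)."""

    def __init__(self, M=4, bucket_capacity=2):
        self.M = M                        # initial number of buckets
        self.capacity = bucket_capacity   # records a bucket holds before overflowing
        self.j = 0                        # current round: h_{i+j}(K) = K mod (2^j * M)
        self.n = 0                        # buckets already split in this round
        self.buckets = [[] for _ in range(M)]

    def _h(self, level, key):
        # Hash function h_{i+level}(K) = K mod (2^level * M).
        return key % ((2 ** level) * self.M)

    def address(self, key):
        # Apply the current function; if that bucket was already split
        # in this round, apply the next function instead.
        m = self._h(self.j, key)
        if m < self.n:
            m = self._h(self.j + 1, key)
        return m

    def insert(self, key):
        bucket = self.buckets[self.address(key)]
        bucket.append(key)                # entries past capacity model the overflow chain
        if len(bucket) > self.capacity:   # an overflow anywhere triggers a split of bucket n
            self._split()

    def _split(self):
        old = self.buckets[self.n]
        self.buckets[self.n] = []
        self.buckets.append([])           # new bucket n + 2^j * M at the end of the file
        for key in old:                   # redistribute with the next hash function
            self.buckets[self._h(self.j + 1, key)].append(key)
        self.n += 1
        if self.n == (2 ** self.j) * self.M:
            self.j += 1                   # all buckets of this round are split
            self.n = 0

    def search(self, key):
        return key in self.buckets[self.address(key)]
```

For example, with `M=4` and `bucket_capacity=2`, inserting the keys 0, 4, and 8 overflows bucket 0 and splits it: keys 0 and 8 stay in bucket 0 and key 4 moves to the new bucket 4, which matches the property that records from bucket 0 land in either bucket 0 or bucket M. Note that the bucket that is split is always bucket `n`, which is not necessarily the bucket that overflowed.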
The search for a record with hash key value K is given by Algorithm 13.3.

Splitting can be controlled by monitoring the file load factor instead of by splitting whenever an overflow occurs. In general, the file load factor l can be defined as l = r/(bfr * N), where r is the current number of file records, bfr is the maximum number of records that can fit in a bucket, and N is the current number of file buckets. Buckets that have been split can also be recombined if the load factor of the file falls below a certain threshold. Blocks are combined linearly, and N is decremented appropriately. The file load can be used to trigger both splits and combinations; in this manner the file load can be kept within a desired range. Splits can be triggered when the load exceeds a certain threshold, say, 0.9, and combinations can be triggered when the load falls below another threshold, say, 0.7.

Algorithm 13.3. The Search Procedure for Linear Hashing
if n = 0 then m