13.1 through 13.5, 13.10 and 13.11

More documents

Recommendations

Info

-.-Chapter 13 Disk Storage, Basic File Structures, and Hashingthen use a new hashing function (based on the new value of M) to redistribute therecords. These reorganizations can be quite time consuming for large files. Newerdynamic file organizations based on hashing allow the number of buckets to varydvnamically with only localized reorganization (see Section 13.8.3).When using external hashing, searching for a record given a value of some fieldother than the hash field is as expensive as in the case of an unordered file. Recorddeletion can be inplemented by removing the record from its bucket. If the buckethas an overflow chain, we can move one of the overflow records into the bucket toreplace the deleted record. If the record to be deleted is already in overflow, we simplyrernove it from the linked list. Notice that rernoving an overflow record impliesthat rve should keep track of emprty positions in overflow. This is done easily bymaintainir-rg a linked list of unused overflow locations.Modiffing a record's field value depends on two factors: the search condition tolocate the record and the field to be modified. If the search condition is an equalitycomparison on the hash field, we can locate the record efficiently by using the hashingfunction; otherwise, we must do a linear search. A nonhash field can be modifiedby changing the record and rewriting it in the same bucket. Modifring the hashfield means that the record can move to another bucket, which requires deletion ofthe old record followed bv insertion of the modified record.13.8.3 Hashing Techniques That Allow DynamicFile ExpansionA major drawback of the srarlc hashing scheme just discussed is that the hashaddress space is fixed. Hence, it is difficult to expand or shrink the file dynamically.The schemes described in this section attempt to remedy this situation. The firstscheme-extendible hashing-stores an acces structure in addition to the fiie, andhence is somewhat sirnilar to indexing (Chapter 14). The main difference is that theaccesstructure is based on the values that result after application ofthe hash functionto the search field. hr indexing, the acces structure is based on the values ofthesearch freld itself. The second technique, called linear hashing, does not requireadditional accesstmctures.These hashing schemes take advantage of the f-acthat the result of applying a hashingfunction is a r.ronnegative integer and hence can be represented as a binary number.The access structure is built on the binary representation of the hashingfunction result, which is a string of bits. We call this the hash value of a record.Records are distributed amons buckets based on the values of the lendirry bits intheir hash values.Extendible Hashing. In extendible hashing, a type of directory-an array of 2'rbucket addresses-is maintained, where d is called the global depth of the directory.The integer value corresponding to the first (high-order) d bits of a hash valueis used as an index to the array to determine a directory entry, and the address inthat entry determines the bucket in which the corresponding records are stored.
'l 3.8 Hashing Techniquesute theNewert() varyre fieldI{ccordbucketcket toie slmrnplies.sily bytlon toquality-. hashn-rodirehashtion ofe hashnically.re firstle, andhat ther func-; of thecquirer hash-.'numashing:ecord.&irs invof2ddirecrvalueress instored.However, there does not have to be a distinct bucket for each of the 2,i directorylocations. Several directory locations rvith the same first d'bits tbr their hash valuesmay contain the same bucket address if all the records that hash to these locations fitin a single bucket. A local depth d'-stored with each bucket-specifies the numberof bits on lvhich the bucket contents irre based. Figure 13.13 shows a directorywith global depth ri = 3.The value of d can be increased or c-lecreased by one at a time, thus doublir-rg or halvingthe number of entries ir-r the directory array. Doubling is needed if a bucket,whose local depth d' is equal to the global depth d, overflows. Halving occurs if d >d'for all the buckets after some deletions occur. Most record retrievals require twoblock accesses-one to the directory and the other to the bucket.To illustrate br.rcket splitting, suppose thirt a new inserted record causes overflow inthe bucket whose hash values start with 0l-the third bucket in Figure 13.13. Therecords will be distributed between twcl bnckets: the first contains all records whosehash values start with 010, ar-rd the second all those rvhose htrsl-r values start witl-r011. Norv the two directory locations fbr 0t0 and 0l I point to the two new distinctbuckets. Before the split, they pointed to the sante bucket. The local depth d' of thetwo new buckets is 3, which is one more than the local depth of the old bucket.If a bucket that overflows and is split used to have a locai depth d' equal to the globaldepth d of the directory, then the size of the directory must now be doubled so thatwe can use au extra bit to distinguish the two new buckets. For example, if thebucket for records rvhose hash values start with lll in Figure 1-3.11 overflows, thetwo new buckets need a directory with global depth d = 4, because the two bucketsare now labeled I I l0 and 1111, and hence their local depths are both 4. The directorysize is l-rence doubled, and each of the other original locations in the directoryis also split into two locations, both of which have the same pointer value as did theoriginal location.The rnain advantage of extendible hashing that makes it attractive is that the performanceof the file does not degrade as the file grows, as opposed to static externalhashing where collisions increase and the corresponding chaining caLlses additionalaccesses. Additionally, no space is allocated in extendible hashing tbr fttture growth,but additional buckets can be allocated dynamically as needecl. The space overheadfor the directory tabie is negligible. The maximum directory size is 2t, where k is thenumber of bits in the hash value. Another advantage is that splitting causes minorreorganization in most cases, since only the records in one bucket are redistributedto the two new buckets. The only time reorganization is more expensive is when thedirectory has to be doubled (or halved). A disadvantagethat the directory must besearched before accessing the buckets themselves, resulting in two block accessesinstead of one in static hashing. This perfbrmance penalty is considered minor andhence the scheme is considered quite desirable tbr dynamic files.Linear Hashing. The idea behind linear hashing is to ailow a hash file to expandand shrink its number of buckets dynar-r,ically without needing a directory. Supposethat the file starts withMbuckets numbered0, 1,... ,M- l and uses the modhash
Page 1 and 2: -\,--{+.nupt",. 1 3Disk StoragG, Ba
Page 3 and 4: 13.1 Introduction!( ) frl \.i()tl t
Page 5 and 6: 13.2 Secondarv Storaoe Devices467to
Page 7 and 8: 13.2 Secondary Storage Devicesi\ (r
Page 9 and 10: 13.2 Secondary Storage Devices 471T
Page 11 and 12: 13.3 Buffering of Blocks473r-e' and
Page 13 and 14: 13.4 Placinq File Records on Disk47
Page 15 and 16: 13.4 Placino File Records on Disk 4
Page 17 and 18: engtht."IIInumber of records per bl
Page 19 and 20: 13.5 Ooerations on Files481artment.
Page 21 and 22: I3.7 Files of Ordered Records (Sort
Page 23 and 24: 13.7 Files of Ordered Records (Sort
Page 25 and 26: 13.8 Hashino Techniouesn eachcbleme
Page 27 and 28: 13.8 Hashing TechniquesOther hashin
Page 29: 13.8 Hashing Techniques 491s mayMai
Page 33 and 34: 13.9 Other Primarv File Oroanizatio
Page 35 and 36: 13.10 Parallelizing Disk Access Usi
Page 37 and 38: 13. 10 Parallelizing Disk Access Us
Page 39 and 40: ,l|{l}'qn--*13.11 Storaoe Area Netw
Page 41 and 42: Review Ouestions. it has)ntoa'qulre

13.1 through 13.5, 13.10 and 13.11

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?