11.07.2015 Views

Encyclopedia of Computer Science and Technology



third party) defragmentation utility, users can periodically reorganize their hard drive so that files are again stored in contiguous sectors.

Files can also be reorganized to optimize space rather than access time. If an operating system has a minimum cluster size of 4K, a single file with only 32 bytes of data will still consume 4,096 bytes. However, if all the files are written together as one huge file (with an index that specifies where each file begins) that waste of space would be avoided. This is the principle of disk compression. Disk compression does slow access somewhat (due to the need to look up and position to the actual data location for a file) and the system becomes more fragile (since garbling the giant file would prevent access to the data in perhaps thousands of originally separate files). The low cost of high capacity drives today has made compression less necessary.

Interfacing Hard Drives

When the operating system wants to read or write data to the disk, it must send commands to the driver, a program that translates high-level commands to the instructions needed to operate the disk controller, which in turn operates the motors controlling the disk heads. The two most commonly used interfaces for PC internal hard drives today are both based on the ATA (Advanced Technology Attachment) standard. The older standard is PATA (parallel ATA), also called IDE (Integrated Drive Electronics) or EIDE (Enhanced IDE). Increasingly common today is SATA, or serial ATA.
Another alternative, more commonly used on servers, is SCSI (Small Computer System Interface). SCSI is more expensive but has several advantages: It has the ability to organize incoming commands for greater efficiency and also features greater flexibility (an EIDE controller can connect only two hard drives, while SCSI can “daisy chain” a large number of disk drives or other peripherals). In practice, the two interfaces perform about equally well. USB (Universal Serial Bus) is frequently used to interface with external hard drive units (see usb).

The capacity continues to increase, with data able to be written more densely or perhaps in multiple layers on the same disk surface. Denser storage also offers the ability to make drives more compact. Already hard drives with a diameter of about an inch have been built by IBM and others for use in digital cameras.

The proliferation of multimedia (including video) and the growth of databases has fed a voracious appetite for hard drive space. Disks with a capacity of 1 TB (terabyte, or trillion bytes) were starting to come onto the market by 2007. For larger installations, disk arrays (see raid) offer high capacity and data-protecting redundancy.

Perpendicular hard drive recording technology recently developed by Hitachi aligns the magnetic “grains” that hold bits of data vertically instead of horizontally, allowing for a considerably higher data density (and thus capacity, for a given size disk). Hitachi suggests that eventually 1 TB can be stored on a 3.5" disk.

Drive speeds (and thus data throughput) have also been increasing, with more users choosing 7200 rpm rather than the formerly standard 5400 rpm drives.
(There are drives as fast as 15,000 rpm, but for most applications the benefits of higher speed drop off rapidly.)

Another factor in data access time and throughput is the use of a dedicated memory device (see cache) to “pre-fetch” data likely to be needed. Windows Vista allows memory from some USB memory sticks (see flash drive) to work as a disk cache. “Hybrid” hard drives directly integrating RAM and drive storage are also available.

Further Reading

Jacob, Bruce, Spencer Ng, and David Wang. Memory Systems: Cache, DRAM, Disk. San Francisco: Morgan Kaufmann, 2007.
“Perpendicular Hard Drive Recording Technology.” Available online. URL: http://www.webopedia.com/DidYouKnow/Computer_Science/2006/perpendicular_hard_drive_technology.asp. Accessed August 6, 2007.
“Storage.” Tom’s Hardware Guide. Available online. URL: http://www.tomshardware.com/storage/index.html. Accessed August 6, 2007.
“What’s Inside a Hard Drive?” Available online. URL: http://www.webopedia.com/DidYouKnow/Hardware_Software/2002/InsideHardDrive.asp. Accessed August 6, 2007.

hashing

A hash is a numeric value generated by applying a mathematical formula to the numeric values of the characters in a string of text (see characters and strings). The formula is chosen so that the values it produces are always the same length (regardless of the length of the original text) and are very likely to be unique. (Two different strings should not produce the same hash value. Such an event is called a collision.)

Applications

The two major application areas for hashing are information retrieval and cryptographic certification.
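
As a rough illustration of the definition of a hash given above, the following Python sketch applies a simple formula to the character codes of a string. The function names, the multiplier 31, and the 32-bit width are assumptions chosen for the example, not a formula prescribed by this entry. It shows that the result has the same length whatever the input length, and that collisions are possible with a simple formula.

```python
def simple_hash(text: str, bits: int = 32) -> int:
    """Illustrative polynomial hash over the numeric values of the characters.

    The multiplier 31 and the default 32-bit width are assumptions made
    for this sketch only.
    """
    value = 0
    for ch in text:
        # Fold each character code into the running value, keeping it
        # within a fixed number of bits.
        value = (value * 31 + ord(ch)) % (1 << bits)
    return value


def hex_digest(text: str, bits: int = 32) -> str:
    """Render the hash as a zero-padded hex string of fixed length."""
    return format(simple_hash(text, bits), f"0{bits // 4}x")


short = hex_digest("a")
long_ = hex_digest("a" * 1000)
# Both digests have the same length regardless of input length.
same_length = len(short) == len(long_)

# With this simple formula, two different strings can collide.
collision = simple_hash("Aa") == simple_hash("BB")
```

Practical hash functions use more elaborate formulas to make such collisions far less likely, but the fixed-length property works the same way.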
In databases, an index table can be built that contains the hash values for the key fields and the corresponding record number for each field, with the entries in hash value order. To search the database, an input key is hashed and the value is compared with the index table (which can be done using a very fast binary search). If the hash value is found, the corresponding record number is used to look up the record. This tends to be much faster than searching an index file directly.

Alternatively, a “coarser” but faster hashing function can be used that will give the same hash value to small groups (called bins) of similar records. In this case the hash from the search key is matched to a bin and then the records within the bin are searched for an exact match.

In cryptography an encrypted message can be hashed, producing a unique fixed-length value. (The fixed length prevents attackers from using mathematical relationships that might be discoverable from the field lengths.) The hashed message can then be encrypted again to create an electronic signature (see certificate, digital). For long messages this is more efficient than having to apply the signature function to each block of the encrypted message, yet
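
The two database lookup schemes described above (a hash-ordered index table searched by binary search, and coarser bins searched for an exact match) can be sketched in Python as follows. All names here (`simple_hash`, `build_index`, `lookup`, `build_bins`, `lookup_in_bin`) and the sample records are hypothetical, and the hash formula is an assumption made for the example. The coarse hash simply reduces the full hash to a small number of bins, which is only one possible way to group records.

```python
import bisect
from collections import defaultdict


def simple_hash(key: str, bits: int = 32) -> int:
    """Illustrative hash; the formula is an assumption for this sketch."""
    value = 0
    for ch in key:
        value = (value * 31 + ord(ch)) % (1 << bits)
    return value


def build_index(records, key_field):
    """Index table of (hash value, record number), in hash value order."""
    return sorted((simple_hash(rec[key_field]), n)
                  for n, rec in enumerate(records))


def lookup(index, records, key_field, key):
    """Hash the input key, then binary-search the index table."""
    h = simple_hash(key)
    # (h, -1) sorts before every real entry with hash h, so this finds
    # the first entry carrying that hash value.
    pos = bisect.bisect_left(index, (h, -1))
    while pos < len(index) and index[pos][0] == h:
        rec = records[index[pos][1]]
        if rec[key_field] == key:  # guard against collisions
            return rec
        pos += 1
    return None


def build_bins(records, key_field, num_bins=4):
    """Coarser scheme: many records share one bin number."""
    bins = defaultdict(list)
    for rec in records:
        bins[simple_hash(rec[key_field]) % num_bins].append(rec)
    return bins


def lookup_in_bin(bins, key_field, key, num_bins=4):
    """Match the key to a bin, then search that bin for an exact match."""
    for rec in bins.get(simple_hash(key) % num_bins, []):
        if rec[key_field] == key:
            return rec
    return None


records = [{"name": "Ada"}, {"name": "Grace"}, {"name": "Aa"}, {"name": "BB"}]
index = build_index(records, "name")
bins = build_bins(records, "name")
found = lookup(index, records, "name", "Grace")
found_in_bin = lookup_in_bin(bins, "name", "Grace")
```

Both lookups compare the stored key with the search key before returning a record, since a matching hash value alone does not rule out a collision.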
