Encyclopedia of Computer Science and Technology
…local network, or connect to the Internet through DSL, an enhanced phone line. Traveling home, the user might use a personal digital assistant (PDA) with a wireless link to make a restaurant reservation (see wireless computing). The user wants all these services to be seamless and essentially interchangeable, but today data communications is more like roads in the early days of the automobile: a few fast paved roads here and there, but many bumpy dirt paths.

Further Reading
Forouzan, Behrouz. Data Communications and Networking. New York: McGraw-Hill, 2006.
Stallings, William. Data and Computer Communications. 8th ed. Upper Saddle River, N.J.: Prentice Hall, 2006.
Strangio, Christopher E. "Data Communications Basics." Available online. URL: http://www.camiresearch.com/Data_Com_Basics/data_com_tutorial.html. Accessed July 8, 2007.
White, Curt. Data Communications and Computer Networks: A Business User's Approach. 4th ed. Boston: Course Technology, 2006.

data compression

The process of removing redundant information from data so that it takes up less space is called data compression. Besides saving disk space, compressing data such as e-mail attachments can make data communications faster.

Compression methods generally begin with the realization that not all characters are found in equal numbers in text. For example, in English, letters such as e and s are found much more frequently than letters such as j or x. By assigning the shortest bit codes to the most common characters and longer codes to the least common ones, the number of bits needed to encode the text can be minimized.

Huffman coding, first developed in 1952, is an algorithm that builds a tree by linking the pair of least probable (that is, least common) characters, then the next least probable pair, and so on until the tree is complete.

Another coding method, arithmetic coding, matches characters' probabilities to bits in such a way that the same bit can represent parts of more than one encoded character. This is even more efficient than Huffman coding, but the necessary calculations make the method somewhat slower to use.

Another approach to compression is to look for words (or, more generally, character strings) that match those found in a dictionary file. The matching strings are replaced by numbers. Since a number is much shorter than a whole word or phrase, this method can greatly reduce the size of most text files. (It would not be suitable for files that contain numerical rather than text data, since such data, when interpreted as characters, would look like a random jumble.)

[Figure caption: A basic approach to data compression is to look for recurring patterns and store them in a "dictionary." Each occurrence of the pattern can then be replaced by a brief reference to the dictionary entry. The resulting file may then be considerably smaller than the original.]

The Lempel-Ziv (LZ) compression method does not use an external dictionary. Instead, it scans the file itself for text strings. Whenever it finds a string that occurred earlier in the text, it replaces the later occurrence with an offset, or count of the number of bytes separating the occurrences. This means that not only common words but common prefixes and suffixes can be replaced by numbers.
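To make the frequency-based coding idea described above concrete, the following Python sketch builds a Huffman tree by repeatedly pairing the two least probable symbols and then reads each character's bit code off the tree. The function name and sample text are illustrative choices only, not part of any particular library.

# A minimal Huffman-coding sketch (illustrative, not a production codec).
# It builds the tree by repeatedly pairing the two least probable nodes,
# then reads each character's bit code off the path from the root.
import heapq
from collections import Counter

def huffman_codes(text):
    # Count character frequencies and seed the priority queue.
    counts = Counter(text)
    heap = [(freq, i, ch) for i, (ch, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    # Repeatedly merge the two least frequent nodes into a parent node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    # Walk the finished tree; left edges emit "0", right edges "1".
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes

if __name__ == "__main__":
    codes = huffman_codes("this is an example of huffman coding")
    # Common characters (such as the space) receive the shortest codes.
    for ch, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
        print(repr(ch), code)

Running this on ordinary English text shows the most frequent characters receiving the shortest bit codes, which is exactly where the space savings come from.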
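The offset-based replacement that LZ performs can be sketched roughly as follows; the window size, minimum match length, and token format here are arbitrary choices made for illustration rather than the format used by any real LZ implementation.

# A rough LZ77-style sketch: replace a repeated string with a back-reference
# (offset, length) pointing at its earlier occurrence; literals pass through.
def lz77_compress(data, window=255, min_match=3):
    tokens = []          # mix of literal characters and (offset, length) pairs
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        start = max(0, i - window)
        # Search the window for the longest match starting before position i.
        for j in range(start, i):
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]
                   and length < 255):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))   # back-reference
            i += best_len
        else:
            tokens.append(data[i])                # literal
            i += 1
    return tokens

def lz77_decompress(tokens):
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                out.append(out[-off])             # copy from earlier output
        else:
            out.append(t)
    return "".join(out)

if __name__ == "__main__":
    text = "the cat sat on the mat, the cat sat still"
    tokens = lz77_compress(text)
    print(tokens)
    assert lz77_decompress(tokens) == text

Decompression simply copies characters back from the output already reconstructed, which is why repeated phrases such as "the cat sat" shrink to a single short back-reference.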
A variant of this scheme does not use offsets into the file itself, but instead compiles repeated strings into a dictionary and replaces them in the text with an index to their position in the dictionary.

Graphics files can often be greatly compressed by replacing large areas of a single color (such as a blue sky) with a number giving the count of pixels having that value. However, some graphics file formats, such as GIF, are already compressed, so further compression will not shrink them much.

More exotic compression schemes for graphics can use fractals or other iterative mathematical functions to encode patterns in the data. Most such schemes are "lossy" in that some of the information (and thus image texture) is lost, but the loss may be acceptable for a given application. Lossy compression schemes are not used for binary files (numeric data or program code), because errors introduced into a program file are likely to affect the program's performance, if not "break" it completely. Though they may have less serious consequences, errors in text are also generally considered unacceptable.

Trends

There are a variety of compression programs used on UNIX systems, but variants of the Zip program are now the overwhelming favorite on Windows-based systems. Zip combines compression and archiving. Archiving, or the bundling together of many files into a single file, contributes a further reduction in file size. This is because files in most file systems must use a whole number of disk sectors, even if that means wasting most of a sector. Combining files into one file means that at most a bit less than one sector will be wasted.
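The dictionary-index variant described above is essentially the approach of the LZ78/LZW family of methods. A compact sketch, which for simplicity seeds the dictionary only with the characters that actually occur in the input (a real LZW coder normally starts from all 256 byte values), might look like this:

# A compact LZW-style sketch of the dictionary-index variant: repeated strings
# are added to a dictionary as they are seen, and the output is a list of
# dictionary indexes rather than offsets into the file itself.
def lzw_compress(text):
    # Seed the dictionary with every single character that occurs in the text.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    result = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                       # keep extending the match
        else:
            result.append(dictionary[current])
            dictionary[candidate] = len(dictionary)   # learn the new string
            current = ch
    if current:
        result.append(dictionary[current])
    return result, dictionary

if __name__ == "__main__":
    codes, dictionary = lzw_compress("banana bandana")
    print(codes)              # indexes into the growing dictionary
    print(len(dictionary))    # the dictionary itself never needs to be stored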
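The run-length idea used for graphics can be illustrated in a few lines of Python; the "pixels" here are simple color names rather than real image data.

# A minimal run-length encoding sketch for image-style data: a long run of
# identical pixel values is stored as a (value, count) pair instead of being
# repeated pixel by pixel.
def rle_encode(pixels):
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1              # extend the current run
        else:
            runs.append([value, 1])       # start a new run
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels

if __name__ == "__main__":
    # A scan line that is mostly "sky": 200 blue pixels, a few white, more blue.
    scan_line = ["blue"] * 200 + ["white"] * 5 + ["blue"] * 50
    encoded = rle_encode(scan_line)
    print(encoded)            # [('blue', 200), ('white', 5), ('blue', 50)]
    assert rle_decode(encoded) == scan_line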
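As a concrete example of combining archiving with compression, Python's standard zipfile module can bundle several files into a single DEFLATE-compressed archive (DEFLATE itself combines LZ-style string matching with Huffman coding). The file names and contents below are placeholders created only so the sketch runs end to end.

# A minimal sketch of combining archiving and compression with Python's
# standard zipfile module. The file names and contents are placeholders.
import os
import zipfile

files_to_bundle = ["notes.txt", "report.txt", "data.csv"]

# Create some sample files; repetitive content compresses well.
for name in files_to_bundle:
    with open(name, "w") as f:
        f.write("sample text " * 500)

# Bundle the files into one compressed archive.
with zipfile.ZipFile("bundle.zip", "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name in files_to_bundle:
        archive.write(name)

original = sum(os.path.getsize(name) for name in files_to_bundle)
print("original:", original, "bytes; archive:", os.path.getsize("bundle.zip"), "bytes")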
