Encyclopedia of Computer Science and Technology


competitive and other pressures led their designers to start adding complexity.

One way processor designers coped with the demand for more complicated instructions was to give the main processor a microprocessor with its own set of simple instructions. When the main processor received one of the complex instructions, it would be executed by being broken down into simpler instructions or “microcode” to be executed by the sub-processor.

This approach gave processor designers greater flexibility. It also made things easier for compiler designers, because the compiler could translate higher-level language statements into fewer, more complex instructions, leaving it to the hardware with its micro engine to break them down into the ultimate machine operations. However, it also meant that the processor had to decode and execute more instructions in every processor cycle, making it less efficient and slower and losing some of the benefits of the faster processors that were becoming available.

In 1975, John Cocke and his colleagues at IBM decided to build a new minicomputer architecture from the ground up. Instead of using complex instructions and decoding them with a micro engine, they would use only simple instructions that could be executed one per cycle. The clock (and thus the cycle time) would be much faster than for existing machines, and the processor would use pipelining so it could decode the next instruction while still executing the previous one. Similarly, in many cases the next item of data needed could be fetched at the same time the data from the previous step was being written (stored).
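
The gain from overlapping stages can be sketched with a little arithmetic. The Python snippet below is an illustrative model only: it assumes an idealized pipeline with no stalls in which every stage takes exactly one cycle. The function names and the three-stage figure are assumptions made for the example, not details of the IBM design.

```python
def cycles_without_pipelining(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_with_pipelining(num_instructions: int, num_stages: int) -> int:
    """Idealized pipeline: after it fills, one instruction completes per cycle."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one adds a cycle.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions on a 3-stage (fetch, decode, execute) design.
    print(cycles_without_pipelining(100, 3))
    print(cycles_with_pipelining(100, 3))
```

Under these assumptions the pipelined count approaches one cycle per instruction as the instruction stream grows, which is the "one per cycle" goal described above. Real processors fall short of this ideal whenever a stage has to wait.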
This approach became known as reduced instruction set computing (RISC), because the number of instructions had been reduced compared to existing systems, which then became known as complex instruction set computing (CISC).

Since the RISC system had only simple instructions, compilers could no longer use many complicated but handy instructions. The compiler would have to take over the job of the micro engine and break all statements down into the basic instructions. It became important that the compiler be able to generate the optimal set of instructions by analyzing how data would have to be moved around in the machine’s registers and memory. In other words, RISC hardware gained higher performance through simplification at the hardware level but at the cost of making compilers more complicated. Fortunately, both hardware and software designers were able to meet the challenge and in the process learn how to get the most out of new technology.

RISC would also play a part in the design of the microprocessors that began to power personal computers. For example, the DEC Alpha, a “pure” RISC chip introduced in 1992, provided a level of power that made it suitable for high-performance workstations. Another successful RISC-based development has been the SPARC (Scalable Processor ARChitecture) developed by Sun Microsystems for servers, computer clusters, and workstations.

Perhaps the most interesting development, however, has been the gradual application of RISC principles to mainstream processors such as the Intel 80x86 series used in most personal computers today.
Increasingly, the recent Pentium series chips, while supporting their legacy of CISC instructions, are processing them using an inner architecture that uses RISC principles and takes advantage of pipelining, as well as using more registers and a larger data cache. However, the sheer increase in clock cycle speed and performance in the newer chips has made the old trade-off between complicated and simple instructions less relevant.

Further Reading

Dandamudi, Sivarama P. Guide to RISC Processors for Programmers and Engineers: Introduction to Assembly Language Programming for Pentium and RISC Processors. New York: Springer, 2005.

Knuth, Donald E. MMIX—A RISC Computer for the New Millennium. Vol. 1, fascicle 1 of Art of Computer Programming. Upper Saddle River, N.J.: Addison-Wesley Professional, 2005.

regular expression

Many users of UNIX and the old MS-DOS are familiar with the ability to use “wildcards” to find filenames that match specified patterns. For example, suppose a user wants to list all of the TIF graphics files in a particular directory. Since these files have the extension .tif, a UNIX ls command or a DOS dir command, when given the pattern *.tif, will match and list all the TIF files. (One does have to be aware of whether the operating system in question is case-sensitive. UNIX is, while MS-DOS is not.)

The specification *.tif tells the command “match all files whose names consist of one or more characters and that end with a period followed by the letters tif.” It is one of many possible regular expressions.
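
As a rough illustration of the wildcard match just described, the following Python sketch uses the standard fnmatch module. The file names are invented for the example, and Python's wildcard rules only approximate what a particular UNIX shell or the DOS dir command would do.

```python
import fnmatch
import re

# Hypothetical directory listing, made up for this example.
filenames = ["photo.tif", "scan.TIF", "notes.txt", "archive.tif.bak"]

# fnmatchcase always compares case-sensitively, as a UNIX shell would.
unix_style = [name for name in filenames if fnmatch.fnmatchcase(name, "*.tif")]

# Lowercasing each name first imitates a case-insensitive system such as MS-DOS.
dos_style = [name for name in filenames if fnmatch.fnmatchcase(name.lower(), "*.tif")]

# fnmatch.translate rewrites the wildcard pattern in regular expression syntax,
# which can then be compiled and used with the re module.
tif_regex = re.compile(fnmatch.translate("*.tif"))
regex_style = [name for name in filenames if tif_regex.match(name)]

if __name__ == "__main__":
    print(unix_style)
    print(dos_style)
    print(regex_style)
```

The case-sensitive and regular expression versions select the same names, while the lowercased comparison also picks up scan.TIF, mirroring the UNIX versus MS-DOS difference noted above.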
(See the accompanying table for more examples.) The asterisk here is a “metacharacter.” This means that it is not treated as a literal character, but as a pattern that will be matched in a specified way.

Most operating systems that have command processors (see shell) allow for some form of regular expressions, but don’t necessarily implement all of the metacharacters. UNIX provides the most extensive use for regular expressions (see UNIX). UNIX has an operating system facility called glob that expands regular expressions (that is, substitutes for them whatever matches) and passes them on to the many UNIX tools or utilities designed to work with regular expressions. These tools include editors such as ex and vi, the character translation utility (tr), the “stream editor” (sed), and the string-searching tool grep. For example, sed can be used to remove all blank lines from a file by specifying

    sed '/^$/d' list.txt

This command finds all lines with no characters (^$) in the file list.txt and deletes them from the output. Even more extensive use of pattern-matching with regular expressions is found in many scripting languages (see scripting languages, awk, and Perl).

It is true that most of today’s computer users don’t enter operating system commands in text form but instead use menus and manipulate icons (see user interface and Microsoft Windows). If such a user wants to change one word to another throughout a word processing document, he or she is likely to open the Edit menu, select Find,
