11.07.2015 Views

Encyclopedia of Computer Science and Technology

compiler

... such as a keyword (reserved word) such as if or while. Next, the tokens are parsed or grouped according to the rules of the language. The result of parsing is a “parse tree” that resolves statements into their component parts. For example, an assignment statement may be parsed into an identifier, an assignment operator (such as =), and a value to be assigned. The value in turn may be an arithmetic expression that consists of operators and operands.

Parsing can be done either “bottom up” (finding the individual components of the statement and then linking them together) or “top down” (identifying the type of statement and then breaking it down into its component parts). A set of grammatical rules specifies how each construct (such as an arithmetic expression) can be broken into (or built up from) its component parts.

The next step is semantic analysis. During this phase the parsed statements are analyzed further to make sure they don’t violate language rules. For example, most languages require that variables must be declared before they are referenced by the program. Many languages also have rules for which data types may be converted to other types when the two types are used in the same operation.

The result of front-end processing is an intermediate representation somewhere between the source statements and machine-level statements. The intermediate representation is then passed to the back end.

Code Generation and Optimization

The process of code generation usually involves multiple passes that gradually substitute machine-specific code and data for the information in the parse tree.
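
The stages described so far (tokenizing, top-down parsing, a declared-before-use check, and code generation from the parse tree) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy assignment-only language, the tuple-based tree shape, the rule that an assignment declares its target, and the stack-machine instruction names (PUSH, LOAD, STORE, ADD, and so on) are hypothetical and do not describe any particular compiler.

```python
import re

def tokenize(text):
    """Lexical analysis: split source text into (kind, value) tokens."""
    tokens = []
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|[-+*/=()]", text):
        if lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append(("ID", lexeme))
        else:
            tokens.append(("OP", lexeme))
    return tokens

class Parser:
    """Top-down (recursive-descent) parser for a toy assignment statement."""
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {kind} {value or ''}, got {tok}")
        self.pos += 1
        return tok

    def assignment(self):            # assignment := ID '=' expr
        target = self.take("ID")[1]
        self.take("OP", "=")
        value = self.expr()
        self.take("EOF")
        return ("assign", target, value)

    def expr(self):                  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take("OP")[1]
            node = (op, node, self.term())
        return node

    def term(self):                  # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take("OP")[1]
            node = (op, node, self.factor())
        return node

    def factor(self):                # factor := NUM | ID | '(' expr ')'
        kind, value = self.peek()
        if kind == "NUM":
            self.pos += 1
            return ("num", value)
        if kind == "ID":
            self.pos += 1
            return ("var", value)
        self.take("OP", "(")
        node = self.expr()
        self.take("OP", ")")
        return node

def check(node, declared):
    """Semantic analysis: every variable that is read must already be declared."""
    tag = node[0]
    if tag == "var" and node[1] not in declared:
        raise NameError(f"variable {node[1]!r} used before declaration")
    if tag == "assign":
        check(node[2], declared)
        declared.add(node[1])        # assumption: assigning declares the target
    elif tag in {"+", "-", "*", "/"}:
        check(node[1], declared)
        check(node[2], declared)

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(node):
    """Code generation: walk the tree and emit hypothetical stack-machine code."""
    tag = node[0]
    if tag == "num":
        return [("PUSH", node[1])]
    if tag == "var":
        return [("LOAD", node[1])]
    if tag == "assign":
        return generate(node[2]) + [("STORE", node[1])]
    return generate(node[1]) + generate(node[2]) + [(OPCODES[tag],)]

def compile_statement(text, declared):
    tree = Parser(tokenize(text)).assignment()
    check(tree, declared)
    return tree, generate(tree)
```

Calling `compile_statement("total = price + 2 * 3", {"price"})` yields a tree in which the multiplication is nested under the addition, and the instruction list LOAD price, PUSH 2, PUSH 3, MUL, ADD, STORE total. Calling it with an empty set of declared names raises `NameError`, which is the kind of rule violation semantic analysis is meant to catch.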
An important consideration in modern compilers is optimization, which is the process of substituting equivalent (but more efficient) constructs for the original output of the front end. For example, an optimizer can replace an arithmetic expression with its value so that it need not be repeatedly calculated while the program is running. It can also “hoist out” an invariant expression from a loop so that it is calculated only once before the loop begins. On a larger scale, optimization can also improve the communication between different parts (procedures) of the program.

The compiler must attempt to “prove” that the change it is making in the program will never cause the program to operate incorrectly. It can do this, for example, by tracing the possible paths of execution through the program (such as through branching and loops) and verifying that each possible path yields the correct result. A compiler that is too “aggressive” in making assumptions can produce subtle program errors. (Many compilers allow the user to control the level of optimization, and whether to optimize for speed or for compactness of program size.) During development, a compiler is often set to include special debugging code in the output. This code preserves potentially important information that can help the debugging facility better identify program bugs. After the program is working correctly, it will be recompiled without the debugging code.

The final code generation is usually accomplished by using templates that match each intermediate construction with a construction in the target (usually assembly) language, plugging items in as specified by the template. Often a final step, called peephole optimization, examines the assembly code and identifies redundancies or, if possible, replaces a memory reference so that a faster machine register is used instead.

Figure: Compilation is a multistep process. Lexical analysis breaks statements down into tokens, which are then parsed and subjected to semantic analysis. The resulting intermediate representation can be optimized before the final object code is generated.

In most applications the assembly code produced by the compiler is linked to code from other source files. For example, in a C++ application, class definitions and code that uses objects from the classes may be compiled separately. Also, most languages (such as C and C++) have operating system-specific libraries that contain commonly used support functions.
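
Two of the optimizations described above, replacing an arithmetic expression with its value and a peephole pass that removes redundant instructions, can be sketched as follows. This is only an illustration under stated assumptions: the tuple-based tree shape and the stack-machine instructions (PUSH, ADD, MUL, LOAD, STORE) are hypothetical, and the peephole rules shown cover just two algebraic identities rather than the register substitution mentioned in the text.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
# Division is left out on purpose: folding x / 0 would turn a run-time
# error into a compile-time crash, which changes program behavior.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Replace operator nodes whose operands are both constants with their value."""
    tag = node[0]
    if tag in FOLDABLE:
        left = fold_constants(node[1])
        right = fold_constants(node[2])
        if left[0] == "num" and right[0] == "num":
            return ("num", FOLDABLE[tag](left[1], right[1]))
        return (tag, left, right)
    if tag == "assign":
        return ("assign", node[1], fold_constants(node[2]))
    return node  # "num" and "var" leaves (and anything unknown) are left alone

# Adjacent instruction pairs that leave the stack unchanged: x + 0 and x * 1.
NO_OP_PAIRS = {
    (("PUSH", 0), ("ADD",)),
    (("PUSH", 1), ("MUL",)),
}

def peephole(code):
    """Scan a small window of adjacent instructions and drop redundant pairs."""
    out = []
    for instr in code:
        if out and (out[-1], instr) in NO_OP_PAIRS:
            out.pop()          # discard the PUSH; the ADD or MUL is dropped too
        else:
            out.append(instr)
    return out
```

For instance, `fold_constants(("+", ("var", "price"), ("*", ("num", 2), ("num", 3))))` returns `("+", ("var", "price"), ("num", 6))`, so the multiplication is no longer performed while the program runs. Both functions only apply rewrites that preserve the result for every input, which is the "proof" obligation the text describes in miniature.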
