11.07.2015 Views

Encyclopedia of Computer Science and Technology


parsing

A token is normally defined as a series of one or more characters separated by “whitespace” (blanks, carriage returns, and so on). A token is thus analogous to a word in English.

The series of tokens is then sent to the parser. The parser’s job is to identify the significance of each token and to group the tokens into properly formed statements. Generally, the parser first checks the tokens for keywords, words such as “if” or “loop” that have a special meaning in a particular programming language. (In the BASIC example, PRINT is a keyword; in many other languages such functions are external rather than being part of the language itself.) As keywords (and punctuation symbols such as the semicolon that ends statements in C and separates them in Pascal) are identified, the parser uses a set of rules to determine the overall structure of the statement. For example, a language might define an if statement as follows:

If <Boolean expression> then <statement> else <statement>

This means that when the parser encounters an “if” it will expect to find between that word and “then” an expression that can be tested for being true or false (see Boolean operators). Following “then,” it will expect to find a complete statement. If it finds the optional keyword “else,” that word will be followed by an alternative statement.
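The tokenizing, keyword-checking, and grouping steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword set, the helper names (`tokenize`, `classify`, `parse_if`), and the sample statement are hypothetical, string literals containing spaces are not handled, and nested if statements are ignored.

```python
# Hypothetical keyword set for a small BASIC-like language (an assumption).
KEYWORDS = {"if", "then", "else", "print"}
OPERATORS = {">", "<", "=", "+", "-", "*", "/"}


def tokenize(source):
    """Split source text into tokens separated by whitespace."""
    return source.split()


def classify(token):
    """Label a single token; keywords are checked first, as in the text."""
    if token.lower() in KEYWORDS:
        return "keyword"
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return "string literal"
    if token.isdigit():
        return "number"
    if token in OPERATORS:
        return "operator"
    if token.isidentifier():
        return "identifier"
    return "unknown"


def parse_if(tokens):
    """Group tokens by the rule: If <condition> then <statement> [else <statement>]."""
    lowered = [t.lower() for t in tokens]
    if not lowered or lowered[0] != "if":
        raise SyntaxError("expected 'if'")
    if "then" not in lowered:
        raise SyntaxError("expected 'then'")
    then_at = lowered.index("then")
    # The else part is optional; without it the statement runs to the end.
    else_at = lowered.index("else") if "else" in lowered else len(tokens)
    return {
        "condition": tokens[1:then_at],
        "then": tokens[then_at + 1:else_at],
        "else": tokens[else_at + 1:],
    }


tokens = tokenize('If Count > 10 then Print "Done" else Print Count')
for tok in tokens:
    print(f"{tok:10} {classify(tok)}")
print(parse_if(tokens))
```

A production tokenizer would scan character by character so that quoted strings and punctuation without surrounding spaces are handled; splitting on whitespace is used here only because it matches the simple definition of a token given in the text.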
Given such a rule, in the statement

If Total > Limit Print “Overflow” else Print Total

the elements would be broken down as follows:

If               keyword
Total > Limit    Boolean expression
Print            keyword
“Overflow”       string literal (characters to be printed)
else             keyword
Print            keyword
Total            variable

When writing a parser, the programmer depends on a precise and exhaustive description of the possible legal constructs in the language (see also Backus-Naur form). In turn, these rules are turned into procedures by which the parser can construct a representation of the relationships between the tokens. This representation is often drawn as an upside-down tree, rather like the sentence diagrams used in English class.

In general form, an expression, for example, can be diagrammed as consisting of one or more terms (variables, constants, or literal values) or other expressions separated by operators.

[Figure: A parse tree for the statement A = B + C × D. Notice how the expression on the right-hand side of the equals (assignment) sign is eventually parsed into the component identifiers and operators.]

Notice that these diagrams are often recursive. That is, the definition of an expression can include expressions. The number of levels that can be “nested” is usually limited by the compiler if not by the definition of the language.

The underlying rules must be constructed in such a way that they are not ambiguous.
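One common way to turn grammar rules into procedures is a recursive-descent parser, in which each rule becomes one function. The sketch below is an illustration, not the method of any particular compiler. It assumes a small hypothetical grammar (assignment, expression, term, factor) in which multiplication binds more tightly than addition, writes `*` in place of ×, and represents the parse tree as nested tuples. Because each operator belongs to exactly one rule level, a given token string yields a single tree.

```python
import re


def tokenize(text):
    """Return identifiers, integers, and single-character operators."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/()]", text)


class Parser:
    """Recursive-descent parser for an assumed grammar:

    assignment := IDENT '=' expression
    expression := term (('+' | '-') term)*
    term       := factor (('*' | '/') factor)*
    factor     := IDENT | NUMBER | '(' expression ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        return self.advance()

    def assignment(self):
        target = self.advance()
        if target is None or not target.isidentifier():
            raise SyntaxError("expected a variable name")
        self.expect("=")
        tree = ("=", target, self.expression())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expression(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            # Recursion: an expression may contain another expression.
            node = self.expression()
            self.expect(")")
            return node
        if tok is not None and (tok.isidentifier() or tok.isdigit()):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


tree = Parser(tokenize("A = B + C * D")).assignment()
print(tree)  # ('=', 'A', ('+', 'B', ('*', 'C', 'D')))
```

The `factor` function calling back into `expression` is the recursion the text describes; Python's own recursion limit plays the role of the compiler-imposed nesting limit in this sketch.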
Unambiguous rules mean that any given string of tokens must result in one, and only one, parse tree.

Once the elements have been extracted and classified, a compiler must also analyze the nonkeyword tokens to make sure that they represent valid data types, that any variables have been previously defined, and that the language’s naming conventions have been followed (see compiler).

Fortunately, people who are designing command processors, scripting languages, and other applications requiring parsers need not work from scratch. Tools such as YACC (a grammar definition compiler) and BISON and ANTLR (parser generators) are available for UNIX and other platforms.

Further Reading

Aho, Alfred V., et al. Compilers: Principles, Techniques, & Tools. 2nd ed. Boston: Pearson/Addison-Wesley, 2007.

Bowen, Jonathan P., and Peter T. Breuer. “Razor: The Cutting Edge of Parser Technology.” Oxford University Computing Laboratory. Available online. URL: http://www.jpbowen.com/pub/toulouse92.pdf. Accessed August 17, 2007.

Donnelly, Charles, and Richard M. Stallman. Bison Manual: Using the YACC-Compatible Parser Generator. Boston, Mass.: GNU Press, 2003.
