08.07.2015 Views

[PDF] Syntax and Semantics



Language Syntax and Semantics
CS 355
January 13, 2011

1. Syntax vs Semantics

• The syntax rules of a programming language specify which input strings (source code) are in the language.
• The semantics of a language specify what each string means.

Example: What is the syntax of a C if statement? What are the corresponding semantics?

2. Describing Syntax

(a) Lexemes and tokens

• Lexemes are the lowest-level (atomic) syntactic units (identifiers, literals, operators, reserved words, punctuation, etc.).
• Each lexeme is categorized as a token.

The following Pascal for loop

    for i := 1 to 10 do (* my little loop *)
        buf[i] := i*i;

can be broken down into lexemes and tokens as follows:

    Lexeme   Token
    for      for
    i        identifier
    :=       assignment op
    1        int literal
    to       to
    10       int literal
    do       do
    buf      identifier
    [        left square bracket
    i        identifier
    ]        right square bracket
    :=       assignment op
    i        identifier
    *        mult op
    i        identifier
    ;        semicolon

• We can think of a program as a string of lexemes rather than a string of characters. Lexical analysis is the process of breaking a program string into a stream of lexemes/tokens (usually the first step in the compile process).
• Note how comments and whitespace are handled in the above code: neither produces a lexeme.
• In almost all programming languages, the tokens can be described by regular expressions. In other words, the language of tokens is regular (more on this later).
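As a rough illustration of lexical analysis driven by regular expressions, the sketch below tokenizes a Pascal for loop like the one above. The token names mirror the lexeme/token table; the regular expressions, the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for this example and cover only the handful of lexemes that appear in the loop.

```python
import re

# Token names follow the lexeme/token table in the notes. The regular
# expressions are illustrative assumptions, not a complete Pascal lexer.
TOKEN_SPEC = [
    ("comment",              r"\(\*.*?\*\)"),  # (* ... *): recognized, then dropped
    ("whitespace",           r"\s+"),          # recognized, then dropped
    ("int literal",          r"\d+"),
    ("assignment op",        r":="),
    ("mult op",              r"\*"),
    ("left square bracket",  r"\["),
    ("right square bracket", r"\]"),
    ("semicolon",            r";"),
    ("identifier",           r"[A-Za-z_][A-Za-z0-9_]*"),
]
COMPILED = [(name, re.compile(rx, re.DOTALL)) for name, rx in TOKEN_SPEC]
RESERVED = {"for", "to", "do"}  # each reserved word is its own token


def tokenize(source):
    """Return a list of (lexeme, token) pairs for the given source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        # Try each pattern at the current position; the first match wins.
        for name, pattern in COMPILED:
            match = pattern.match(source, pos)
            if match:
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        lexeme = match.group()
        pos = match.end()
        if name in ("comment", "whitespace"):
            continue  # comments and whitespace never reach the parser
        if name == "identifier" and lexeme.lower() in RESERVED:
            name = lexeme.lower()
        pairs.append((lexeme, name))
    return pairs


if __name__ == "__main__":
    loop = "for i := 1 to 10 do (* my little loop *)\n    buf[i] := i*i;"
    for lexeme, token in tokenize(loop):
        print(f"{lexeme:<4} {token}")
```

Listing the comment pattern first matters here: otherwise the `*` inside `(*` could be mistaken for a multiplication operator.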


(b) Context free grammars

• A context free grammar (CFG) consists of the following:
  – A set of terminal symbols, which are the characters in the strings generated by the grammar (e.g., tokens in our case).
  – A set of nonterminal symbols (variables), which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols.
  – A set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the left side of the production) in a string with other nonterminal or terminal symbols (on the right side of the production).
  – A start symbol, which is a special nonterminal symbol that appears in the initial string generated by the grammar.
• To generate a string of terminal symbols from a CFG, we:
  i. Begin with a string consisting of the start symbol;
  ii. Apply one of the productions with the start symbol on the left-hand side, replacing the start symbol with the right-hand side of the production;
  iii. Repeat the process of selecting nonterminal symbols in the string and replacing them with the right-hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols.
• Example: a CFG for arithmetic expressions

    expr → num
    expr → ( expr )
    expr → expr + expr
    expr → expr − expr
    expr → expr ∗ expr
    expr → expr / expr

  The only nonterminal is expr, which is also the start symbol. The terminals are (, ), +, -, *, /, and num. We can generate a string in this language by starting with expr and then applying various productions as follows:

    expr ⇒ expr ∗ expr
         ⇒ expr + expr ∗ expr
         ⇒ (expr) + expr ∗ expr
         ⇒ (num) + expr ∗ expr
         ⇒ (num) + num ∗ expr
         ⇒ (num) + num ∗ num

  Note that we always replaced the leftmost nonterminal, so we call this a leftmost derivation of the resulting string.

(c) Backus-Naur form

• Backus-Naur form (BNF) is a metalanguage that mimics CFGs for describing programming language syntax.
  We use angle brackets to delimit nonterminals, the pipe symbol | to mean “logical or,” and ::= to separate the left- and right-hand sides of a production.
• Example: Pascal if statement

    <if_stmt> ::= if <expr> then <stmt>
                | if <expr> then <stmt> else <stmt>
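The generation procedure from (b) can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the `GRAMMAR` dictionary encodes the arithmetic-expression CFG from (b), and the `leftmost_derivation` function and its `choices` argument (the index of the production to apply at each step) are names invented for this example.

```python
# Productions of the arithmetic-expression CFG; "expr" is the only
# nonterminal, and every other symbol is a terminal.
GRAMMAR = {
    "expr": [
        ["num"],                # production 0
        ["(", "expr", ")"],     # production 1
        ["expr", "+", "expr"],  # production 2
        ["expr", "-", "expr"],  # production 3
        ["expr", "*", "expr"],  # production 4
        ["expr", "/", "expr"],  # production 5
    ]
}


def leftmost_derivation(start, choices):
    """Apply the chosen productions, always rewriting the leftmost nonterminal.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost symbol that is a nonterminal.
        index = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if index is None:
            raise ValueError("no nonterminal left to rewrite")
        # Replace it with the right-hand side of the chosen production.
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    # Derives ( num ) + num * num, one sentential form per line.
    for step in leftmost_derivation("expr", [4, 2, 1, 0, 0, 0]):
        print("=>", step)
```

A different sequence of choices yields a different string, or a different derivation of the same string; that second possibility is what makes a grammar ambiguous.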


• Note that rules can be “recursive,” which allows us to define “lists” of objects:

    <ident_list> ::= identifier
                   | identifier, <ident_list>

• Example: A grammar for a small language

    <program>    ::= begin <stmt_list> end
    <stmt_list>  ::= <stmt> | <stmt> ; <stmt_list>
    <stmt>       ::= <var> = <expression>
    <var>        ::= A | B | C
    <expression> ::= <var> + <var> | <var> - <var> | <var>

• Example: A grammar for simple arithmetic expressions over variables and numbers

    <expr>       ::= num
    <expr>       ::= id
    <expr>       ::= <unary_expr>
    <expr>       ::= <expr> <binop> <expr>
    <expr>       ::= '(' <expr> ')'
    <unary_expr> ::= '+' <expr>
    <unary_expr> ::= '-' <expr>
    <binop>      ::= '+' | '-' | '*' | '/'

  [Figure: a leftmost derivation of a+b*2]

3. Parse Trees

• We can describe the hierarchical structure of a particular derivation using a parse tree.
• Internal nodes are labeled with nonterminals.
• Leaves are labeled with terminals.
• Parse tree for a+b*2

4. Ambiguity

• A grammar that can produce two different parse trees for the same sentence is called ambiguous.
• Much of the semantics of a programming language depends on the structure of these parse trees, so we need to avoid ambiguous grammars.
• Example: a+b*2 can yield two different parse trees.
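To make the ambiguity concrete, the following sketch enumerates every parse tree of a+b*2 and evaluates each one. It assumes a simplified ambiguous grammar, expr → expr op expr | operand, with no parentheses or unary operators; the functions `parse_trees` and `evaluate`, the tuple encoding of trees, and the values chosen for a and b are all hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse_trees(tokens):
    """Return every parse tree for tokens under expr -> expr op expr | operand.

    A tree is either an operand string or a tuple (op, left, right).
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Operators sit at the odd positions; each one may serve as the root.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree, env):
    """Evaluate a tree bottom-up, so operators lower in the tree run first."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if tree in env else float(tree)


if __name__ == "__main__":
    for tree in parse_trees(["a", "+", "b", "*", "2"]):
        print(tree, "evaluates to", evaluate(tree, {"a": 1, "b": 2}))
```

With a = 1 and b = 2 the two trees evaluate to different values (5.0 and 6.0), which is why the meaning of the sentence depends on which tree the grammar produces.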


5. Operator Precedence

• Operators that appear lower in the parse tree are evaluated first, so we want to force our grammars to push higher-precedence operators lower down in the parse tree.
• Example: increasing operator precedence
  (a) +, -
  (b) *, /
• Resulting grammar:

    <expr>   ::= <expr> <addop> <expr>
    <expr>   ::= <term>
    <term>   ::= <term> <mulop> <term>
    <term>   ::= <factor>
    <factor> ::= '(' <expr> ')'
    <factor> ::= num | id
    <addop>  ::= '+' | '-'
    <mulop>  ::= '*' | '/'

6. Associativity

• Are the parse trees for expressions with two or more adjacent occurrences of operators of the same precedence correct?
• Example

    3+4+5 = (3+4)+5    (left associative)
    3^4^5 = 3^(4^5)    (right associative)

• New grammar

    <expr>   ::= <expr> <addop> <term>
    <expr>   ::= <term>
    <term>   ::= <term> <mulop> <factor>
    <term>   ::= <factor>
    <factor> ::= '(' <expr> ')'
    <factor> ::= num | id
    <addop>  ::= '+' | '-'
    <mulop>  ::= '*' | '/'

7. Dangling else

• The following grammar is ambiguous:

    <stmt>    ::= <if_stmt> | whatever
    <if_stmt> ::= if <expr> then <stmt> else <stmt>
    <if_stmt> ::= if <expr> then <stmt>

• The unambiguous grammar. Key observation: between a then and its matching else, there cannot be an if without an else.


    <stmt>      ::= <matched> | <unmatched>
    <matched>   ::= if <expr> then <matched> else <matched> | whatever
    <unmatched> ::= if <expr> then <stmt>
                  | if <expr> then <matched> else <unmatched>

• In practice, the dangling-else ambiguity is intentionally left unresolved in the language’s grammar; explicitly fixing the grammar (as above) makes the grammar less readable. It is assumed that the programmer and the language implementor are aware of the ambiguity and know how to resolve the problem. A good example is the grammar listed in the appendix of the famous Kernighan and Ritchie book The C Programming Language.
• Ada’s solution to the dangling-else problem
  Since it’s hard to construct an unambiguous grammar for if-then-else statements in C or Pascal, does this hint at a bad language design? Ada’s solution to the “dangling else” problem (see p. 122 in text) is to syntactically differentiate between “nested if statements” and “cascaded if statements” (note use of elsif).

    if Score < 50 then      -- cascaded if-statement in Ada
        grade := 'F';
    elsif Score < 60 then
        grade := 'C';
    elsif Score < 70 then
        grade := 'B';
    else
        grade := 'A';
    end if;

• Perl forces you to use block statements (i.e., curly braces) for if statements and uses an elsif keyword for cascaded if statements:

    if ($score < 50) {
        $grade = 'F';
    } elsif ($score < 60) {
        $grade = 'C';
    ...
    }

8. Static semantics

• Many of the rules about what makes a legal program cannot be fully specified via BNF or other CFG representations, e.g.:
  – In Java or C++, all variables must be declared before use.
  – The target of a goto statement in C must be matched with the label of the same name.
• Many other rules that could be specified using BNF would make the grammar too large and cumbersome, e.g.:
  – Typing rules in C

        struct {int i; float f;} vara, varb;
        struct {int i; int j;} varc;
        vara = varb;  /* legal struct copy */
        vara = varc;  /* illegal struct copy */

• Rules of this kind, which are not (or cannot be) specified by the grammar but can be checked at compile time (i.e., statically), are called static semantics.


• Like CFGs, static semantics are only loosely related to the true semantics (i.e., meaning) of a program.
• Attribute grammars
  – To each symbol X in the grammar, we assign a set of attributes A(X) that we use to help enforce the language’s static semantics.
  – We can associate a predicate with each grammar production to ascertain the “legality” of a program sentence. The predicate will use the attribute information.
  – Attribute values are passed up and down the parse tree. Values at the leaves may come from the compiler’s symbol table.
  – Most attributes are connected with static type checking (which we’ll discuss in detail later in the course).

9. Dynamic Semantics

• The dynamic semantics (or just semantics) of a program sentence is the meaning of that sentence. This can be very rich and hard to describe precisely and concisely.
• Using operational semantics we can describe the meaning of a statement or expression by describing its execution on a simpler or already understood machine (real or virtual).
  – Describe the state of the machine before and after the statement is executed, e.g., processor instructions are often described this way: which memory locations, registers, etc. are affected by the instruction.
  – Operational semantics of the C for statement

        for (expr1; expr2; expr3)
            statement;

              expr1;
        loop: if expr2 = 0 goto done
              statement;
              expr3;
              goto loop;
        done:

10. Syntax Trees

• An internal node of an abstract syntax tree (AST) represents a programming language construct and its children denote meaningful components of that construct.

    source code:

        while (i < 10) {
            x = 2*(x + i);
            i = i+1;
        }

    [Figure: abstract syntax tree for this loop; recoverable node labels: while_stmt, ident="i"]


  – Internal nodes represent an operation or procedure and their child nodes represent the corresponding operands or parameters.
  – As opposed to a parse tree, only significant language constructs are retained (i.e., those constructs that affect the semantics of the program). Purely syntactical elements (like parentheses, semicolons, or other grouping operators) are discarded.
  – An AST is a common data structure generated by the syntax analysis phase of a compiler and is passed on to the next sequence of phases, where more analysis, optimization, and (eventually) code generation are performed.
• A parse tree is sometimes referred to as a concrete syntax tree.
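The ideas from Sections 5, 6, and 10 can be tied together with a small recursive-descent parser that builds an AST for arithmetic expressions. This is only a sketch under stated assumptions: the nonterminal names expr, term, and factor are the conventional ones for a precedence-stratified grammar, the `Parser` class and `parse` function are invented for this example, and the AST is encoded as nested tuples (op, left, right) with operand strings at the leaves.

```python
import re


def tokenize(text):
    """Split an expression into numbers, identifiers, operators, and parentheses."""
    return re.findall(r"\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr: one term followed by any number of (addop term) pairs.
        # The loop folds operands to the left, giving left associativity.
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # term sits below expr, so * and / bind tighter than + and -.
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node  # the parentheses themselves leave no AST node
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # a number or identifier becomes a leaf


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    for source in ("a+b*2", "3+4+5", "2*(x + i)"):
        print(source, "->", parse(source))
```

Recursive descent cannot follow a left-recursive production directly, so each level uses a loop in place of the recursion; the loop still yields the left-leaning trees that left associativity requires. The parentheses in 2*(x + i) shape the tree but do not appear in it, which is the distinction between an AST and a parse tree.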
