13.07.2015 Views

C# Language Specification - Willy .Net



9. Lexical structure

9.1 Programs

A C# program consists of one or more source files, known formally as compilation units (§16.1). A source file is an ordered sequence of Unicode characters. Source files typically have a one-to-one correspondence with files in a file system, but this correspondence is not required.

Conceptually speaking, a program is compiled using three steps:

1. Transformation, which converts a file from a particular character repertoire and encoding scheme into a sequence of Unicode characters.
2. Lexical analysis, which translates a stream of Unicode input characters into a stream of tokens.
3. Syntactic analysis, which translates the stream of tokens into executable code.

Conforming implementations must accept Unicode source files encoded with the UTF-8 encoding form (as defined by the Unicode standard), and transform them into a sequence of Unicode characters. Implementations may choose to accept and transform additional character encoding schemes (such as UTF-16, UTF-32, or non-Unicode character mappings).

[Note: It is beyond the scope of this standard to define how a file using a character representation other than Unicode might be transformed into a sequence of Unicode characters. During such transformation, however, it is recommended that the usual line-separating character (or sequence) in the other character set be translated to the two-character sequence consisting of the Unicode carriage-return character followed by Unicode line-feed character. For the most part this transformation will have no visible effects; however, it will affect the interpretation of verbatim string literal tokens (§9.4.4.5). The purpose of this recommendation is to allow a verbatim string literal to produce the same character sequence when its source file is moved between systems that support differing non-Unicode character sets, in particular, those using differing character sequences for line separation. end note]

9.2 Grammars

This specification presents the syntax of the C# programming language using two grammars. The lexical grammar (§9.2.1) defines how Unicode characters are combined to form line terminators, white space, comments, tokens, and pre-processing directives. The syntactic grammar (§9.2.2) defines how the tokens resulting from the lexical grammar are combined to form C# programs.

9.2.1 Lexical grammar

The lexical grammar of C# is presented in §9.3, §9.4, and §9.5. The terminal symbols of the lexical grammar are the characters of the Unicode character set, and the lexical grammar specifies how characters are combined to form tokens (§9.4), white space (§9.3.3), comments (§9.3.2), and pre-processing directives (§9.5).

Every source file in a C# program must conform to the input production of the lexical grammar (§9.3).

9.2.2 Syntactic grammar

The syntactic grammar of C# is presented in the chapters and appendices that follow this chapter. The terminal symbols of the syntactic grammar are the tokens defined by the lexical grammar, and the syntactic grammar specifies how tokens are combined to form C# programs.

Every source file in a C# program must conform to the compilation-unit production (§16.1) of the syntactic grammar.
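
As an illustration of the note in §9.1 on line-separating characters, the following minimal sketch shows why the transformation step can be visible through a verbatim string literal: a literal that spans two source lines carries whatever line terminator the source file contains. The class name VerbatimLineEndings and the helper DescribeTerminator are hypothetical names introduced for this example only; they are not part of the specification.

```csharp
using System;

// Hypothetical demo type, not part of the specification.
public static class VerbatimLineEndings
{
    // A verbatim string literal spanning two source lines. The line
    // terminator between "first" and "second" is taken from the source file.
    public const string TwoLines = @"first
second";

    // Hypothetical helper: reports which line terminator a string contains.
    public static string DescribeTerminator(string text)
    {
        if (text.Contains("\r\n")) return "CR LF";
        if (text.Contains("\n")) return "LF";
        if (text.Contains("\r")) return "CR";
        return "none";
    }

    public static void Main()
    {
        // The result depends on how the source file was saved or transformed.
        Console.WriteLine(DescribeTerminator(TwoLines));

        // 12 when the terminator is a single character (LF or CR),
        // 13 when it is the two-character sequence CR LF.
        Console.WriteLine(TwoLines.Length);
    }
}
```

Compiling the same file once with LF line endings and once with CR LF line endings would be expected to give different lengths for TwoLines, which is the kind of difference the recommendation in the note aims to avoid.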
