19.11.2014 Views

The Fortress Language Specification - CiteSeerX


If a character (or any other syntactic entity) x precedes another character y in the program, we say that x is to the left of y and that y is to the right of x, regardless of how they may appear in a typical rendered display of the program. Thus, it is always meaningful to speak, for example, of the left-hand and right-hand operands of a binary operator, or the left-hand side of an assignment expression.

5.4 ASCII Conversion

To facilitate interaction with legacy tools and particularly to aid in program entry, Fortress specifies an ASCII encoding for programs. For every valid Fortress program, there is an equivalent program that contains only ASCII characters.³ To support this encoding, a Fortress program undergoes ASCII conversion, which produces an equivalent Fortress program. ASCII conversion is idempotent: converting a program that resulted from conversion results in the same program. Unless otherwise specified, all constraints and properties of Fortress programs stipulated in this specification apply to the programs after they have been converted. This section gives a high-level overview of ASCII conversion.

ASCII conversion consists of two steps. The first step consists of “pasting” words across line breaks, so that long identifiers and numerals can be split across lines. Identifiers may be very long in ASCII because many Unicode characters are encoded with long sequences of ASCII characters (the actual conversion to Unicode characters is done in the next step). Roughly speaking, in this step, two consecutive lines are pasted together if the first ends with an ampersand that is immediately preceded by a word character, and the second begins with an ampersand that is immediately followed by a word character.
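As a rough illustration of the pasting step, the following Python sketch joins two lines when the first ends with a word character followed by `&` and the next begins with `&` followed by a word character. The function name `paste_words` and the ASCII-only approximation of "word character" are assumptions made for this example; the precise rules are those given in Appendix E, not this code.

```python
import re

# Assumption: a "word character" is approximated here as an ASCII letter,
# digit, underscore, or apostrophe. The specification's definition is broader.
WORD_CHAR = r"[A-Za-z0-9_']"

# A word character, then '&' at the end of a line, then '&' at the start of
# the next line, then another word character.
_PASTE = re.compile(rf"(?<={WORD_CHAR})&\n&(?={WORD_CHAR})")


def paste_words(source: str) -> str:
    """Remove each '&' newline '&' sequence that splits a word across lines."""
    return _PASTE.sub("", source)


if __name__ == "__main__":
    split = "GREEK_SMALL_&\n&LETTER_ALPHA"
    print(paste_words(split))
```

Under these assumptions, a line ending in `&` that is preceded by a space rather than a word character is left untouched, since the rule only applies when the ampersands are directly adjacent to word characters.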

The second step replaces certain restricted words, and sequences of operator and special characters, with single Unicode characters. Roughly speaking, if a restricted word is either the official Unicode 5.0 name with underscores in place of spaces and hyphens, or a specified alternative name, of some character that is not a printable ASCII character, then the restricted word is replaced by that character. In some cases, even a fragment of a restricted name may be replaced by a single character (most commonly a Greek letter). Some multicharacter sequences of ASCII operator and special characters are also replaced by non-ASCII operator or special characters; we call such a sequence ASCII shorthand. However, this replacement is not generally done within string literals, which instead provide escape sequences to get non-ASCII characters (see Section 5.10).
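The name-based part of this step can be sketched in Python using the standard `unicodedata` module. This is only an approximation under several assumptions: candidate words are taken to be all-uppercase ASCII words (the specification's definition of "restricted word" is not reproduced here), underscores are mapped back to spaces only (names containing hyphens are not handled), Python's Unicode name table is newer than Unicode 5.0 and also accepts name aliases, and string literals, alternative names, name fragments, and ASCII shorthand are all ignored. The function name `replace_restricted_words` is hypothetical.

```python
import re
import unicodedata

# Assumption: candidate words are uppercase ASCII letters, digits, and underscores.
_CANDIDATE = re.compile(r"\b[A-Z][A-Z0-9_]*\b")


def replace_restricted_words(text: str) -> str:
    """Replace words that spell a Unicode character name with that character."""

    def convert(match: re.Match) -> str:
        word = match.group(0)
        try:
            ch = unicodedata.lookup(word.replace("_", " "))
        except KeyError:
            return word  # not a known character name: leave the word alone
        if len(ch) != 1:
            return word  # named sequences are outside the scope of this sketch
        if " " <= ch <= "~":
            return word  # printable ASCII characters are never substituted
        if unicodedata.category(ch) == "Cc":
            return word  # assumption: skip control characters reached via aliases
        return ch

    return _CANDIDATE.sub(convert, text)


if __name__ == "__main__":
    once = replace_restricted_words("GREEK_SMALL_LETTER_ALPHA + x")
    twice = replace_restricted_words(once)
    print(once)
    print(once == twice)  # a second pass changes nothing in this sketch
```

Running the sketch a second time on its own output leaves the text unchanged, which mirrors the idempotence property that the specification requires of ASCII conversion as a whole.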

Precise descriptions of both these steps are given in Appendix E, including the rules for replacing fragments of restricted words and the specification of alternative names for non-operator characters. Alternative names for operator characters are given in Appendix F.

5.5 Input Elements and Scanning

After ASCII conversion, a Fortress program is broken up into input elements by a process called scanning.⁴ That is, scanning transforms a Fortress program from a sequence of Unicode characters to a sequence of input elements. The characters that comprise an input element always appear contiguously in the input sequence. Every input element is a whitespace element (comments are whitespace elements) or a token. Every token is a reserved word, a literal, an identifier, an operator token, or a special token. There are five kinds of literals: boolean literals, character literals, string literals, the void literal, and numerals (i.e., numeric literals).
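The classification of input elements described above can be written down as a small data model. The following Python sketch is illustrative only: the names `ElementKind`, `LiteralKind`, and `InputElement` are invented for this example and do not come from the specification.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ElementKind(Enum):
    WHITESPACE = auto()     # whitespace elements, which include comments
    RESERVED_WORD = auto()
    LITERAL = auto()
    IDENTIFIER = auto()
    OPERATOR = auto()       # operator tokens
    SPECIAL = auto()        # special tokens


class LiteralKind(Enum):
    BOOLEAN = auto()
    CHARACTER = auto()
    STRING = auto()
    VOID = auto()
    NUMERAL = auto()


@dataclass(frozen=True)
class InputElement:
    text: str                                   # contiguous characters of the element
    kind: ElementKind
    literal_kind: Optional[LiteralKind] = None  # set only for literals

    def __post_init__(self) -> None:
        is_literal = self.kind is ElementKind.LITERAL
        if is_literal != (self.literal_kind is not None):
            raise ValueError("literal_kind must be set exactly when kind is LITERAL")

    @property
    def is_token(self) -> bool:
        """Every input element is either a whitespace element or a token."""
        return self.kind is not ElementKind.WHITESPACE


if __name__ == "__main__":
    numeral = InputElement("42", ElementKind.LITERAL, LiteralKind.NUMERAL)
    blank = InputElement(" ", ElementKind.WHITESPACE)
    print(numeral.is_token, blank.is_token)
```

Keeping the literal kind as a separate field reflects the two-level structure in the text: five token categories, one of which (literals) is subdivided into five kinds.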

Conceptually, we can think of scanning as follows: First, the comments, character literals and string literals are identified. Then the remaining characters are divided into words (i.e., contiguous sequences of word characters: letters, digits, connecting punctuation, primes and apostrophes), whitespace characters, and other characters. In some cases, words separated by a single ‘.’ or whitespace character (and no other characters) are joined to form a single

³ See Appendix E for the precise notion of equivalence guaranteed by ASCII conversion.

⁴ Fortress has a facility for defining new syntax, discussed in Chapter 36. However, except for that chapter, this specification generally ignores this facility, and describes the Fortress language only for programs that use the standard Fortress syntax.
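The conceptual division of characters into words, whitespace characters, and other characters described in Section 5.5 can be sketched in Python as follows. This is an approximation under stated assumptions: Unicode general categories stand in for "letters", "digits", and "connecting punctuation", the set of prime characters is a guess, and the earlier identification of comments and literals, as well as the later joining of words, is omitted. The names `char_class` and `divide` are hypothetical.

```python
import unicodedata
from itertools import groupby

# Assumption: these three code points stand in for "primes".
PRIMES = "\u2032\u2033\u2034"


def char_class(ch: str) -> str:
    """Classify one character as 'word', 'whitespace', or 'other'."""
    cat = unicodedata.category(ch)
    if cat.startswith("L") or cat in ("Nd", "Pc") or ch == "'" or ch in PRIMES:
        return "word"
    if ch.isspace():
        return "whitespace"
    return "other"


def divide(text: str) -> list:
    """Split text into words (maximal runs) and single non-word characters."""
    pieces = []
    for kind, group in groupby(text, key=char_class):
        chars = "".join(group)
        if kind == "word":
            pieces.append((kind, chars))        # a word is a contiguous run
        else:
            pieces.extend((kind, ch) for ch in chars)  # one piece per character
    return pieces


if __name__ == "__main__":
    for piece in divide("x' := a_1+2"):
        print(piece)
```

Words are kept as maximal runs while whitespace and other characters are emitted one at a time, following the wording of the text, which speaks of words on the one hand and individual whitespace and other characters on the other.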
