26.10.2013 Views

2 - Forth Interest Group

2 - Forth Interest Group

2 - Forth Interest Group

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Compiling A NS <strong>Forth</strong><br />

Tom Almy<br />

Tualatin, Oregon<br />

Introduction<br />

<strong>Forth</strong>CMP is the author's native code metacompiler for<br />

<strong>Forth</strong>. In 1982 I wrote NCC, which was the first Native<br />

Code Compiler for interactive <strong>Forth</strong> environments. A year<br />

later, the compiler was ported to LMI <strong>Forth</strong> and is still sold<br />

as part of the LMI <strong>Forth</strong> package. The NCC exists in<br />

versions for 280, 80x86, and 80386 protected mode.<br />

However, I felt that for optimum performance a meta-<br />

compiler was necessary. A fully compiled <strong>Forth</strong> applica-<br />

tion would be faster and more compact than one tied to<br />

an interpretive environment. So <strong>Forth</strong>CMP (then called<br />

Cforth) was born. <strong>Forth</strong>CMP is run from the DOS com-<br />

mand line to compile <strong>Forth</strong> applications directly into<br />

executable machine code files. <strong>Forth</strong>CMP-generated, ex-<br />

ecutable files are much smaller than comparable executables<br />

produced by other language compilers, and are typically<br />

smaller than traditionally metacompiled <strong>Forth</strong> applications.<br />

Execution speed is comparable to current C compilers.<br />

Since <strong>Forth</strong>CMP is itself written in <strong>Forth</strong>, application<br />

programs can evaluate expressions and execute applica-<br />

tion colon definitions during compilation time (these are<br />

called 'host colon definitions'). Variables and other data<br />

structures exist in the target (application) address space,<br />

but can be accessed during compilation as though in the<br />

host. Thus, data structures can be algorithmically initial-<br />

ized during compilation. Code definitions are allowed,<br />

and generate target words and machine code subroutines.<br />

The major difference between <strong>Forth</strong>CMP and a traditional<br />

interpreted <strong>Forth</strong> is in the generation of colon definitions.<br />

In <strong>Forth</strong>CMP, they are machine code subroutines. The<br />

system stack is used as the data stack, and a separate stack<br />

is used for the return stack. Several calling conventions are<br />

used to handle the subroutine return address, depending<br />

on usage hints provided to the compiler. The usage hints<br />

can allow passing of arguments in registers and eliminate<br />

the need to move the return address between the data and<br />

return stacks.<br />

The Compilation Process<br />

In a traditional <strong>Forth</strong>, the compiler treats words in three<br />

classes. Words marked as IMMEDIATE are executed during<br />

compilation, and generally represent control structures.<br />

Words not so marked are compiled and then executed<br />

when the word is executed. Words not in the dictionary<br />

must be numeric literals, which are compiled as inline<br />

constants.<br />

In <strong>Forth</strong>CMP, there are many more classes of words:<br />

Literals-these are not compiled, but are pushed on a<br />

compilation 'literal stack.' This allows evaluation of<br />

literal subexpressions at compile time, and also allows<br />

generating instructions with literal arguments, which<br />

reduces the amount of code generated.<br />

Intrinsics-these words are built into the compiler and<br />

are like traditional IMMEDIATE words; however, most<br />

primitive, standard <strong>Forth</strong> words are intrinsics so that<br />

they can generate inline code. The ANS <strong>Forth</strong> version of<br />

<strong>Forth</strong>CMP has about 120 intrinsics. Application pro-<br />

grams cannot define intrinsics (which means they can-<br />

not define compiling, immediate words). Intrinsics have<br />

no execution tokens, since they compile inline code.<br />

Functions--colon and code definitions, either part of<br />

the application or compiled when needed from a library.<br />

It is possible to forward reference functions, with the<br />

restriction that the execution token is not directly<br />

accessible until the function is defined. Words defined<br />

for the host environment cannot be referenced in the<br />

target. Only functions have execution tokens-the value<br />

returned by ' (tick) or FIND-that are suitable for<br />

EXECUTE, and then only if the IN/OUT compiler hint<br />

(which allows passing arguments in registers) is not<br />

used.<br />

Constants-words defined as CONSTANT or 2CONSTANT<br />

behave as numeric literals when compiled. Constants<br />

are not allocated any memory in the target image, thus<br />

have no data address or execution token.<br />

Variables-words defined as VARIABLE, 2VARIABLE,<br />

SCONSTANT, or CREATE behave as numeric literals<br />

when compiled. There is no code generated, and the<br />

I execution token is the data address.<br />

Arrays and Tables-<strong>Forth</strong>CMP provides defining words<br />

to generate single-dimensional arrays and tables (con-<br />

stant arrays). These generate inline accessing code<br />

when used in a colon definition. The execution token is<br />

the data address.<br />

<strong>Forth</strong> Dimensions 7 July 1995 August

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!