2 - Forth Interest Group
2 - Forth Interest Group
2 - Forth Interest Group
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Compiling A NS <strong>Forth</strong><br />
Tom Almy<br />
Tualatin, Oregon<br />
Introduction<br />
<strong>Forth</strong>CMP is the author's native code metacompiler for<br />
<strong>Forth</strong>. In 1982 I wrote NCC, which was the first Native<br />
Code Compiler for interactive <strong>Forth</strong> environments. A year<br />
later, the compiler was ported to LMI <strong>Forth</strong> and is still sold<br />
as part of the LMI <strong>Forth</strong> package. The NCC exists in<br />
versions for 280, 80x86, and 80386 protected mode.<br />
However, I felt that for optimum performance a meta-<br />
compiler was necessary. A fully compiled <strong>Forth</strong> applica-<br />
tion would be faster and more compact than one tied to<br />
an interpretive environment. So <strong>Forth</strong>CMP (then called<br />
Cforth) was born. <strong>Forth</strong>CMP is run from the DOS com-<br />
mand line to compile <strong>Forth</strong> applications directly into<br />
executable machine code files. <strong>Forth</strong>CMP-generated, ex-<br />
ecutable files are much smaller than comparable executables<br />
produced by other language compilers, and are typically<br />
smaller than traditionally metacompiled <strong>Forth</strong> applications.<br />
Execution speed is comparable to current C compilers.<br />
Since <strong>Forth</strong>CMP is itself written in <strong>Forth</strong>, application<br />
programs can evaluate expressions and execute applica-<br />
tion colon definitions during compilation time (these are<br />
called 'host colon definitions'). Variables and other data<br />
structures exist in the target (application) address space,<br />
but can be accessed during compilation as though in the<br />
host. Thus, data structures can be algorithmically initial-<br />
ized during compilation. Code definitions are allowed,<br />
and generate target words and machine code subroutines.<br />
The major difference between <strong>Forth</strong>CMP and a traditional<br />
interpreted <strong>Forth</strong> is in the generation of colon definitions.<br />
In <strong>Forth</strong>CMP, they are machine code subroutines. The<br />
system stack is used as the data stack, and a separate stack<br />
is used for the return stack. Several calling conventions are<br />
used to handle the subroutine return address, depending<br />
on usage hints provided to the compiler. The usage hints<br />
can allow passing of arguments in registers and eliminate<br />
the need to move the return address between the data and<br />
return stacks.<br />
The Compilation Process<br />
In a traditional <strong>Forth</strong>, the compiler treats words in three<br />
classes. Words marked as IMMEDIATE are executed during<br />
compilation, and generally represent control structures.<br />
Words not so marked are compiled and then executed<br />
when the word is executed. Words not in the dictionary<br />
must be numeric literals, which are compiled as inline<br />
constants.<br />
In <strong>Forth</strong>CMP, there are many more classes of words:<br />
Literals-these are not compiled, but are pushed on a<br />
compilation 'literal stack.' This allows evaluation of<br />
literal subexpressions at compile time, and also allows<br />
generating instructions with literal arguments, which<br />
reduces the amount of code generated.<br />
Intrinsics-these words are built into the compiler and<br />
are like traditional IMMEDIATE words; however, most<br />
primitive, standard <strong>Forth</strong> words are intrinsics so that<br />
they can generate inline code. The ANS <strong>Forth</strong> version of<br />
<strong>Forth</strong>CMP has about 120 intrinsics. Application pro-<br />
grams cannot define intrinsics (which means they can-<br />
not define compiling, immediate words). Intrinsics have<br />
no execution tokens, since they compile inline code.<br />
Functions--colon and code definitions, either part of<br />
the application or compiled when needed from a library.<br />
It is possible to forward reference functions, with the<br />
restriction that the execution token is not directly<br />
accessible until the function is defined. Words defined<br />
for the host environment cannot be referenced in the<br />
target. Only functions have execution tokens-the value<br />
returned by ' (tick) or FIND-that are suitable for<br />
EXECUTE, and then only if the IN/OUT compiler hint<br />
(which allows passing arguments in registers) is not<br />
used.<br />
Constants-words defined as CONSTANT or 2CONSTANT<br />
behave as numeric literals when compiled. Constants<br />
are not allocated any memory in the target image, thus<br />
have no data address or execution token.<br />
Variables-words defined as VARIABLE, 2VARIABLE,<br />
SCONSTANT, or CREATE behave as numeric literals<br />
when compiled. There is no code generated, and the<br />
I execution token is the data address.<br />
Arrays and Tables-<strong>Forth</strong>CMP provides defining words<br />
to generate single-dimensional arrays and tables (con-<br />
stant arrays). These generate inline accessing code<br />
when used in a colon definition. The execution token is<br />
the data address.<br />
<strong>Forth</strong> Dimensions 7 July 1995 August