4th International Conference on Principles and Practices ... - MADOC
The Project Maxwell Assembler System
Bernd Mathiske, Doug Simon, Dave Ungar
Sun Microsystems Laboratories
16 Network Circle, Menlo Park, CA 94025, USA
{Bernd.Mathiske,Doug.Simon,David.Ungar}@sun.com
ABSTRACT
The Java™ programming language is primarily used for
platform-independent programming. Yet it also offers many
productivity, maintainability and performance benefits for
platform-specific functions, such as the generation of machine
code.
We have created reliable assemblers for SPARC™, AMD64,
IA32 and PowerPC which support all user-mode and privileged
instructions, with 64-bit mode support for all but
PowerPC. These assemblers are generated as Java source
code by our extensible assembler framework, which itself is
written in the Java language. The assembler generator also
produces javadoc comments that precisely specify the legal
values for each operand.
Our design is based on the Klein Assembler System written
in Self. Assemblers are generated from a specification,
as are table-driven disassemblers and unit tests. The specifications
that drive the generators are expressed as Java
language objects. Thus no extra parsers are needed and developers
do not need to learn any new syntax to extend the
framework for additional ISAs.
Every generated assembler is tested against a preexisting
assembler by comparing the output of both. Each instruction's
test cases are derived from the cross product of its
potential operand values. The majority of tests are positive
(i.e., result in a legal instruction encoding). The framework
also generates negative tests, which an assembler is expected
to reject with an error. As with the Klein Assembler
System, we have found bugs in the external assemblers
as well as in ISA reference manuals.
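The cross-product derivation of test cases can be sketched as follows. This is a minimal illustration only, not the framework's actual code: the class name `OperandCrossProduct`, the method `crossProduct`, and the use of strings for operand values are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Maxwell framework.
final class OperandCrossProduct {
    /** Returns every combination that picks one value from each operand's candidate list. */
    static List<List<String>> crossProduct(List<List<String>> operandValues) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with a single empty combination
        for (List<String> candidates : operandValues) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String value : candidates) {
                    // Extend each existing combination by one value of this operand.
                    List<String> combination = new ArrayList<>(prefix);
                    combination.add(value);
                    extended.add(combination);
                }
            }
            result = extended;
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example: one register operand and one small immediate operand.
        List<List<String>> operands = List.of(
                List.of("%g0", "%g1"),
                List.of("-1", "0", "1"));
        for (List<String> testCase : crossProduct(operands)) {
            System.out.println("add " + String.join(", ", testCase));
        }
    }
}
```

Each resulting combination would then be assembled both by the generated assembler and by the external one, and the two encodings compared.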
Our framework generates tens of millions of tests. For
symbolic operands, our tests include all applicable predefined
constants. For integral operands, the important boundary
values, such as the respective minimum, maximum, 0,
1 and -1, are tested. Full testing can take hours to run but
gives us a high degree of confidence regarding correctness.
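As an illustration of the boundary values named above, the following sketch computes them for a signed two's-complement operand of a given bit width. The class `BoundaryValues` and the method `signedBoundaryValues` are hypothetical names chosen for this example, and restricting the example to signed fields is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the Maxwell framework.
final class BoundaryValues {
    /** Boundary test values for a signed two's-complement operand of the given width (1..64 bits). */
    static Set<Long> signedBoundaryValues(int bits) {
        if (bits < 1 || bits > 64) {
            throw new IllegalArgumentException("bits must be in 1..64");
        }
        long min = -1L << (bits - 1); // -2^(bits-1)
        long max = ~min;              //  2^(bits-1) - 1
        Set<Long> values = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        values.add(min);
        values.add(max);
        for (long candidate : new long[] {0, 1, -1}) {
            // Very narrow fields cannot represent all of 0, 1 and -1.
            if (candidate >= min && candidate <= max) {
                values.add(candidate);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // A 13-bit signed immediate, as found in many SPARC instructions.
        System.out.println(signedBoundaryValues(13));
    }
}
```

For a 13-bit field this yields the five values -4096, 4095, 0, 1 and -1. Combining only such boundary values across operands keeps the cross product finite while still probing the encoding limits.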
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PPPJ 2006, August 30 – September 1, 2006, Mannheim, Germany.
Copyright 2006 ACM ...$5.00.
Keywords
cross assembler, assembler generator, disassembler, automated
testing, the Java language, domain-specific framework,
systems programming
1. INTRODUCTION AND MOTIVATION
Even though the Java programming language is designed
for platform-independent programming, many of its attractions¹
are clearly more generally applicable and thus also
carry over to platform-specific tasks. For instance, popular
integrated development environments (IDEs) that are
written in the Java language have been extended (see e.g.
[5]) to support development in languages such as C/C++,
which get statically compiled to platform-specific machine
code. Except for legacy program reuse, we see no reason
why compilers in such an environment should not enjoy all
the usual advantages attributed to developing software in
the Java language (in contrast to C/C++). Furthermore,
several Java virtual machines have been written in the Java
language (e.g., [3], [21], [14]), including compilers from byte
code to machine code.
With the contributions presented in this paper we intend
to encourage and support further compiler construction research
and development in Java. Our software relieves programmers
of arguably the most platform-specific task of all,
the correct generation of machine instructions adhering to
existing general purpose instruction set architecture (ISA)
specifications.
We focus on this low-level issue in clean separation from
any higher level tasks such as instruction selection, instruction
scheduling, addressing mode selection, register allocation,
or any kind of optimization. This separation of concerns
allows us to match our specifications directly and uniformly
to existing documentation (reference manuals) and to
exploit pre-existing textual assemblers for systematic, comprehensive
testing. Thus our system virtually eliminates an
entire class of particularly hard-to-find bugs, and users gain
a foundation of trust to build further compiler layers upon.
Considering different approaches for building assemblers,
we encounter these categories:
¹ To name just a few: automatic memory management,
generic static typing, object orientation, exception handling,
excellent IDE support, large collection of standard libraries.