4th International Conference on Principles and Practices ... - MADOC
tomated assembler and disassembler testing (section 5.5).<br />
We briefly discuss related work in section 6 and the paper<br />
concludes with notable observations and future work (section<br />
7).<br />
3. HOW TO USE THE ASSEMBLERS<br />
Each assembler consists of the top-level package<br />
com.sun.max.asm and the subpackage matching its ISA,<br />
as listed in Figure 1. In addition, the package<br />
com.sun.max.asm.x86 is shared between the AMD64 and<br />
the IA32 assemblers. Hence, to use the AMD64 assembler,<br />
the following packages are needed (see footnote 4): com.sun.max.asm,<br />
com.sun.max.asm.amd64 and com.sun.max.asm.x86. None<br />
of the assemblers requires any of the packages under .gen<br />
and .dis.<br />
To use an assembler, one starts by instantiating one of the<br />
leaf classes shown in Figure 2. The top class Assembler provides<br />
methods common to all assemblers, e.g. for<br />
label binding and output to streams or byte arrays. The<br />
generated classes in the middle contain the ISA-specific assembly<br />
routines. For ease of use, these methods deliberately<br />
follow existing assembly reference manuals closely,<br />
with method names that mimic mnemonics and parameters<br />
that directly correspond to individual symbolic and integral<br />
operands.<br />
Here is an example for AMD64 that creates a small sequence<br />
of machine code instructions (shown in Figure 3) in<br />
a Java byte array:<br />
import static<br />
...asm.amd64.AMD64GeneralRegister64.*;<br />
...<br />
public byte[] createInstructions() {<br />
    long startAddress = 0x12345678L;<br />
    AMD64Assembler asm =<br />
        new AMD64Assembler(startAddress);<br />
    Label loop = new Label();<br />
    Label subroutine = new Label();<br />
    // fix label subroutine to an absolute address<br />
    asm.fixLabel(subroutine, 0x234L);<br />
    asm.mov(RDX, 12, RSP.indirect());<br />
    // bind label loop to the instruction that follows<br />
    asm.bindLabel(loop);<br />
    asm.call(subroutine);<br />
    asm.sub(RDX, RAX);<br />
    asm.cmpq(RDX, 0);<br />
    asm.jnz(loop);<br />
    asm.mov(20, RCX.base(), RDI.index(),<br />
        SCALE_8, RDX);<br />
    return asm.toByteArray();<br />
}<br />
Instead of using a byte array, assembler output can also be<br />
directed to a stream (e.g. to write to a file or into memory):<br />
OutputStream stream = new ...Stream(...);<br />
asm.output(stream);<br />
The above example illustrates two different kinds of label<br />
usage. Label loop is bound to the instruction following the<br />
bindLabel() call. In contrast, label subroutine is fixed<br />
to an absolute address by the fixLabel() call. In both cases, though,<br />
the assembler creates PC-relative code by computing the respective<br />
offset argument (see footnote 5).<br />
Footnote 4: In addition, general-purpose packages from MaxwellBase<br />
and the JRE are needed.<br />
An explicit non-label argument can be<br />
expressed by using int (or sometimes long) values instead<br />
of labels, as in:<br />
asm.call(200);<br />
The variant of call() used here is defined in the raw assembler<br />
(AMD64RawAssembler) superclass of our assembler and<br />
it takes a “raw” int argument:<br />
public void call(int rel32) { ... }<br />
In contrast, the call() method used in the first example<br />
is defined in the label assembler (AMD64LabelAssembler),<br />
which sits between our assembler class and the raw assembler<br />
class:<br />
public void call(Label label) {<br />
    ... call(labelOffsetAsInt(label)); ...<br />
}<br />
This method builds on the raw call() method, as sketched<br />
in its body.<br />
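To make the layering concrete, here is a minimal, self-contained sketch of a label-based call() that delegates to a raw call(int rel32). It is not the code of the assembler described in this paper: the class ToyAssembler, its fields, and the restriction that a label must be bound or fixed before it is used are assumptions made for illustration. The only fact taken as given is the x86 encoding of a near call, which is the opcode byte 0xE8 followed by a little-endian 32-bit displacement measured from the end of the call instruction.

```java
import java.io.ByteArrayOutputStream;

public class ToyAssembler {
    /** Hypothetical label: holds an absolute address once it is bound or fixed. */
    public static final class Label {
        private long address;
        private boolean resolved;
    }

    // One opcode byte plus four displacement bytes.
    private static final int CALL_REL32_LENGTH = 5;

    private final long startAddress;
    private final ByteArrayOutputStream code = new ByteArrayOutputStream();

    public ToyAssembler(long startAddress) {
        this.startAddress = startAddress;
    }

    /** Address at which the next instruction will be emitted. */
    private long currentAddress() {
        return startAddress + code.size();
    }

    /** Binds the label to the next instruction to be emitted. */
    public void bindLabel(Label label) {
        label.address = currentAddress();
        label.resolved = true;
    }

    /** Fixes the label to an absolute address. */
    public void fixLabel(Label label, long address) {
        label.address = address;
        label.resolved = true;
    }

    /** "Raw" variant: emits CALL rel32 with the given displacement. */
    public void call(int rel32) {
        code.write(0xE8);
        for (int shift = 0; shift < 32; shift += 8) {
            code.write((rel32 >>> shift) & 0xFF); // little-endian byte order
        }
    }

    /** Label variant: computes the PC-relative offset, then delegates to the raw variant. */
    public void call(Label label) {
        if (!label.resolved) {
            throw new IllegalStateException("forward references are not handled in this sketch");
        }
        long nextInstruction = currentAddress() + CALL_REL32_LENGTH;
        call(Math.toIntExact(label.address - nextInstruction));
    }

    public byte[] toByteArray() {
        return code.toByteArray();
    }

    public static void main(String[] args) {
        ToyAssembler asm = new ToyAssembler(0x1000L);
        Label subroutine = new Label();
        asm.fixLabel(subroutine, 0x2000L);
        asm.call(subroutine); // displacement 0x2000 - 0x1005 = 0xFFB
        for (byte b : asm.toByteArray()) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

Both kinds of label end up as the same kind of displacement: whether the target address came from bindLabel() or fixLabel() makes no difference to the emitted bytes. A real assembler also has to patch forward references once their labels are bound, which this sketch leaves out.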
The two call() methods, like many others, are distinguished<br />
syntactically by parameter overloading. This Java<br />
language feature is also leveraged to distinguish whether a<br />
register is used directly, indirectly, or in the role of a base<br />
or an index. For example, the expression RSP.indirect()<br />
above results in a different Java type than plain RSP, thus<br />
clarifying which addressing mode the given mov instruction<br />
must use. Similarly, RCX.base() specifies a register in the<br />
role of a base, etc.<br />
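The following sketch illustrates the idea of encoding an operand's role in its static type so that Java overload resolution selects the addressing mode at compile time. All names here (OperandRoles, IndirectRegister, BaseRegister, the reduced register set, the operand order, and the strings returned) are hypothetical stand-ins chosen for brevity; they do not reproduce the actual classes of the assembler, whose methods emit machine code rather than text.

```java
public class OperandRoles {
    /** Hypothetical wrapper type for a register used as a memory address. */
    public static final class IndirectRegister {
        final Register register;
        IndirectRegister(Register register) { this.register = register; }
    }

    /** Hypothetical wrapper type for a register used as the base of an address. */
    public static final class BaseRegister {
        final Register register;
        BaseRegister(Register register) { this.register = register; }
    }

    public enum Register {
        RAX, RCX, RDX, RSP;

        /** Same register, but typed as an indirect operand. */
        public IndirectRegister indirect() { return new IndirectRegister(this); }

        /** Same register, but typed as a base operand. */
        public BaseRegister base() { return new BaseRegister(this); }
    }

    // Three overloads of one mnemonic. The compiler picks one of them
    // from the static types of the arguments, so no runtime check is needed.
    public static String mov(Register destination, Register source) {
        return "direct: " + destination + ", " + source;
    }

    public static String mov(Register destination, IndirectRegister source) {
        return "indirect: " + destination + ", [" + source.register + "]";
    }

    public static String mov(Register destination, int displacement, BaseRegister base) {
        return "base+disp: " + destination + ", [" + base.register + " + " + displacement + "]";
    }

    public static void main(String[] args) {
        System.out.println(mov(Register.RDX, Register.RAX));
        System.out.println(mov(Register.RDX, Register.RSP.indirect()));
        System.out.println(mov(Register.RDX, 20, Register.RCX.base()));
    }
}
```

A call such as mov(Register.RDX, 20, Register.RAX) does not compile in this sketch, because no overload accepts a plain Register in the base position. Rejecting ill-formed operand combinations at compile time is the main benefit of this design.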
If an argument has a relatively limited range of<br />
valid values, a matching enum class rather than a primitive<br />
Java type is defined as the parameter type. This is for<br />
instance the case for SCALE_8 in the SIB addressing<br />
expression above. Its type is declared as follows:<br />
public enum Scale ... {<br />
    SCALE_1, SCALE_2, SCALE_4, SCALE_8;<br />
    ...<br />
}<br />
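As an illustration of why an enum fits here, the sketch below shows one way such a type could map onto the two-bit scale field of an x86 SIB byte (bits 7..6 hold the scale, bits 5..3 the index register, bits 2..0 the base register). The members elided in the declaration above are not known, so the methods factor() and sibBits(), the helper sibByte(), and the wrapper class ScaleDemo are assumptions, and register numbers are limited to 0..7 (no REX extension).

```java
public class ScaleDemo {
    public enum Scale {
        SCALE_1, SCALE_2, SCALE_4, SCALE_8;

        /** Multiplication factor applied to the index register: 1, 2, 4 or 8. */
        public int factor() {
            return 1 << ordinal();
        }

        /** Two-bit value stored in bits 7..6 of a SIB byte. */
        public int sibBits() {
            return ordinal();
        }
    }

    /** Packs scale, index register number and base register number into a SIB byte. */
    public static int sibByte(Scale scale, int index, int base) {
        if (index < 0 || index > 7 || base < 0 || base > 7) {
            throw new IllegalArgumentException("register numbers must be in 0..7");
        }
        return (scale.sibBits() << 6) | (index << 3) | base;
    }

    public static void main(String[] args) {
        // RDI has register number 7 and RCX has register number 1.
        int sib = sibByte(Scale.SCALE_8, 7, 1);
        System.out.printf("0x%02X%n", sib); // prints 0xF9
    }
}
```

With an int parameter, a caller could pass 3 or 16 and the error would surface only at run time, if at all. With the enum, only the four legal scale factors can be written.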
Each RISC assembler features synthetic instructions according<br />
to the corresponding reference manual. For instance,<br />
one can write these statements to create some synthetic<br />
SPARC instructions [20]:<br />
import static ...asm.sparc.GPR.*;<br />
SPARC32Assembler asm = new SPARC32Assembler(...);<br />
asm.nop();<br />
asm.set(55, G3);<br />
asm.inc(4, G7);<br />
asm.retl();<br />
...<br />
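For readers unfamiliar with synthetic instructions, the sketch below shows how the four used above expand into hardware instructions, following the synthetic instruction mappings of the SPARC architecture manual. It is an illustration only: the class SyntheticSparc, the use of register names as strings, and the choice to return assembly text are assumptions, whereas the generated assembler methods emit machine code.

```java
import java.util.Arrays;
import java.util.List;

public class SyntheticSparc {
    /** nop is encoded as a sethi that writes zero to the hardwired-zero register. */
    public static List<String> nop() {
        return Arrays.asList("sethi 0, %g0");
    }

    /** inc adds a 13-bit signed constant to a register in place. */
    public static List<String> inc(int const13, String rd) {
        return Arrays.asList("add " + rd + ", " + const13 + ", " + rd);
    }

    /** retl returns from a leaf routine: jump past the call and its delay slot. */
    public static List<String> retl() {
        return Arrays.asList("jmpl %o7+8, %g0");
    }

    /** set loads a 32-bit constant using one or two instructions, depending on the value. */
    public static List<String> set(int value, String rd) {
        if (value >= -4096 && value <= 4095) {
            // Fits the 13-bit signed immediate of a single or.
            return Arrays.asList("or %g0, " + value + ", " + rd);
        }
        if ((value & 0x3FF) == 0) {
            // Low 10 bits are zero, so sethi alone (upper 22 bits) suffices.
            return Arrays.asList("sethi %hi(" + value + "), " + rd);
        }
        return Arrays.asList(
            "sethi %hi(" + value + "), " + rd,
            "or " + rd + ", %lo(" + value + "), " + rd);
    }

    public static void main(String[] args) {
        System.out.println(nop());
        System.out.println(set(55, "%g3"));
        System.out.println(inc(4, "%g7"));
        System.out.println(retl());
    }
}
```

In this sketch, set(55, "%g3") takes the single-instruction path because 55 fits in a 13-bit signed immediate, while a value such as 0x12345678 needs the sethi/or pair.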
Footnote 5: In our current implementation, labels always generate PC-relative<br />
code, i.e. absolute addressing is only supported by<br />
the raw assemblers.<br />
Let’s take a look at the generated source code of one of these<br />
methods:<br />