4th International Conference on Principles and Practices ... - MADOC
Here, we specify a new parameter field n and cause the
generated assembly method to assign its respective argument
to field sh. The fields mb and me become constant
with the given predefined values.
Furthermore, fields in synthetic instructions can be specified
by arithmetic expressions composed of numeric constants
and fields. For example, the values of the mb and me
fields in the following instruction description are the result
of subtraction expressions.
synthesize("clrlslwi", "rlwinm", sh(n),
           mb(SUB(b, n)), me(SUB(31, n)),
           b, n, LE(n, b), LT(b, 32));
The repeated use of field n exemplifies how one operand
may contribute to the values of several fields.
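The following Java sketch illustrates what the description above amounts to once arguments are supplied: the operands b and n are checked against the two constraints, and the sh, mb and me fields of the underlying rlwinm instruction are derived from them. The class and method names are hypothetical and are not taken from the generated assembler; the field positions follow the standard PowerPC M-form layout, and register operands are not range-checked here.

```java
// Sketch only: class and method names are hypothetical, not the paper's generated API.
final class SyntheticRlwinmSketch {

    // Encodes the PowerPC M-form instruction "rlwinm ra, rs, sh, mb, me" with Rc = 0.
    static int rlwinm(int ra, int rs, int sh, int mb, int me) {
        return (21 << 26) | (rs << 21) | (ra << 16) | (sh << 11) | (mb << 6) | (me << 1);
    }

    // clrlslwi ra, rs, b, n is the synthetic form of rlwinm ra, rs, n, b - n, 31 - n.
    static int clrlslwi(int ra, int rs, int b, int n) {
        // Operand constraints from the description: LE(n, b) and LT(b, 32).
        if (n < 0 || n > b || b >= 32) {
            throw new IllegalArgumentException("require 0 <= n <= b < 32");
        }
        int sh = n;       // sh(n)
        int mb = b - n;   // mb(SUB(b, n))
        int me = 31 - n;  // me(SUB(31, n))
        return rlwinm(ra, rs, sh, mb, me);
    }

    public static void main(String[] args) {
        // Prints the 32-bit encoding of "clrlslwi r3, r4, 8, 4" in hexadecimal.
        System.out.printf("%08x%n", clrlslwi(3, 4, 8, 4));
    }
}
```

Note that n appears three times in the derivation, once per field, which is the point the text makes about one operand contributing to several fields.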
5.1.2 x86 Instruction Descriptions
The number of possible instructions in x86 ISAs is about
an order of magnitude larger than in the given RISC ISAs.
If one tried to follow the same approach to create instruction
descriptions, one would spend an enormous amount of time
just writing the description listings. More importantly, our
primitives to specify RISC instructions are insufficient to
express instruction prefixes, suffixes, intricate mod r/m relationships,
etc. Instead of a rich bit-field structure, x86 instructions
tend to have a byte-wise composition determined
by numerous not quite orthogonal features.
As opcode tables provide the densest, most complete,
well-publicized instruction set descriptions available for x86,
we decided to build our descriptions and generators around
those. For an x86 ISA, the symbolic constant values of the
following description object types are verbatim from opcode
tables found in x86 reference manuals (e.g., [12]):
AddressingMethodCode: We allow M to be used in lieu
of the operand code Mv to faithfully mirror published
opcode tables in our instruction descriptions.
OperandTypeCode: e.g. b, d, v, z. Specifies a
mnemonic suffix for the external syntax.
OperandCode: the concatenation of an addressing mode
code with an operand type code, e.g. Eb, Gv, Iz, specifies
explicit operands, resulting in assembler method
parameters.
RegisterOperandCode: e.g. eAX, rDX.
GeneralRegister: e.g. BL, AX, ECX, R10.
SegmentRegister: e.g. ES, DS, GS.
StackRegister: e.g. ST, ST_1, ST_2.
The latter three result in implicit operands, i.e. the generated
assembler methods do not represent them by parameters.
Instead we append an underscore and the respective
operand to the method name. For example, the external assembly
instruction add EAX, 10 becomes add_EAX(10) when
using the generated assembler. We also generate the variant
with an explicit parameter that can be used as add(EAX,
10), but that is a different instruction, which is one byte
longer in the resulting binary form. External textual assemblers
typically do not provide any way to express such
choices.
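To make the one-byte difference concrete, here is a minimal Java sketch of the two variants for 32-bit mode. It assumes the well-known IA-32 encodings: opcode 05 followed by a 32-bit immediate for the implicit-EAX form, and opcode 81 with a mod r/m byte (opcode extension /0) followed by a 32-bit immediate for the explicit-register form. The class, the helper putImm32, and the use of a plain register number as parameter are illustrative assumptions, not the generated assembler's actual interface.

```java
// Sketch only: hypothetical names, not the generated assembler's real class.
final class AddVariantsSketch {

    // Writes a 32-bit immediate in little-endian byte order at the given offset.
    private static void putImm32(byte[] code, int offset, int imm32) {
        for (int i = 0; i < 4; i++) {
            code[offset + i] = (byte) (imm32 >>> (8 * i));
        }
    }

    // Implicit-operand form: EAX is part of the method name, opcode 05, no mod r/m byte.
    static byte[] add_EAX(int imm32) {
        byte[] code = new byte[5];
        code[0] = 0x05;
        putImm32(code, 1, imm32);
        return code;
    }

    // Explicit-operand form: opcode 81 /0 with a mod r/m byte selecting the register.
    static byte[] add(int register, int imm32) {
        byte[] code = new byte[6];
        code[0] = (byte) 0x81;
        code[1] = (byte) (0xC0 | (register & 7)); // mod = 11, opcode field = 000, r/m = register
        putImm32(code, 2, imm32);
        return code;
    }

    public static void main(String[] args) {
        final int EAX = 0; // register number of EAX
        System.out.println(add_EAX(10).length + " bytes vs. " + add(EAX, 10).length + " bytes");
    }
}
```

Both byte sequences add 10 to EAX; exposing them as separate methods lets the caller pick the encoding, which is the choice a textual assembler usually makes silently.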
In addition, these object types are used to describe x86
instructions:
HexByte: an enum providing hexadecimal unsigned byte
values, used to specify an opcode. Every x86 instruction
has either one or two of these. In case of two, the
first opcode must be 0F.
ModRMGroup: specifies a table in which alternative additional
sets of instruction description objects are located,
indexed by the respective 3-bit opcode field in
the mod r/m byte of each generated instruction.
ModCase: a 2-bit value to which the mod field of the mod
r/m byte is then constrained.
FloatingPointOperandCode: a floating point operand
not further described here.
Integer: an implicit byte operand to be appended to the
instruction, typically 1.
OperandConstraint: same as for RISC above, but much
more rarely used, since almost all integral x86 operand
value ranges coincide with Java primitive types.
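As a reminder of how the mod r/m byte relates to ModRMGroup and ModCase, the sketch below splits a mod r/m byte into its 2-bit mod field, 3-bit opcode field and 3-bit r/m field, and uses the opcode field as an index into a group table. The table contents are the standard x86 group 1 mnemonics (opcodes 80 through 83); the class and method names are hypothetical and only illustrate the indexing idea, not the framework's own data structures.

```java
// Sketch only: hypothetical names illustrating mod r/m field extraction and group lookup.
final class ModRMSketch {

    // Group 1 of the x86 opcode maps, indexed by the 3-bit opcode field of the mod r/m byte.
    static final String[] GROUP_1 = {"ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"};

    // Bits 7..6: the mod field (the value a ModCase would constrain).
    static int mod(int modRM) {
        return (modRM >> 6) & 3;
    }

    // Bits 5..3: the opcode (or register) field, used as the group table index.
    static int opcodeField(int modRM) {
        return (modRM >> 3) & 7;
    }

    // Bits 2..0: the r/m field.
    static int rm(int modRM) {
        return modRM & 7;
    }

    // Assembles a mod r/m byte from its three fields.
    static int compose(int mod, int opcodeField, int rm) {
        return ((mod & 3) << 6) | ((opcodeField & 7) << 3) | (rm & 7);
    }

    public static void main(String[] args) {
        int modRM = compose(3, 5, 1); // register-direct, group entry 5, r/m = 1
        System.out.println(GROUP_1[opcodeField(modRM)] + " mod=" + mod(modRM) + " rm=" + rm(modRM));
    }
}
```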
Given these features, we can almost trivially transcribe the<br />
“One Byte Opcode Map” for IA32:<br />
define(00, "ADD", Eb, Gb);
define(01, "ADD", Ev, Gv);
...
define(15, "ADC", eAX, Iv);
define(16, "PUSH", SS);
...
define(80, GROUP_1, b,
       Eb.excludeExternalTestArgs(AL), Ib);
...
define(CA, "RETF",
       Iw).beNotExternallyTestable();
       // gas does not support segments
...
define(6B, "IMUL", Gv, Ev,
       Ib.externalRange(0, 0x7f));
...
Many description objects and the respective result value of
define have modification methods that convey special information
to the generator and the tester. In the example
above we see the exclusion of a register from testing, the exclusion
of an entire instruction from testing and the restriction
of an integer test argument to a certain value range.
These features suppress already known testing errors that
are merely due to restrictions, limited capabilities, or bugs
in a given external assembler.
Analogous methods to the above are available for RISC
instruction descriptions. For x86, however, there are additional
methods that modify generator behavior to match
details of the ISA specification which are not explicit in the
opcode table. This occurs for example in the “Two Byte
Opcode Table” for AMD64:
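One plausible way to realize such modification methods is a chainable style in which each method records a restriction and returns the object it was called on. The Java sketch below shows the idea only; every class, field and method body in it is a hypothetical stand-in, and it makes no claim about how the framework's own description objects are implemented.

```java
// Sketch only: all names below are hypothetical stand-ins, not the framework's real classes.
final class ModifierSketch {

    // An operand description that can carry tester-only restrictions.
    static final class Operand {
        final String code;
        long externalMin = Long.MIN_VALUE;
        long externalMax = Long.MAX_VALUE;
        String excludedExternalArg; // an argument the external assembler is known to mishandle

        Operand(String code) {
            this.code = code;
        }

        // Restricts external test arguments to [min, max]; returns this so calls can be chained.
        Operand externalRange(long min, long max) {
            externalMin = min;
            externalMax = max;
            return this;
        }

        // Records one argument to be skipped when testing against the external assembler.
        Operand excludeExternalTestArgs(String arg) {
            excludedExternalArg = arg;
            return this;
        }

        boolean allowsExternalTestValue(long value) {
            return value >= externalMin && value <= externalMax;
        }
    }

    // The result of define, which can itself be modified.
    static final class Instruction {
        final int opcode;
        final String name;
        final Operand[] operands;
        boolean externallyTestable = true;

        Instruction(int opcode, String name, Operand[] operands) {
            this.opcode = opcode;
            this.name = name;
            this.operands = operands;
        }

        // Excludes the whole instruction from external testing.
        Instruction beNotExternallyTestable() {
            externallyTestable = false;
            return this;
        }
    }

    static Instruction define(int opcode, String name, Operand... operands) {
        return new Instruction(opcode, name, operands);
    }

    public static void main(String[] args) {
        Instruction imul = define(0x6B, "IMUL", new Operand("Gv"), new Operand("Ev"),
                                  new Operand("Ib").externalRange(0, 0x7f));
        Instruction retf = define(0xCA, "RETF", new Operand("Iw")).beNotExternallyTestable();
        System.out.println(imul.name + " accepts 0x80 in tests: "
                           + imul.operands[2].allowsExternalTestValue(0x80));
        System.out.println(retf.name + " externally testable: " + retf.externallyTestable);
    }
}
```

Because the restrictions live on the description objects, the generator can ignore them while the tester consults them, which keeps the transcription of the opcode table itself close to the published form.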
define(0F, 80, "JO",
       Jz).setDefaultOperandSize(BITS_64);
...
define(0F, C7,
       GROUP_9a).requireAddressSize(BITS_32);