4th International Conference on Principles and Practices ... - MADOC


Here, we specify a new parameter field n and cause the generated assembly method to assign its respective argument to field sh. The fields mb and me become constant with the given predefined values.

Furthermore, fields in synthetic instructions can be specified by arithmetic expressions composed of numeric constants and fields. For example, the values of the mb and me fields in the following instruction description are the result of subtraction expressions.

    synthesize("clrlslwi", "rlwinm", sh(n),
               mb(SUB(b, n)), me(SUB(31, n)),
               b, n, LE(n, b), LT(b, 32));

The repeated use of field n exemplifies how one operand may contribute to the values of several fields.
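The field arithmetic in this description can be sketched in plain Java. This is only an illustration, not the generator's actual code: the class `ClrlslwiSketch`, its nested `Fields` type, and the methods `expand` and `encode` are hypothetical names, and the check that n is non-negative is an added assumption. The bit layout in `encode` follows the standard PowerPC M-form encoding of rlwinm (primary opcode 21, Rc = 0).

```java
// Hypothetical sketch: expands clrlslwi(b, n) into the rlwinm fields sh, mb, me.
final class ClrlslwiSketch {

    /** Field values of the underlying rlwinm instruction. */
    static final class Fields {
        final int sh, mb, me;

        Fields(int sh, int mb, int me) {
            this.sh = sh;
            this.mb = mb;
            this.me = me;
        }
    }

    /** Applies the constraints LE(n, b) and LT(b, 32), then the SUB expressions. */
    static Fields expand(int b, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative (assumption)");
        }
        if (!(n <= b)) {
            throw new IllegalArgumentException("constraint LE(n, b) violated");
        }
        if (!(b < 32)) {
            throw new IllegalArgumentException("constraint LT(b, 32) violated");
        }
        // One operand (n) contributes to three fields: sh, mb and me.
        return new Fields(n, b - n, 31 - n);
    }

    /** Assembles a 32-bit rlwinm word (M-form, opcode 21, Rc = 0). */
    static int encode(int ra, int rs, Fields f) {
        return (21 << 26) | (rs << 21) | (ra << 16)
             | (f.sh << 11) | (f.mb << 6) | (f.me << 1);
    }

    public static void main(String[] args) {
        Fields f = expand(8, 4);
        System.out.printf("sh=%d mb=%d me=%d word=0x%08X%n",
                          f.sh, f.mb, f.me, encode(3, 4, f));
    }
}
```

The constraint checks are placed before any field is computed, so an invalid operand combination never reaches the encoder.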

5.1.2 x86 Instruction Descriptions

The number of possible instructions in x86 ISAs is about an order of magnitude larger than in the given RISC ISAs. If one tried to follow the same approach to create instruction descriptions, one would spend an enormous amount of time just writing the description listings. More importantly, our primitives to specify RISC instructions are insufficient to express instruction prefixes, suffixes, intricate mod r/m relationships, etc. Instead of a rich bit-field structure, x86 instructions tend to have a byte-wise composition determined by numerous not quite orthogonal features.

As opcode tables provide the densest, most complete, well-publicized instruction set descriptions available for x86, we decided to build our descriptions and generators around those. For an x86 ISA, the symbolic constant values of the following description object types are verbatim from opcode tables found in x86 reference manuals (e.g., [12]):

AddressingMethodCode: We allow M to be used in lieu of the operand code Mv to faithfully mirror published opcode tables in our instruction descriptions.

OperandTypeCode: e.g. b, d, v, z. Specifies a mnemonic suffix for the external syntax.

OperandCode: the concatenation of an addressing mode code with an operand type code, e.g. Eb, Gv, Iz, specifies explicit operands, resulting in assembler method parameters.

RegisterOperandCode: e.g. eAX, rDX.

GeneralRegister: e.g. BL, AX, ECX, R10.

SegmentRegister: e.g. ES, DS, GS.

StackRegister: e.g. ST, ST_1, ST_2.

The latter three result in implicit operands, i.e. the generated assembler methods do not represent them by parameters. Instead we append an underscore and the respective operand to the method name. For example, the external assembly instruction add EAX, 10 becomes add_EAX(10) when using the generated assembler. We also generate the variant with an explicit parameter that can be used as add(EAX, 10), but that is a different instruction, which is one byte longer in the resulting binary form. External textual assemblers typically do not provide any way to express such choices.
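The one-byte difference between the two variants can be made concrete with a small Java sketch. It assumes 32-bit operand size and the full 32-bit immediate forms: opcode 05 for the implicit-EAX variant and opcode 81 /0 with a mod r/m byte for the explicit variant. The class `AddEaxEncodingSketch` and the helper `withImm32` are hypothetical, and representing the register by its 3-bit number is a simplification; the generated assembler's real method signatures may differ.

```java
// Hypothetical sketch of the two encodings of "add EAX, imm32" on IA-32.
final class AddEaxEncodingSketch {

    /** Emits the given leading bytes followed by a little-endian 32-bit immediate. */
    private static byte[] withImm32(int imm32, int... leading) {
        byte[] out = new byte[leading.length + 4];
        for (int i = 0; i < leading.length; i++) {
            out[i] = (byte) leading[i];
        }
        for (int i = 0; i < 4; i++) {
            out[leading.length + i] = (byte) (imm32 >>> (8 * i));
        }
        return out;
    }

    /** Implicit operand: opcode 05 ("ADD eAX, Iz"), no mod r/m byte. */
    static byte[] add_EAX(int imm32) {
        return withImm32(imm32, 0x05);
    }

    /**
     * Explicit operand: opcode 81 /0 ("ADD Ev, Iz") with a mod r/m byte.
     * mod = 11 selects a register operand, the opcode extension is 0,
     * and registerNumber (0 for EAX) fills the r/m field.
     */
    static byte[] add(int registerNumber, int imm32) {
        int modRM = 0xC0 | (registerNumber & 0x7);
        return withImm32(imm32, 0x81, modRM);
    }

    public static void main(String[] args) {
        System.out.println("add_EAX(10): " + add_EAX(10).length + " bytes");
        System.out.println("add(EAX, 10): " + add(0, 10).length + " bytes");
    }
}
```

Both byte sequences perform the same addition; only the generated assembler's method name lets the programmer pick between them.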

In addition, these object types are used to describe x86 instructions:

HexByte: an enum providing hexadecimal unsigned byte values, used to specify an opcode. Every x86 instruction has either one or two of these. In case of two, the first opcode must be 0F.

ModRMGroup: specifies a table in which alternative additional sets of instruction description objects are located, indexed by the respective 3-bit opcode field in the mod r/m byte of each generated instruction.

ModCase: a 2-bit value to which the mod field of the mod r/m byte is then constrained.

FloatingPointOperandCode: a floating point operand not further described here.

Integer: an implicit byte operand to be appended to the instruction, typically 1.

OperandConstraint: same as for RISC above, but much more rarely used, since almost all integral x86 operand value ranges coincide with Java primitive types.
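The 2-bit mod field and the 3-bit opcode field mentioned under ModCase and ModRMGroup are both parts of the mod r/m byte. The following Java sketch shows the standard x86 layout of that byte (mod in bits 7..6, reg/opcode in bits 5..3, r/m in bits 2..0). The class `ModRMSketch` and its method names are hypothetical and are not taken from the described framework.

```java
// Hypothetical sketch of composing and decomposing an x86 mod r/m byte.
final class ModRMSketch {

    /** Packs the three fields into one byte value (0..255). */
    static int modRM(int mod, int regOrOpcode, int rm) {
        if (mod < 0 || mod > 3) {
            throw new IllegalArgumentException("mod is a 2-bit field");
        }
        if (regOrOpcode < 0 || regOrOpcode > 7) {
            throw new IllegalArgumentException("reg/opcode is a 3-bit field");
        }
        if (rm < 0 || rm > 7) {
            throw new IllegalArgumentException("r/m is a 3-bit field");
        }
        return (mod << 6) | (regOrOpcode << 3) | rm;
    }

    /** The 2-bit value that a ModCase would constrain. */
    static int mod(int modRM) {
        return (modRM >>> 6) & 0x3;
    }

    /** The 3-bit opcode field that would index a ModRMGroup table. */
    static int groupIndex(int modRM) {
        return (modRM >>> 3) & 0x7;
    }

    /** The 3-bit r/m field. */
    static int rm(int modRM) {
        return modRM & 0x7;
    }

    public static void main(String[] args) {
        int b = modRM(3, 5, 1);
        System.out.printf("modRM=0x%02X mod=%d group=%d rm=%d%n",
                          b, mod(b), groupIndex(b), rm(b));
    }
}
```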

Given these features, we can almost trivially transcribe the “One Byte Opcode Map” for IA32:

    define(00, "ADD", Eb, Gb);
    define(01, "ADD", Ev, Gv);
    ...
    define(15, "ADC", eAX, Iv);
    define(16, "PUSH", SS);
    ...
    define(80, GROUP_1, b,
           Eb.excludeExternalTestArgs(AL), Ib);
    ...
    define(CA, "RETF",
           Iw).beNotExternallyTestable();
           // gas does not support segments
    ...
    define(6B, "IMUL", Gv, Ev,
           Ib.externalRange(0, 0x7f));
    ...

Many description objects and the respective result value of define have modification methods that convey special information to the generator and the tester. In the example above we see the exclusion of a register from testing, the exclusion of an entire instruction from testing and the restriction of an integer test argument to a certain value range. These features suppress already known testing errors that are merely due to restrictions, limited capabilities, or bugs in a given external assembler.
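One way such modification methods could be structured is as chainable setters that record hints for a tester to query later. The sketch below is an assumption about the general pattern, not the framework's implementation: only the method names `excludeExternalTestArgs`, `beNotExternallyTestable`, and `externalRange` come from the listing, while the classes `TestHintsSketch`, `Operand`, and `Instruction`, the query methods, and the use of strings for operand codes and registers are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: description objects that carry hints for an external tester.
final class TestHintsSketch {

    /** An operand description with optional test restrictions. */
    static final class Operand {
        final String code;
        private long min = Long.MIN_VALUE;
        private long max = Long.MAX_VALUE;
        private final Set<String> excludedArgs = new HashSet<>();

        Operand(String code) {
            this.code = code;
        }

        /** Restricts integer test arguments to [min, max]; returns this for chaining. */
        Operand externalRange(long min, long max) {
            this.min = min;
            this.max = max;
            return this;
        }

        /** Excludes specific arguments (e.g. a register name) from external testing. */
        Operand excludeExternalTestArgs(String... args) {
            excludedArgs.addAll(Arrays.asList(args));
            return this;
        }

        boolean allowsTestValue(long value) {
            return min <= value && value <= max;
        }

        boolean allowsTestArg(String arg) {
            return !excludedArgs.contains(arg);
        }
    }

    /** The result value of define, which can itself be modified. */
    static final class Instruction {
        final int opcode;
        final String mnemonic;
        final Operand[] operands;
        private boolean externallyTestable = true;

        Instruction(int opcode, String mnemonic, Operand[] operands) {
            this.opcode = opcode;
            this.mnemonic = mnemonic;
            this.operands = operands;
        }

        /** Excludes the entire instruction from testing against an external assembler. */
        Instruction beNotExternallyTestable() {
            externallyTestable = false;
            return this;
        }

        boolean isExternallyTestable() {
            return externallyTestable;
        }
    }

    static Instruction define(int opcode, String mnemonic, Operand... operands) {
        return new Instruction(opcode, mnemonic, operands);
    }

    public static void main(String[] args) {
        Instruction retf = define(0xCA, "RETF", new Operand("Iw")).beNotExternallyTestable();
        Operand ib = new Operand("Ib").externalRange(0, 0x7f);
        System.out.println(retf.mnemonic + " testable: " + retf.isExternallyTestable());
        System.out.println("Ib allows 0x80: " + ib.allowsTestValue(0x80));
    }
}
```

Returning the modified object from each method is what allows a hint to be attached inline within a define call, as in the listing.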

Analogous methods to the above are available for RISC instruction descriptions. For x86, however, there are additional methods that modify generator behavior to match details of the ISA specification which are not explicit in the opcode table. This occurs for example in the “Two Byte Opcode Table” for AMD64:

    define(0F, 80, "JO",
           Jz).setDefaultOperandSize(BITS_64);
    ...
    define(0F, C7,
           GROUP_9a).requireAddressSize(BITS_32);
