15.01.2015 Views

4th International Conference on Principles and Practices ... - MADOC

4th International Conference on Principles and Practices ... - MADOC

4th International Conference on Principles and Practices ... - MADOC

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

opcode mask: the combined (not necessarily c<strong>on</strong>tiguous)<br />

bit range of all c<strong>on</strong>stant fields in an instructi<strong>on</strong>,<br />

opcode: a binding of bit values to a given opcode mask,<br />

opcode mask group: a collecti<strong>on</strong> of templates that share<br />

the same opcode mask,<br />

specificity: the number of bits in an opcode mask,<br />

specificity group: a collecti<strong>on</strong> of opcode mask groups<br />

with the same specificity (but different opcode bit positi<strong>on</strong>s).<br />

The disassembler keeps all instructi<strong>on</strong> templates sorted by<br />

specificity group. To find the template with the most specific<br />

opcode mask it will iterate over all specificity groups in order<br />

of decreasing specificity. Opti<strong>on</strong>ally, it can do the opposite.<br />

During the iterati<strong>on</strong>, the disassembler tries the following<br />

with each opcode mask group in the given specificity group.<br />

A logical <strong>and</strong> of the opcode mask group’s opcode mask with<br />

the corresp<strong>on</strong>ding bits in the instructi<strong>on</strong> yields a hypothetical<br />

opcode. Next, every template in the opcode mask group<br />

that has this opcode is regarded as a c<strong>and</strong>idate. For each<br />

such c<strong>and</strong>idate, the disassembler tries to disassemble the instructi<strong>on</strong>’s<br />

encoded arguments.<br />

If this succeeds, we reassemble an instructi<strong>on</strong> from the<br />

c<strong>and</strong>idate template with these arguments. This simple<br />

trick assures that we <strong>on</strong>ly report decodings that match all<br />

oper<strong>and</strong> c<strong>on</strong>straints. Finally, if the resulting bits are the<br />

same as the <strong>on</strong>es we have originally read, we have a result.<br />

5.4.2 x86 Disassemblers<br />

An AMD64 <strong>and</strong> IA32 assembler must determine the instructi<strong>on</strong><br />

length <strong>on</strong> the fly, sometimes backtracking a little<br />

bit. An instructi<strong>on</strong> is first scanned byte by byte, gathering<br />

potential prefixes, the first opcode, <strong>and</strong>, if present, the sec<strong>on</strong>d<br />

opcode. The disassembler can then determine a group<br />

of templates that matches the given opcode combinati<strong>on</strong>,<br />

ignoring any prefixes at the moment.<br />

The disassembler iterates over all templates in the same<br />

opcode group, extracts oper<strong>and</strong> values <strong>and</strong> reassembles as<br />

described above in case of RISC. In short, some effort is<br />

made to exclude certain predictably unnecessary decoding<br />

attempts, but overall, the x86 disassembler algorithm uses<br />

an even more brute force approach than the RISC disassemblers.<br />

5.5 Fully Automated Self-Testing<br />

The same template generators used to create assemblers<br />

<strong>and</strong> disassemblers are reused <strong>on</strong>ce again for fully automated<br />

testing of these artifacts. The respective test generator creates<br />

an exhaustive set of test cases by iterating over a cross<br />

product of legal values for each parameter of an assembler<br />

method. For symbolic parameters, the legal values amount<br />

to the set of all predefined c<strong>on</strong>stants of the given symbol<br />

type. For integral parameters, if the number of legal values<br />

is greater than some plausible threshold (currently 32), <strong>on</strong>ly<br />

a selecti<strong>on</strong> of values representing all important boundary<br />

cases is used.<br />

The above represent positive test cases, i.e., they are expected<br />

to result in valid instructi<strong>on</strong> encodings. In additi<strong>on</strong>,<br />

the testing framework generates negative test cases, i.e., test<br />

cases that should cause an assembler to display error behavior<br />

(e.g., return an error code or throw an excepti<strong>on</strong>). There<br />

Template<br />

Generator<br />

Template<br />

External Syntax<br />

Writer<br />

asmTest.s<br />

External<br />

Assembler<br />

asmTest.o<br />

Reader<br />

Test Oper<strong>and</strong><br />

Generator<br />

Oper<strong>and</strong>s<br />

Assembler<br />

bits<br />

==<br />

bits<br />

Figure 4: Testing<br />

==<br />

==<br />

Oper<strong>and</strong>s<br />

Disassembler<br />

Template<br />

will be far fewer negative test cases than positive test cases<br />

as our use of Java’s static typing leaves very few opportunities<br />

for specifying illegal arguments. By far most negative<br />

test cases in the ISAs implemented so far are due to RISC<br />

integral fields whose ranges of legal values are not exactly<br />

matched by a Java primitive type (e.g., int, short, char, etc).<br />

For complete testing of an assembler <strong>and</strong> its corresp<strong>on</strong>ding<br />

disassembler, the template generator creates all templates<br />

for an ISA <strong>and</strong> presents them <strong>on</strong>e by <strong>on</strong>e to the following<br />

testing procedure. 9<br />

First, an assembler instance is created <strong>and</strong> the assembler’s<br />

Java method that corresp<strong>on</strong>ds to the given template is identified<br />

by means of Java reflecti<strong>on</strong>. Then the test generator<br />

creates all test case oper<strong>and</strong> sets for the given template.<br />

Figure 4 illustrates the further steps taken for each<br />

oper<strong>and</strong> set. The assembler method is invoked with the<br />

oper<strong>and</strong> set as arguments <strong>and</strong> the identical oper<strong>and</strong> set is<br />

also passed together with the template to the external syntax<br />

writer, which creates a corresp<strong>on</strong>ding textual assembler<br />

code line <strong>and</strong> writes it into an assembler source code file.<br />

The latter is thereup<strong>on</strong> translated by an external assembler<br />

program (e.g., gas [6]), producing an object file. By noticing<br />

certain prearranged markers in the object file, a reader<br />

utility is able to extract those bits from the object file that<br />

corresp<strong>on</strong>d to output of the external syntax writer into another<br />

byte array.<br />

Now the two byte arrays are compared. Next, <strong>on</strong>e of the<br />

byte arrays is passed to the disassembler, which reuses the<br />

same set of templates from above <strong>and</strong> determines, which<br />

template exactly matches this binary representati<strong>on</strong>. Furthermore,<br />

the disassembler extracts oper<strong>and</strong> values.<br />

Eventually, the template <strong>and</strong> oper<strong>and</strong> values determined<br />

by the disassembler are compared to the original template<br />

<strong>and</strong> oper<strong>and</strong> values.<br />

Probing all variants of even a single mnem<strong>on</strong>ic may take<br />

minutes. Once a test failure has been detected, we can ar-<br />

9 The descripti<strong>on</strong> is slightly simplified: the actual program<br />

does not write a new assembler source file per instructi<strong>on</strong>,<br />

but accumulates those for the same template.<br />

10

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!