4th International Conference on Principles and Practices ... - MADOC
opcode mask: the combined (not necessarily contiguous) bit range of all constant fields in an instruction,
opcode: a binding of bit values to a given opcode mask,
opcode mask group: a collection of templates that share the same opcode mask,
specificity: the number of bits in an opcode mask,
specificity group: a collection of opcode mask groups with the same specificity (but different opcode bit positions).
The disassembler keeps all instruction templates sorted by
specificity group. To find the template with the most specific
opcode mask, it iterates over all specificity groups in order
of decreasing specificity. Optionally, it can iterate in the opposite order.
During the iteration, the disassembler tries the following
with each opcode mask group in the given specificity group.
A logical and of the opcode mask group’s opcode mask with
the corresponding bits in the instruction yields a hypothetical
opcode. Next, every template in the opcode mask group
that has this opcode is regarded as a candidate. For each
such candidate, the disassembler tries to disassemble the instruction’s
encoded arguments.
If this succeeds, we reassemble an instruction from the
candidate template with these arguments. This simple
trick ensures that we only report decodings that match all
operand constraints. Finally, if the resulting bits are the
same as the ones we originally read, we have a result.
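The lookup and reassembly check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `SpecificityDecoder`, `Template`, and all field names are hypothetical, instructions are assumed to be 32 bits wide, each template has a single unsigned operand field, and the opcode mask groups are flattened into one list sorted by decreasing specificity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SpecificityDecoder {
    /** Hypothetical template: constant bits plus a single operand field. */
    static final class Template {
        final String name;
        final int opcodeMask;   // bit positions of all constant fields
        final int opcode;       // required bit values at those positions
        final int operandMask;  // bit positions of the one operand field
        final int maxOperand;   // operand constraint: 0 <= operand <= maxOperand

        Template(String name, int opcodeMask, int opcode, int operandMask, int maxOperand) {
            this.name = name;
            this.opcodeMask = opcodeMask;
            this.opcode = opcode;
            this.operandMask = operandMask;
            this.maxOperand = maxOperand;
        }

        /** Specificity is the number of bits in the opcode mask. */
        int specificity() {
            return Integer.bitCount(opcodeMask);
        }

        int extractOperand(int instruction) {
            return (instruction & operandMask) >>> Integer.numberOfTrailingZeros(operandMask);
        }

        /** Reassembles the instruction, or returns null if the operand is illegal. */
        Integer assemble(int operand) {
            if (operand < 0 || operand > maxOperand) {
                return null;
            }
            return opcode | (operand << Integer.numberOfTrailingZeros(operandMask));
        }
    }

    private final List<Template> bySpecificity;

    SpecificityDecoder(List<Template> templates) {
        bySpecificity = new ArrayList<>(templates);
        // Most specific opcode masks are tried first.
        bySpecificity.sort(Comparator.comparingInt(Template::specificity).reversed());
    }

    /** Returns the name of the first template that round-trips, or null. */
    String decode(int instruction) {
        for (Template t : bySpecificity) {
            if ((instruction & t.opcodeMask) != t.opcode) {
                continue; // hypothetical opcode does not match this template
            }
            Integer bits = t.assemble(t.extractOperand(instruction));
            if (bits != null && bits == instruction) {
                return t.name; // reassembled bits equal the bits we read
            }
        }
        return null;
    }
}
```

In this sketch, a template whose opcode matches but whose operand constraint is violated is skipped, and the search falls through to a less specific template, which is the effect the reassembly step is meant to achieve.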
5.4.2 x86 Disassemblers
An AMD64 or IA32 disassembler must determine the instruction
length on the fly, sometimes backtracking a little.
An instruction is first scanned byte by byte, gathering
potential prefixes, the first opcode, and, if present, the second
opcode. The disassembler can then determine a group
of templates that matches the given opcode combination,
ignoring any prefixes for the moment.
The disassembler iterates over all templates in the same
opcode group, extracts operand values and reassembles as
described above for RISC. In short, some effort is
made to exclude certain predictably unnecessary decoding
attempts, but overall, the x86 disassembler algorithm uses
an even more brute force approach than the RISC disassemblers.
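The byte-by-byte scan for prefixes and opcodes might look roughly like the sketch below. It is an assumption-laden illustration rather than the paper's code: `X86OpcodeScanner` and `Scan` are hypothetical names, only the legacy one-byte prefixes and the `0x0F` two-byte opcode escape are handled, and AMD64 REX prefixes, longer opcode escapes, backtracking, and the subsequent template matching are all omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class X86OpcodeScanner {
    /** Result of the scan; all names here are hypothetical. */
    static final class Scan {
        final List<Integer> prefixes = new ArrayList<>();
        int opcode1 = -1;
        int opcode2 = -1; // stays -1 if there is no second opcode byte
        int length;       // number of bytes consumed by the scan
    }

    private static boolean isLegacyPrefix(int b) {
        switch (b) {
            case 0xF0: case 0xF2: case 0xF3:            // LOCK, REPNE, REP
            case 0x26: case 0x2E: case 0x36: case 0x3E: // ES, CS, SS, DS overrides
            case 0x64: case 0x65:                       // FS, GS overrides
            case 0x66: case 0x67:                       // operand-size, address-size
                return true;
            default:
                return false;
        }
    }

    static Scan scan(byte[] code, int offset) {
        Scan s = new Scan();
        int i = offset;
        // Gather potential prefixes.
        while (i < code.length && isLegacyPrefix(code[i] & 0xFF)) {
            s.prefixes.add(code[i++] & 0xFF);
        }
        if (i >= code.length) {
            throw new IllegalArgumentException("no opcode byte");
        }
        s.opcode1 = code[i++] & 0xFF;
        // 0x0F escapes to the two-byte opcode map.
        if (s.opcode1 == 0x0F) {
            if (i >= code.length) {
                throw new IllegalArgumentException("truncated two-byte opcode");
            }
            s.opcode2 = code[i++] & 0xFF;
        }
        s.length = i - offset;
        return s;
    }
}
```

The pair `(opcode1, opcode2)` would then select the group of candidate templates, with the collected prefixes consulted only afterwards.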
5.5 Fully Automated Self-Testing
The same template generators used to create assemblers
and disassemblers are reused once again for fully automated
testing of these artifacts. The respective test generator creates
an exhaustive set of test cases by iterating over a cross
product of legal values for each parameter of an assembler
method. For symbolic parameters, the legal values amount
to the set of all predefined constants of the given symbol
type. For integral parameters, if the number of legal values
is greater than some plausible threshold (currently 32), only
a selection of values representing all important boundary
cases is used.
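The generation of operand sets could be sketched like this. The class and method names are hypothetical, and the particular boundary values chosen (the range limits, their neighbours, and -1, 0, 1) are an assumption, since the text does not list them; only the threshold of 32 comes from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class TestOperands {
    static final int THRESHOLD = 32; // threshold stated in the text

    /** Test values for an integral parameter with inclusive range [min, max]. */
    static List<Integer> integralValues(int min, int max) {
        List<Integer> values = new ArrayList<>();
        long count = (long) max - min + 1;
        if (count <= THRESHOLD) {
            // Small range: enumerate every legal value.
            for (long v = min; v <= max; v++) {
                values.add((int) v);
            }
            return values;
        }
        // Large range: assumed boundary selection, filtered to the legal range.
        long[] candidates = {min, (long) min + 1, -1, 0, 1, (long) max - 1, max};
        TreeSet<Integer> selected = new TreeSet<>();
        for (long c : candidates) {
            if (c >= min && c <= max) {
                selected.add((int) c);
            }
        }
        values.addAll(selected);
        return values;
    }

    /** Cross product of per-parameter value lists; one inner list per operand set. */
    static List<List<Object>> crossProduct(List<? extends List<?>> domains) {
        List<List<Object>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (List<?> domain : domains) {
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> prefix : result) {
                for (Object value : domain) {
                    List<Object> extended = new ArrayList<>(prefix);
                    extended.add(value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```

For a symbolic parameter modelled as a Java enum, the value list would simply be all constants of that enum, for example `Arrays.asList(SomeRegister.values())` with `SomeRegister` standing in for the symbol type.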
The above represent positive test cases, i.e., they are expected
to result in valid instruction encodings. In addition,
the testing framework generates negative test cases, i.e., test
cases that should cause an assembler to display error behavior
(e.g., return an error code or throw an exception). There
will be far fewer negative test cases than positive test cases
as our use of Java’s static typing leaves very few opportunities
for specifying illegal arguments. By far most negative
test cases in the ISAs implemented so far are due to RISC
integral fields whose ranges of legal values are not exactly
matched by a Java primitive type (e.g., int, short, char).
Figure 4: Testing (diagram not reproduced; it connects the template generator, test operand generator, assembler, external syntax writer (asmTest.s), external assembler (asmTest.o), reader, and disassembler, with equality checks on bits, operands, and template)
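To illustrate where such negative test cases come from, consider a signed field that is narrower than the Java type carrying it. The sketch below is hypothetical (`FieldRange` and its methods are not from the paper, and a 14-bit field is just an example width): the type system cannot rule out values such as 8192, so a run-time check and a matching negative test are needed.

```java
final class FieldRange {
    /** Smallest legal value of a signed field with the given bit width (1..31). */
    static int minSigned(int width) {
        return -(1 << (width - 1));
    }

    /** Largest legal value of a signed field with the given bit width (1..31). */
    static int maxSigned(int width) {
        return (1 << (width - 1)) - 1;
    }

    /** Returns the value unchanged, or throws if it does not fit the field. */
    static int checkSigned(int value, int width) {
        if (value < minSigned(width) || value > maxSigned(width)) {
            throw new IllegalArgumentException(
                value + " does not fit a signed " + width + "-bit field");
        }
        return value;
    }

    /** Values just outside the legal range: candidates for negative tests. */
    static int[] negativeCases(int width) {
        return new int[] {minSigned(width) - 1, maxSigned(width) + 1};
    }
}
```

A 16-bit signed field, by contrast, is matched exactly by `short`, so no illegal argument can even be expressed for it.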
For complete testing of an assembler and its corresponding
disassembler, the template generator creates all templates
for an ISA and presents them one by one to the following
testing procedure.⁹
First, an assembler instance is created and the assembler’s
Java method that corresponds to the given template is identified
by means of Java reflection. Then the test generator
creates all test case operand sets for the given template.
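Looking up and invoking an assembler method by reflection can be sketched with the standard `java.lang.reflect` API. Everything else here is assumed for illustration: `ReflectiveLookup`, `ToyAssembler`, its `addi` method, and the bit layout it produces are hypothetical stand-ins for a generated assembler.

```java
import java.lang.reflect.Method;

final class ReflectiveLookup {
    /** Toy stand-in for a generated assembler; the encoding is illustrative only. */
    public static final class ToyAssembler {
        public int addi(int rd, int ra, short imm) {
            return (14 << 26) | (rd << 21) | (ra << 16) | (imm & 0xFFFF);
        }
    }

    /**
     * Finds the public method with the given name and parameter types
     * and invokes it with one operand set.
     */
    static Object invokeByName(Object assembler, String name,
                               Class<?>[] parameterTypes, Object... operands) {
        try {
            Method method = assembler.getClass().getMethod(name, parameterTypes);
            return method.invoke(assembler, operands);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable assembler method: " + name, e);
        }
    }
}
```

A call such as `invokeByName(new ReflectiveLookup.ToyAssembler(), "addi", new Class<?>[] {int.class, int.class, short.class}, 3, 1, (short) 42)` passes boxed operands, which `Method.invoke` unboxes to the primitive parameter types.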
Figure 4 illustrates the further steps taken for each
operand set. The assembler method is invoked with the
operand set as arguments, producing its encoding as a byte array. The identical operand set is
also passed together with the template to the external syntax
writer, which creates a corresponding textual assembler
code line and writes it into an assembler source code file.
The latter is thereupon translated by an external assembler
program (e.g., gas [6]), producing an object file. By locating
certain prearranged markers in the object file, a reader
utility is able to extract those bits from the object file that
correspond to the output of the external syntax writer and store them in another
byte array.
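The reader step could be approximated as a search for two marker byte sequences, returning whatever lies between them. This is only a sketch: `MarkerReader` is a hypothetical name, the text does not say what the markers look like, and a real reader would have to make sure the markers cannot collide with instruction bytes.

```java
import java.util.Arrays;

final class MarkerReader {
    /** Index of the first occurrence of pattern in data at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = from; i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the bytes strictly between the start marker and the end marker. */
    static byte[] extractBetween(byte[] objectFile, byte[] startMarker, byte[] endMarker) {
        int start = indexOf(objectFile, startMarker, 0);
        if (start < 0) {
            throw new IllegalArgumentException("start marker not found");
        }
        int bodyStart = start + startMarker.length;
        int end = indexOf(objectFile, endMarker, bodyStart);
        if (end < 0) {
            throw new IllegalArgumentException("end marker not found");
        }
        return Arrays.copyOfRange(objectFile, bodyStart, end);
    }
}
```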
Now the two byte arrays are compared. Next, one of the
byte arrays is passed to the disassembler, which reuses the
same set of templates from above and determines which
template exactly matches this binary representation. Furthermore,
the disassembler extracts operand values.
Eventually, the template and operand values determined
by the disassembler are compared to the original template
and operand values.
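The two comparisons for one operand set can be condensed into a small check. The sketch assumes hypothetical types (`RoundTripCheck`, `Decoded`, and a `Disassembler` interface), identifies templates by name, and models operands as integers; the real framework works on template objects and typed operands.

```java
import java.util.Arrays;
import java.util.List;

final class RoundTripCheck {
    /** What a disassembler reports for one instruction; hypothetical shape. */
    static final class Decoded {
        final String template;
        final List<Integer> operands;

        Decoded(String template, List<Integer> operands) {
            this.template = template;
            this.operands = operands;
        }
    }

    /** Minimal disassembler abstraction, assumed for this sketch. */
    interface Disassembler {
        Decoded disassemble(byte[] bits);
    }

    /** Returns "ok", or the name of the first comparison that failed. */
    static String check(String template, List<Integer> operands,
                        byte[] assembled, byte[] external, Disassembler disassembler) {
        // Step 1: our assembler and the external assembler must agree bit for bit.
        if (!Arrays.equals(assembled, external)) {
            return "assembler mismatch";
        }
        // Step 2: disassembling must recover the original template and operands.
        Decoded decoded = disassembler.disassemble(assembled);
        if (!template.equals(decoded.template) || !operands.equals(decoded.operands)) {
            return "disassembler mismatch";
        }
        return "ok";
    }
}
```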
Probing all variants of even a single mnemonic may take
minutes. Once a test failure has been detected, we can ar-<br />
⁹ The description is slightly simplified: the actual program
does not write a new assembler source file per instruction,
but accumulates those for the same template.