How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...
How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures
Figure 2. Variance Behavior Graph 2 (plot omitted; y-axis: clock cycles, 0 to 250; x-axis: ensembles, 1 to 951; single series "Series1")
3.2 Improvements Using RDTSCP Instruction
The RDTSCP instruction is described in the Intel® 64 and IA-32 Architectures
Software Developer's Manual Volume 2B ([3]) as an assembly instruction that reads
the timestamp register and the CPU identifier in a single operation. The value of
the timestamp register is stored in the EDX and EAX registers (high-order 32 bits
in EDX, low-order 32 bits in EAX); the value of the CPU id is stored in the ECX
register ("On processors that support the Intel 64 architecture, the high order
32 bits of each of RAX, RDX, and RCX are cleared").
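The register layout described above can be sketched in C with inline assembly. This is a minimal illustration, not code from the paper: it assumes GCC or Clang on an x86-64 processor that supports RDTSCP, and the names `tsc_sample` and `read_rdtscp` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container (not from the paper) for one RDTSCP read. */
struct tsc_sample {
    uint64_t cycles; /* 64-bit timestamp rebuilt from EDX:EAX */
    uint32_t aux;    /* value RDTSCP returned in ECX */
};

/* Assumes GCC/Clang extended asm and a CPU that supports RDTSCP. */
static inline struct tsc_sample read_rdtscp(void)
{
    uint32_t lo, hi, aux;

    /* RDTSCP writes the low half to EAX, the high half to EDX,
     * and the identifier value to ECX. */
    __asm__ volatile ("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));

    struct tsc_sample s;
    s.cycles = ((uint64_t)hi << 32) | lo; /* combine EDX:EAX */
    s.aux = aux;
    return s;
}
```

Calling `read_rdtscp()` twice and subtracting the two `cycles` fields gives an elapsed count; comparing the two `aux` fields is one way to notice that the two reads did not come from the same CPU.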
What is interesting in this case is the "pseudo" serializing property of RDTSCP.
The manual states:
"The RDTSCP instruction waits until all previous instructions have been executed
before reading the counter. However, subsequent instructions may begin execution
before the read operation is performed."
This means that the instruction guarantees that everything preceding it in the
source code has been executed before the counter is read. It cannot, however,
guarantee that the CPU, for optimization purposes, will not begin executing
instructions that follow RDTSCP in the source code before the counter is read.
If this happens, those later instructions contaminate the measurement of the
code under test.
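One way to limit this contamination is to follow RDTSCP with a serializing instruction such as CPUID, so that later instructions cannot start until the counter read has completed. The sketch below illustrates that idea under stated assumptions: GCC or Clang on x86-64, a CPU that supports RDTSCP, and a hypothetical helper name `rdtscp_then_fence`. It is not presented as the method of the text above.

```c
#include <stdint.h>

/* Hypothetical helper: read the timestamp with RDTSCP, then execute
 * CPUID (a serializing instruction) as a barrier against instructions
 * that come later in program order. */
static inline uint64_t rdtscp_then_fence(void)
{
    uint32_t lo, hi, aux;
    uint32_t a = 0, b, c = 0, d; /* CPUID leaf 0, subleaf 0 */

    /* Waits for all earlier instructions, then reads the counter. */
    __asm__ volatile ("rdtscp"
                      : "=a"(lo), "=d"(hi), "=c"(aux)
                      :
                      : "memory");

    /* Serializes execution so later code cannot overlap the read. */
    __asm__ volatile ("cpuid"
                      : "+a"(a), "=b"(b), "+c"(c), "=d"(d)
                      :
                      : "memory");

    (void)aux; (void)b; (void)d; /* outputs not needed here */
    return ((uint64_t)hi << 32) | lo;
}
```

The cost of CPUID falls after the counter read, so it is not included in the value returned; it only delays whatever code runs next.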