How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures
Figure 4. Variance Behavior Graph 4 (y-axis: variance in clock cycles, 0 to 24; x-axis: ensembles, 1 to 951)
In Figure 3 we can see that the minimum value is perfectly constant between ensembles; in Figure 4 the variance is equal to either 2 or 3 clock cycles.
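The two statistics read off Figures 3 and 4 can be illustrated with a minimal C sketch. It assumes each ensemble is an array of cycle counts; the names ensemble_min and ensemble_variance are hypothetical, and the integer population-variance formula is one reasonable choice rather than necessarily the one the Appendix uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest sample in one ensemble; n must be at least 1. */
static inline uint64_t ensemble_min(const uint64_t *samples, size_t n)
{
    uint64_t min = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < min)
            min = samples[i];
    return min;
}

/* Population variance of one ensemble in integer arithmetic:
 * (n * sum(x^2) - (sum x)^2) / n^2, truncated toward zero.
 * No overflow checks: assumes small cycle counts and small n. */
static inline uint64_t ensemble_variance(const uint64_t *samples, size_t n)
{
    uint64_t sum = 0, sum_sq = 0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        sum_sq += samples[i] * samples[i];
    }
    return ((uint64_t)n * sum_sq - sum * sum) / ((uint64_t)n * n);
}
```

Calling both helpers once per ensemble and plotting the results against the ensemble index would give curves of the kind shown in the two figures.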
3.2.3 An Alternative Method for Architectures Not Supporting RDTSCP
This section presents an alternative method to benchmark code execution cycles for architectures that do not support the RDTSCP instruction. This method is not as good as the one presented in Section 3.2.1, but it is still much better than the one that uses CPUID to serialize code execution. Here, between the two timestamp register reads, we serialize code execution by writing the control register CR0. Because a write to a control register is a privileged, serializing instruction, this approach works only in kernel-mode code.
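Whether a given processor needs this fallback can be checked before measuring. The sketch below is a user-space illustration, assuming a GCC or Clang toolchain that provides the cpuid.h helper header; it tests the RDTSCP feature flag, which CPUID reports in leaf 0x80000001, EDX bit 27. The function names edx_reports_rdtscp and cpu_has_rdtscp are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header (toolchain assumption) */
#endif

/* CPUID leaf 0x80000001 reports RDTSCP support in EDX bit 27. */
#define RDTSCP_EDX_BIT 27u

/* Pure helper: decode the RDTSCP flag from an EDX value. */
static inline bool edx_reports_rdtscp(uint32_t edx)
{
    return ((edx >> RDTSCP_EDX_BIT) & 1u) != 0;
}

/* True if the running CPU advertises RDTSCP; false otherwise,
 * including on non-x86 builds or when the extended leaf is missing. */
static inline bool cpu_has_rdtscp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_reports_rdtscp(edx);
#else
    return false;
#endif
}
```

If cpu_has_rdtscp() returns false, the CR0-based method of this section is the one to use; inside a kernel module the same bit would have to be read through the kernel's own CPUID facilities rather than cpuid.h.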
Regarding the code in the Appendix, the developer should replace lines 19 to 54 with the following:
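/* First timestamp: CPUID serializes execution, then RDTSC reads the counter. */
asm volatile("CPUID\n\t"
             "RDTSC\n\t"
             "mov %%edx, %0\n\t"
             "mov %%eax, %1\n\t"
             : "=r" (cycles_high), "=r" (cycles_low)
             :
             : "%rax", "%rbx", "%rcx", "%rdx");

/* The code being measured runs between the two timestamp reads. */

/* Second timestamp: reading CR0 and writing it back serializes execution,
   then RDTSC reads the counter. */
asm volatile("mov %%cr0, %%rax\n\t"
             "mov %%rax, %%cr0\n\t"
             "RDTSC\n\t"
             "mov %%edx, %0\n\t"
             "mov %%eax, %1\n\t"
             : "=r" (cycles_high1), "=r" (cycles_low1)
             :
             : "%rax", "%rdx");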
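Each RDTSC read above leaves the timestamp in two 32-bit halves (cycles_high and cycles_low for the first read, cycles_high1 and cycles_low1 for the second). A minimal sketch of how the halves could be combined into 64-bit values and subtracted follows; the helper names tsc_combine and tsc_elapsed are hypothetical and do not come from the Appendix.

```c
#include <stdint.h>

/* Join the EDX (high) and EAX (low) halves of one RDTSC read. */
static inline uint64_t tsc_combine(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Elapsed cycles between a start read and an end read. */
static inline uint64_t tsc_elapsed(uint32_t start_high, uint32_t start_low,
                                   uint32_t end_high, uint32_t end_low)
{
    return tsc_combine(end_high, end_low) - tsc_combine(start_high, start_low);
}
```

With the variables used in this section, the measured interval would be tsc_elapsed(cycles_high, cycles_low, cycles_high1, cycles_low1). Combining before subtracting keeps the result correct when the low half wraps around between the two reads.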