How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...
How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...
How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<str<strong>on</strong>g>How</str<strong>on</strong>g> <str<strong>on</strong>g>to</str<strong>on</strong>g> <str<strong>on</strong>g>Benchmark</str<strong>on</strong>g> <str<strong>on</strong>g>Code</str<strong>on</strong>g> <str<strong>on</strong>g>Executi<strong>on</strong></str<strong>on</strong>g> <str<strong>on</strong>g>Times</str<strong>on</strong>g> <strong>on</strong> <strong>Intel</strong> ® <strong>IA</strong>-<strong>32</strong><br />
<strong>and</strong> <strong>IA</strong>-<strong>64</strong> Instructi<strong>on</strong> Set Architectures<br />
The RDTSCP instructi<strong>on</strong> reads the timestamp register for the sec<strong>on</strong>d time <strong>and</strong><br />
guarantees that the executi<strong>on</strong> of all the code we wanted <str<strong>on</strong>g>to</str<strong>on</strong>g> measure is completed.<br />
The two “mov” instructi<strong>on</strong>s coming afterwards s<str<strong>on</strong>g>to</str<strong>on</strong>g>re the edx <strong>and</strong> eax registers<br />
values in<str<strong>on</strong>g>to</str<strong>on</strong>g> memory. Both instructi<strong>on</strong>s are guaranteed <str<strong>on</strong>g>to</str<strong>on</strong>g> be executed after RDTSC<br />
(i.e., they d<strong>on</strong>’t corrupt the measure) since there is a logical dependency between<br />
RDTSCP <strong>and</strong> the register edx <strong>and</strong> eax (RDTSCP is writing those registers <strong>and</strong> the<br />
CPU is obliged <str<strong>on</strong>g>to</str<strong>on</strong>g> wait for RDTSCP <str<strong>on</strong>g>to</str<strong>on</strong>g> be finished before executing the two “mov”).<br />
Finally a CPUID call guarantees that a barrier is implemented again so that it is<br />
impossible that any instructi<strong>on</strong> coming afterwards is executed before CPUID itself<br />
(<strong>and</strong> logically also before RDTSCP).<br />
With this method we avoid <str<strong>on</strong>g>to</str<strong>on</strong>g> call a CPUID instructi<strong>on</strong> in between the reads of the<br />
real-time registers (avoiding all the problems described in Secti<strong>on</strong> 3.1).<br />
3.2.2 Evaluati<strong>on</strong> of the Improved <str<strong>on</strong>g>Benchmark</str<strong>on</strong>g>ing Method<br />
In reference <str<strong>on</strong>g>to</str<strong>on</strong>g> the code presented in Secti<strong>on</strong> 3.1, we replace the previous<br />
benchmarking method with the new <strong>on</strong>e, i.e., we replace ln19 <str<strong>on</strong>g>to</str<strong>on</strong>g> ln54 in the<br />
Appendix with the following code:<br />
asm volatile ("CPUID\n\t"<br />
"RDTSC\n\t"<br />
"mov %%edx, %0\n\t"<br />
"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low)::<br />
"%rax", "%rbx", "%rcx", "%rdx");<br />
asm volatile("RDTSCP\n\t"<br />
"mov %%edx, %0\n\t"<br />
"mov %%eax, %1\n\t"<br />
"CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax",<br />
"%rbx", "%rcx", "%rdx");<br />
asm volatile ("CPUID\n\t"<br />
"RDTSC\n\t"<br />
"mov %%edx, %0\n\t"<br />
"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low)::<br />
"%rax", "%rbx", "%rcx", "%rdx");<br />
asm volatile("RDTSCP\n\t"<br />
"mov %%edx, %0\n\t"<br />
"mov %%eax, %1\n\t"<br />
"CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax",<br />
"%rbx", "%rcx", "%rdx");<br />
for (j=0; j