11.01.2015 Views

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<str<strong>on</strong>g>How</str<strong>on</strong>g> <str<strong>on</strong>g>to</str<strong>on</strong>g> <str<strong>on</strong>g>Benchmark</str<strong>on</strong>g> <str<strong>on</strong>g>Code</str<strong>on</strong>g> <str<strong>on</strong>g>Executi<strong>on</strong></str<strong>on</strong>g> <str<strong>on</strong>g>Times</str<strong>on</strong>g> <strong>on</strong> <strong>Intel</strong> ® <strong>IA</strong>-<strong>32</strong><br />

<strong>and</strong> <strong>IA</strong>-<strong>64</strong> Instructi<strong>on</strong> Set Architectures<br />

The RDTSCP instructi<strong>on</strong> reads the timestamp register for the sec<strong>on</strong>d time <strong>and</strong><br />

guarantees that the executi<strong>on</strong> of all the code we wanted <str<strong>on</strong>g>to</str<strong>on</strong>g> measure is completed.<br />

The two “mov” instructi<strong>on</strong>s coming afterwards s<str<strong>on</strong>g>to</str<strong>on</strong>g>re the edx <strong>and</strong> eax registers<br />

values in<str<strong>on</strong>g>to</str<strong>on</strong>g> memory. Both instructi<strong>on</strong>s are guaranteed <str<strong>on</strong>g>to</str<strong>on</strong>g> be executed after RDTSC<br />

(i.e., they d<strong>on</strong>’t corrupt the measure) since there is a logical dependency between<br />

RDTSCP <strong>and</strong> the register edx <strong>and</strong> eax (RDTSCP is writing those registers <strong>and</strong> the<br />

CPU is obliged <str<strong>on</strong>g>to</str<strong>on</strong>g> wait for RDTSCP <str<strong>on</strong>g>to</str<strong>on</strong>g> be finished before executing the two “mov”).<br />

Finally a CPUID call guarantees that a barrier is implemented again so that it is<br />

impossible that any instructi<strong>on</strong> coming afterwards is executed before CPUID itself<br />

(<strong>and</strong> logically also before RDTSCP).<br />

With this method we avoid <str<strong>on</strong>g>to</str<strong>on</strong>g> call a CPUID instructi<strong>on</strong> in between the reads of the<br />

real-time registers (avoiding all the problems described in Secti<strong>on</strong> 3.1).<br />

3.2.2 Evaluati<strong>on</strong> of the Improved <str<strong>on</strong>g>Benchmark</str<strong>on</strong>g>ing Method<br />

In reference <str<strong>on</strong>g>to</str<strong>on</strong>g> the code presented in Secti<strong>on</strong> 3.1, we replace the previous<br />

benchmarking method with the new <strong>on</strong>e, i.e., we replace ln19 <str<strong>on</strong>g>to</str<strong>on</strong>g> ln54 in the<br />

Appendix with the following code:<br />

asm volatile ("CPUID\n\t"<br />

"RDTSC\n\t"<br />

"mov %%edx, %0\n\t"<br />

"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low)::<br />

"%rax", "%rbx", "%rcx", "%rdx");<br />

asm volatile("RDTSCP\n\t"<br />

"mov %%edx, %0\n\t"<br />

"mov %%eax, %1\n\t"<br />

"CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax",<br />

"%rbx", "%rcx", "%rdx");<br />

asm volatile ("CPUID\n\t"<br />

"RDTSC\n\t"<br />

"mov %%edx, %0\n\t"<br />

"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low)::<br />

"%rax", "%rbx", "%rcx", "%rdx");<br />

asm volatile("RDTSCP\n\t"<br />

"mov %%edx, %0\n\t"<br />

"mov %%eax, %1\n\t"<br />

"CPUID\n\t": "=r" (cycles_high1), "=r" (cycles_low1):: "%rax",<br />

"%rbx", "%rcx", "%rdx");<br />

for (j=0; j

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!