11.01.2015 Views

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<str<strong>on</strong>g>How</str<strong>on</strong>g> <str<strong>on</strong>g>to</str<strong>on</strong>g> <str<strong>on</strong>g>Benchmark</str<strong>on</strong>g> <str<strong>on</strong>g>Code</str<strong>on</strong>g> <str<strong>on</strong>g>Executi<strong>on</strong></str<strong>on</strong>g> <str<strong>on</strong>g>Times</str<strong>on</strong>g> <strong>on</strong> <strong>Intel</strong> ® <strong>IA</strong>-<strong>32</strong><br />

<strong>and</strong> <strong>IA</strong>-<strong>64</strong> Instructi<strong>on</strong> Set Architectures<br />

2 Problem Descripti<strong>on</strong><br />

This secti<strong>on</strong> explains the issues involved in reading the timestamp register <strong>and</strong><br />

discusses the correct methodology <str<strong>on</strong>g>to</str<strong>on</strong>g> return precise <strong>and</strong> reliable clock cycles<br />

measurements. It is expected that readers have knowledge of basic GCC <strong>and</strong> ICC<br />

compiling techniques, basic <strong>Intel</strong> assembly syntax, <strong>and</strong> AT&T* assembly syntax.<br />

Those not interested in the problem descripti<strong>on</strong> <strong>and</strong> method justificati<strong>on</strong> can skip<br />

this secti<strong>on</strong> <strong>and</strong> go <str<strong>on</strong>g>to</str<strong>on</strong>g> Secti<strong>on</strong> 3.2.1 (if their platform supports the RDTSCP<br />

instructi<strong>on</strong>) or Secti<strong>on</strong> 3.2.3 (if not) <str<strong>on</strong>g>to</str<strong>on</strong>g> acquire the code.<br />

2.1 Introducti<strong>on</strong><br />

<strong>Intel</strong> CPUs have a timestamp counter <str<strong>on</strong>g>to</str<strong>on</strong>g> keep track of every cycle that occurs <strong>on</strong><br />

the CPU. Starting with the <strong>Intel</strong> Pentium ® processor, the devices have included a<br />

per-core timestamp register that s<str<strong>on</strong>g>to</str<strong>on</strong>g>res the value of the timestamp counter <strong>and</strong><br />

that can be accessed by the RDTSC <strong>and</strong> RDTSCP assembly instructi<strong>on</strong>s.<br />

When running a Linux OS, the developer can check if his CPU supports the RDTSCP<br />

instructi<strong>on</strong> by looking at the flags field of “/proc/cpuinfo”; if rdtscp is <strong>on</strong>e of the<br />

flags, then it is supported.<br />

2.2 Problems with RDTSC Instructi<strong>on</strong> in C Inline<br />

Assembly<br />

Assume that you are working in a Linux envir<strong>on</strong>ment, <strong>and</strong> are compiling by using<br />

GCC. You have C code <strong>and</strong> want <str<strong>on</strong>g>to</str<strong>on</strong>g> know how many clock cycles are spent <str<strong>on</strong>g>to</str<strong>on</strong>g><br />

execute the code itself or just a part of it. To make sure that our measurements<br />

are not tainted by any kind of interrupt (including scheduling preempti<strong>on</strong>), we are<br />

going <str<strong>on</strong>g>to</str<strong>on</strong>g> write a kernel module where we guarantee the exclusive ownership of the<br />

CPU when executing the code that we want <str<strong>on</strong>g>to</str<strong>on</strong>g> benchmark.<br />

To underst<strong>and</strong> the practical implementati<strong>on</strong>, let’s c<strong>on</strong>sider the following dummy<br />

kernel module; it simply calls a functi<strong>on</strong> that is taking a pointer as input <strong>and</strong> is<br />

setting the pointed value <str<strong>on</strong>g>to</str<strong>on</strong>g> “1”. We want <str<strong>on</strong>g>to</str<strong>on</strong>g> measure how many clock cycles it<br />

takes <str<strong>on</strong>g>to</str<strong>on</strong>g> call such a functi<strong>on</strong>:<br />

#include <br />

#include <br />

#include <br />

#include <br />

#include <br />

#include <br />

void inline measured_functi<strong>on</strong>(volatile int *var)<br />

{<br />

(*var) = 1;<br />

7

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!