

How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures

Figure 2. Variance Behavior Graph 2 (y-axis: clock cycles, 0 to 250; x-axis: ensembles, 1 to 951)

3.2 Improvements Using RDTSCP Instruction

The RDTSCP instruction is described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B ([3]) as an assembly instruction that reads the timestamp register and the CPU identifier at the same time. The value of the timestamp register is stored in the EDX and EAX registers (high-order and low-order 32 bits, respectively); the value of the CPU identifier is stored in the ECX register (“On processors that support the Intel 64 architecture, the high order 32 bits of each of RAX, RDX, and RCX are cleared”).
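The register layout described above can be sketched in C. This is a minimal illustration, assuming GCC or Clang inline assembly on an x86 processor that supports RDTSCP; the names `tsc_sample` and `read_rdtscp` are hypothetical and do not come from the manual or from this paper.

```c
#include <stdint.h>

/* Hypothetical container for the values one RDTSCP produces. */
typedef struct {
    uint64_t tsc; /* timestamp: EDX (high 32 bits) and EAX (low 32 bits) */
    uint32_t aux; /* value left in ECX, the "CPU identifier" in the text */
} tsc_sample;

/* Assumes a GCC/Clang toolchain and a CPU that implements RDTSCP. */
static inline tsc_sample read_rdtscp(void)
{
    uint32_t lo, hi, aux;

    /* RDTSCP writes EAX, EDX and ECX; bind each to a C variable. */
    __asm__ volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));

    tsc_sample s;
    s.tsc = ((uint64_t)hi << 32) | lo; /* rebuild the 64-bit counter */
    s.aux = aux;
    return s;
}
```

Calling `read_rdtscp()` twice and subtracting the two `tsc` fields gives an elapsed cycle count; comparing the two `aux` fields is one way to notice that the thread may have moved to another CPU between the reads.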

What is interesting in this case is the “pseudo” serializing property of RDTSCP. The manual states:

“The RDTSCP instruction waits until all previous instructions have been executed before reading the counter. However, subsequent instructions may begin execution before the read operation is performed.”

This means that RDTSCP guarantees that everything placed above it in the source code has been executed before the counter is read. It cannot, however, guarantee that the CPU, for optimization purposes, will not execute before the RDTSCP instructions that are placed after it in the source code. If this happens, the instructions that follow the RDTSCP contaminate the code under measurement.
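The ordering behavior described above can be made concrete with a small sketch. It assumes GCC or Clang inline assembly on x86, opens the measurement with CPUID followed by RDTSC, and closes it with RDTSCP; the helper names `tsc_begin`, `tsc_end`, `measure_cycles` and the workload `spin` are hypothetical. The comments mark where the guarantee holds and where it does not.

```c
#include <stdint.h>

/* Opening read: CPUID serializes, then RDTSC samples the counter. */
static inline uint64_t tsc_begin(void)
{
    uint32_t lo, hi;
    __asm__ volatile("cpuid\n\t"
                     "rdtsc"
                     : "=a"(lo), "=d"(hi)
                     : "a"(0)
                     : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Closing read: RDTSCP waits for all earlier instructions to execute. */
static inline uint64_t tsc_end(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtscp" : "=a"(lo), "=d"(hi) : : "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical workload standing in for the code under measurement. */
static void spin(void)
{
    volatile unsigned x = 0;
    for (unsigned i = 0; i < 1000; i++)
        x += i;
}

static uint64_t measure_cycles(void (*fn)(void))
{
    uint64_t start = tsc_begin();
    fn();                     /* guaranteed complete before RDTSCP reads */
    uint64_t end = tsc_end();
    /* Nothing stops the CPU from starting the instructions below this
     * point before RDTSCP has read the counter; any such work is counted
     * in the interval and contaminates the result. */
    return end - start;
}
```

A call such as `measure_cycles(spin)` returns the cycle count for one run; the sketch deliberately leaves the closing side unprotected so that the source of the contamination is visible.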
