11.01.2015 Views

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<str<strong>on</strong>g>How</str<strong>on</strong>g> <str<strong>on</strong>g>to</str<strong>on</strong>g> <str<strong>on</strong>g>Benchmark</str<strong>on</strong>g> <str<strong>on</strong>g>Code</str<strong>on</strong>g> <str<strong>on</strong>g>Executi<strong>on</strong></str<strong>on</strong>g> <str<strong>on</strong>g>Times</str<strong>on</strong>g> <strong>on</strong> <strong>Intel</strong> ® <strong>IA</strong>-<strong>32</strong><br />

<strong>and</strong> <strong>IA</strong>-<strong>64</strong> Instructi<strong>on</strong> Set Architectures<br />

loop_size:1 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 56 ;min time: 456<br />

loop_size:2 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 96 ;min time: 456<br />

…………………………………<br />

…………………………………<br />

loop_size:997 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 92 ;min time: 456<br />

loop_size:998 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 100 ;min time: 452<br />

loop_size:999 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 60 ;min time: 452<br />

<str<strong>on</strong>g>to</str<strong>on</strong>g>tal number of spurious min values = 262<br />

<str<strong>on</strong>g>to</str<strong>on</strong>g>tal variance = 48<br />

absolute max deviati<strong>on</strong> = 636<br />

variance of variances = 2306<br />

variance of minimum values = 118<br />

The “loop_size” index refers <str<strong>on</strong>g>to</str<strong>on</strong>g> the external loop (ln33); accordingly each row of the log<br />

shows, for a certain ensemble of measures, the variance, the maximum deviati<strong>on</strong> <strong>and</strong> the<br />

minimum measured time (all of them in clock cycles).<br />

At the end of the log there are: the number of “spurious” minimum values (that in<br />

this case is meaningless <strong>and</strong> can be neglected): the <str<strong>on</strong>g>to</str<strong>on</strong>g>tal variance (the average of<br />

the variances in each row); the absolute maximum deviati<strong>on</strong> (the maximum value<br />

am<strong>on</strong>gst the max deviati<strong>on</strong>s of all the rows); the variance of variances <strong>and</strong> the<br />

variance of the minimum values.<br />

Looking at the results, it is clear that this method is not reliable for benchmarking.<br />

There are different reas<strong>on</strong>s for this:<br />

• The minimum value is not c<strong>on</strong>stant between different ensembles (the variance<br />

of the minimum values is 118 cycles). This means that we cannot evaluate the<br />

cost of calling the benchmarking functi<strong>on</strong> itself. When we are benchmarking a<br />

functi<strong>on</strong> we want <str<strong>on</strong>g>to</str<strong>on</strong>g> be able <str<strong>on</strong>g>to</str<strong>on</strong>g> subtract the cost of calling the benchmarking<br />

functi<strong>on</strong> itself from the measurement of the functi<strong>on</strong> <str<strong>on</strong>g>to</str<strong>on</strong>g> be benchmarked. This<br />

cost is the minimum possible number of cycles that it takes <str<strong>on</strong>g>to</str<strong>on</strong>g> call the<br />

benchmarking functi<strong>on</strong> (i.e., the min times in the rows of the kernel log<br />

above). Basically, in this case, each statistic is performed over 100,000<br />

samples. The fact that over 100,000 samples an absolute minimum cannot be<br />

determined means that we cannot calculate the cost <str<strong>on</strong>g>to</str<strong>on</strong>g> be subtracted when<br />

benchmarking any functi<strong>on</strong>. A soluti<strong>on</strong> could be <str<strong>on</strong>g>to</str<strong>on</strong>g> increase the number of<br />

samples till we always get the same minimum value, but this is practically <str<strong>on</strong>g>to</str<strong>on</strong>g>o<br />

costly since the developer would have <str<strong>on</strong>g>to</str<strong>on</strong>g> wait quite a lot for orders of<br />

magnitude greater than 10^5 samples.<br />

13

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!