How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...
How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...
How to Benchmark Code Execution Times on Intel IA-32 and IA-64 ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<str<strong>on</strong>g>How</str<strong>on</strong>g> <str<strong>on</strong>g>to</str<strong>on</strong>g> <str<strong>on</strong>g>Benchmark</str<strong>on</strong>g> <str<strong>on</strong>g>Code</str<strong>on</strong>g> <str<strong>on</strong>g>Executi<strong>on</strong></str<strong>on</strong>g> <str<strong>on</strong>g>Times</str<strong>on</strong>g> <strong>on</strong> <strong>Intel</strong> ® <strong>IA</strong>-<strong>32</strong><br />
<strong>and</strong> <strong>IA</strong>-<strong>64</strong> Instructi<strong>on</strong> Set Architectures<br />
loop_size:1 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 56 ;min time: 456<br />
loop_size:2 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 96 ;min time: 456<br />
…………………………………<br />
…………………………………<br />
loop_size:997 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 92 ;min time: 456<br />
loop_size:998 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 100 ;min time: 452<br />
loop_size:999 >>>> variance(cycles): 85; max_deviati<strong>on</strong>: 60 ;min time: 452<br />
<str<strong>on</strong>g>to</str<strong>on</strong>g>tal number of spurious min values = 262<br />
<str<strong>on</strong>g>to</str<strong>on</strong>g>tal variance = 48<br />
absolute max deviati<strong>on</strong> = 636<br />
variance of variances = 2306<br />
variance of minimum values = 118<br />
The “loop_size” index refers <str<strong>on</strong>g>to</str<strong>on</strong>g> the external loop (ln33); accordingly each row of the log<br />
shows, for a certain ensemble of measures, the variance, the maximum deviati<strong>on</strong> <strong>and</strong> the<br />
minimum measured time (all of them in clock cycles).<br />
At the end of the log there are: the number of “spurious” minimum values (that in<br />
this case is meaningless <strong>and</strong> can be neglected): the <str<strong>on</strong>g>to</str<strong>on</strong>g>tal variance (the average of<br />
the variances in each row); the absolute maximum deviati<strong>on</strong> (the maximum value<br />
am<strong>on</strong>gst the max deviati<strong>on</strong>s of all the rows); the variance of variances <strong>and</strong> the<br />
variance of the minimum values.<br />
Looking at the results, it is clear that this method is not reliable for benchmarking.<br />
There are different reas<strong>on</strong>s for this:<br />
• The minimum value is not c<strong>on</strong>stant between different ensembles (the variance<br />
of the minimum values is 118 cycles). This means that we cannot evaluate the<br />
cost of calling the benchmarking functi<strong>on</strong> itself. When we are benchmarking a<br />
functi<strong>on</strong> we want <str<strong>on</strong>g>to</str<strong>on</strong>g> be able <str<strong>on</strong>g>to</str<strong>on</strong>g> subtract the cost of calling the benchmarking<br />
functi<strong>on</strong> itself from the measurement of the functi<strong>on</strong> <str<strong>on</strong>g>to</str<strong>on</strong>g> be benchmarked. This<br />
cost is the minimum possible number of cycles that it takes <str<strong>on</strong>g>to</str<strong>on</strong>g> call the<br />
benchmarking functi<strong>on</strong> (i.e., the min times in the rows of the kernel log<br />
above). Basically, in this case, each statistic is performed over 100,000<br />
samples. The fact that over 100,000 samples an absolute minimum cannot be<br />
determined means that we cannot calculate the cost <str<strong>on</strong>g>to</str<strong>on</strong>g> be subtracted when<br />
benchmarking any functi<strong>on</strong>. A soluti<strong>on</strong> could be <str<strong>on</strong>g>to</str<strong>on</strong>g> increase the number of<br />
samples till we always get the same minimum value, but this is practically <str<strong>on</strong>g>to</str<strong>on</strong>g>o<br />
costly since the developer would have <str<strong>on</strong>g>to</str<strong>on</strong>g> wait quite a lot for orders of<br />
magnitude greater than 10^5 samples.<br />
13