21.03.2013 Views

Problem - Kevin Tafuro


On Windows machines, you can read the same thing using QueryPerformanceCounter( ), which takes a pointer to a 64-bit integer (the LARGE_INTEGER or __int64 type).
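
The Windows call can be wrapped as follows. This is a minimal sketch rather than code from the original text: read_ticks( ) and ticks_per_second( ) are hypothetical names, and the non-Windows branch (which uses clock_gettime( ) with CLOCK_MONOTONIC) is an assumption added only so the example builds on other systems. Note that QueryPerformanceCounter( ) reports ticks of a high-resolution counter, so QueryPerformanceFrequency( ) is used to learn how many ticks make up one second.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 199309L /* request clock_gettime( ) from <time.h> */
#endif

#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: read the performance counter.
 * Returns 0 on success, -1 on failure. */
int read_ticks(uint64_t *ticks) {
  LARGE_INTEGER now;

  if (!QueryPerformanceCounter(&now)) return -1;
  *ticks = (uint64_t)now.QuadPart;
  return 0;
}

/* Hypothetical helper: ticks per second, for converting differences. */
int ticks_per_second(uint64_t *freq) {
  LARGE_INTEGER f;

  if (!QueryPerformanceFrequency(&f)) return -1;
  *freq = (uint64_t)f.QuadPart;
  return 0;
}

#else
#include <time.h>

/* Assumed stand-in for non-Windows builds: nanoseconds from a
 * monotonic clock, so one tick is one nanosecond. */
int read_ticks(uint64_t *ticks) {
  struct timespec ts;

  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return -1;
  *ticks = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
  return 0;
}

int ticks_per_second(uint64_t *freq) {
  *freq = 1000000000ULL;
  return 0;
}
#endif
```

To time a region, call read_ticks( ) before and after it and subtract the two values; dividing the difference by the value from ticks_per_second( ) gives elapsed seconds.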

You can get fairly accurate timing just by subtracting the results of two successive calls to current_stamp( ). For example, you can time how long an empty for loop with 10,000 iterations takes:

#include <stdio.h>

/* current_stamp( ) and spc_uint64_t are defined elsewhere (not shown here). */
int main(int argc, char *argv[ ]) {
  spc_uint64_t start, finish, diff;
  volatile int i; /* volatile keeps the compiler from removing the loop */

  current_stamp(&start);
  for (i = 0; i < 10000; i++);
  current_stamp(&finish);
  diff = finish - start;
  /* The counter is unsigned, so print it with %llu. */
  printf("That loop took %llu cycles.\n", (unsigned long long)diff);
  return 0;
}

On an Athlon XP, compiling with GCC 2.95.4, the previous code will consistently give 43–44 cycles without optimization turned on and 37–38 cycles with optimization turned on. Generally, if i is declared volatile, the compiler won’t eliminate the loop, even when it can figure out that there are no side effects.

Note that you can expect some minimal overhead in gathering the timestamp to begin with. You can calculate the fixed timing overhead by timing nothing:

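#include <stdio.h>

/* current_stamp( ) and spc_uint64_t are defined elsewhere (not shown here). */
int main(int argc, char *argv[ ]) {
  spc_uint64_t start, finish, diff;

  /* Two back-to-back reads: the difference is the measurement overhead. */
  current_stamp(&start);
  current_stamp(&finish);
  diff = finish - start;
  printf("Timing overhead takes %llu cycles.\n", (unsigned long long)diff);
  return 0;
}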

On an Athlon XP, the overhead is usually reported as 0 cycles and occasionally as 1 cycle. This isn’t really accurate, because the two store operations in the first timestamp call take about 2 to 4 cycles. The problem is largely due to pipelining and other complex architectural issues, and it is hard to work around. You can explicitly introduce pipeline stalls, but we’ve found that doesn’t always work as well as expected. One thing to do is to time the processing of a large amount of data. Even then, you will get variances in timing because of things not under your control, such as context switches. In short, you can get within a few cycles of the truth, and beyond that you’ll probably have to take some sort of average.
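
One way to act on this advice is to collect many timestamp differences and summarize them. The following is a minimal sketch, not code from the original text. min_sample( ) and median_cycles( ) are hypothetical helper names, and plain uint64_t stands in for spc_uint64_t. The median is used in place of a simple mean on the assumption that an occasional context switch produces a few very large samples that would skew a mean. The caller is assumed to have filled the array with finish - start values obtained from current_stamp( ).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort( ) over uint64_t values. */
static int cmp_u64(const void *a, const void *b) {
  uint64_t x = *(const uint64_t *)a;
  uint64_t y = *(const uint64_t *)b;

  return (x > y) - (x < y);
}

/* Smallest sample. When the samples come from back-to-back timestamp
 * reads, this is a rough estimate of the fixed overhead.
 * Returns 0 for an empty array. */
uint64_t min_sample(const uint64_t *samples, size_t n) {
  uint64_t best;
  size_t   i;

  if (n == 0) return 0;
  best = samples[0];
  for (i = 1; i < n; i++)
    if (samples[i] < best) best = samples[i];
  return best;
}

/* Median of the samples with the overhead subtracted, clamped at 0.
 * Sorts the array in place. Returns 0 for an empty array. */
uint64_t median_cycles(uint64_t *samples, size_t n, uint64_t overhead) {
  uint64_t lo, hi, mid;

  if (n == 0) return 0;
  qsort(samples, n, sizeof samples[0], cmp_u64);
  if (n % 2) {
    mid = samples[n / 2];
  } else {
    /* Midpoint of the two central values, written to avoid overflow. */
    lo  = samples[n / 2 - 1];
    hi  = samples[n / 2];
    mid = lo + (hi - lo) / 2;
  }
  return mid > overhead ? mid - overhead : 0;
}
```

A typical use would be to record a few hundred empty measurements, pass them to min_sample( ) to estimate the overhead, and then pass the measurements of the real code to median_cycles( ) together with that estimate.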

152 | Chapter 4: Symmetric Cryptography Fundamentals

Copyright © 2007 O’Reilly & Associates, Inc. All rights reserved.
