13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

- An iterative approach using the instruction with largest data granularity, where the overhead for SIMD feature detection, iteration overhead, and prolog/epilogue for alignment control can be minimized. The trade-off between these approaches may depend on the microarchitecture.

An example of MEMSET() implemented using stosd for arbitrary counter value with the destination address aligned to doubleword boundary in 32-bit mode is shown in Example 3-44.

• When N is larger than half the size of the last level cache, using 16-byte granularity streaming stores with prolog/epilog for address alignment will likely be more efficient, if the destination addresses will not be referenced immediately afterwards.

Example 3-44. REP STOSD with Arbitrary Count Size and 4-Byte-Aligned Destination

A 'C' example of Memset()
Equivalent Implementation Using REP STOSD

void memset(void *dst,int c,size_t size)
{
char *d = (char *)dst;
size_t i;
for (i=0;i
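The doubleword-granularity idea behind Example 3-44 can be sketched in portable C. This is an illustration only, not the manual's listing: the function name `memset_dword_sketch` is hypothetical, the 4-byte-aligned destination is assumed (and checked with an assertion), and the loops stand in for what REP STOSD followed by a short byte-store tail would do in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (the name is not from the manual).
 * Fills `size` bytes at `dst` with the low byte of `c`.
 * Assumption: `dst` is aligned to a 4-byte (doubleword) boundary.
 */
void *memset_dword_sketch(void *dst, int c, size_t size)
{
    unsigned char *d = (unsigned char *)dst;
    unsigned char byte = (unsigned char)c;

    /* Replicate the fill byte into all four bytes of a doubleword. */
    uint32_t pattern = (uint32_t)byte * 0x01010101u;

    size_t dwords = size / 4;  /* number of whole doubleword stores */
    size_t tail = size % 4;    /* 0 to 3 leftover bytes */
    size_t i;

    /* Check the alignment assumption stated above. */
    assert(((uintptr_t)dst & 3u) == 0);

    /* Doubleword-granularity stores, the part REP STOSD would cover.
       A fixed-size memcpy is used so the store is well defined in C;
       compilers typically lower it to a single 32-bit store. */
    for (i = 0; i < dwords; i++)
        memcpy(d + 4 * i, &pattern, sizeof pattern);

    /* Byte-granularity stores for the remaining 0 to 3 bytes. */
    d += 4 * dwords;
    for (i = 0; i < tail; i++)
        d[i] = byte;

    return dst;
}
```

Splitting the count into `size / 4` doublewords and `size % 4` trailing bytes is what lets an arbitrary count be handled with the widest store first; whether this beats a byte loop or a SIMD path on a given processor depends on the microarchitecture, as the text notes.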
