13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


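OPTIMIZING FOR SIMD INTEGER APPLICATIONS

Example 5-24 shows the traditional technique, which uses one BSWAP instruction per DWORD (four per loop iteration) to reverse the bytes within each DWORD. Each BSWAP requires executing two μops. In addition, the code requires 4 loads and 4 stores to process 4 DWORDs of data.

Example 5-25 shows an SSSE3 implementation of endian conversion using PSHUFB. Reversing four DWORDs requires one load, one store, and one PSHUFB.

On Intel Core microarchitecture, reversing 4 DWORDs using PSHUFB can be approximately twice as fast as using BSWAP.

Example 5-24. Big-Endian to Little-Endian Conversion Using BSWAP

    lea eax, src            ; eax = source pointer
    lea ecx, dst            ; ecx = destination pointer
    mov edx, elCount        ; edx = DWORD count (must be a nonzero multiple of 4)
    start:
    mov edi, [eax]          ; load DWORD 0
    mov esi, [eax+4]        ; load DWORD 1
    bswap edi               ; reverse bytes of DWORD 0
    mov ebx, [eax+8]        ; load DWORD 2
    bswap esi               ; reverse bytes of DWORD 1
    mov ebp, [eax+12]       ; load DWORD 3
    mov [ecx], edi          ; store DWORD 0
    mov [ecx+4], esi        ; store DWORD 1
    bswap ebx               ; reverse bytes of DWORD 2
    mov [ecx+8], ebx        ; store DWORD 2
    bswap ebp               ; reverse bytes of DWORD 3
    mov [ecx+12], ebp       ; store DWORD 3
    add eax, 16             ; advance source by 4 DWORDs
    add ecx, 16             ; advance destination by 4 DWORDs
    sub edx, 4              ; 4 DWORDs processed
    jnz start               ; loop until the count reaches zero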
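The difference between the two approaches can be sketched in portable C. This is an illustrative model under stated assumptions, not the manual's Example 5-25: `shuffle16` emulates in software the byte selection that PSHUFB performs on a 16-byte register, and the names `bswap32`, `convert_bswap`, `shuffle16`, `convert_shuffle`, and `kReverseDwords` are hypothetical helpers introduced here. It shows the control mask that reverses the bytes of four DWORDs in one shuffle; it does not demonstrate the speedup, which depends on the real instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the bytes of one DWORD, which is what a single BSWAP does. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Scalar approach: one load, one swap, and one store per DWORD. */
void convert_bswap(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = bswap32(src[i]);
}

/* Software model of PSHUFB on 16 bytes: output byte i is zero when
 * bit 7 of ctl[i] is set, otherwise it is in[ctl[i] & 0x0F].
 * A temporary buffer makes in-place use (out == in) safe. */
void shuffle16(uint8_t out[16], const uint8_t in[16], const uint8_t ctl[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : in[ctl[i] & 0x0F];
    memcpy(out, tmp, 16);
}

/* Control mask that reverses the byte order inside each of four DWORDs. */
static const uint8_t kReverseDwords[16] = {
     3,  2,  1,  0,
     7,  6,  5,  4,
    11, 10,  9,  8,
    15, 14, 13, 12
};

/* Shuffle approach: one 16-byte load, one shuffle, and one 16-byte store
 * per four DWORDs. Assumes n is a multiple of 4; any remainder is skipped. */
void convert_shuffle(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i + 4 <= n; i += 4) {
        uint8_t block[16];
        memcpy(block, &src[i], 16);               /* one 16-byte load   */
        shuffle16(block, block, kReverseDwords);  /* models one PSHUFB  */
        memcpy(&dst[i], block, 16);               /* one 16-byte store  */
    }
}
```

Both functions produce the same output for any input whose length is a multiple of four DWORDs. In a real SSSE3 build, the body of `shuffle16` would correspond to a single PSHUFB with the mask held in a register across iterations.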
