13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD INTEGER APPLICATIONS

Example 5-3. SSE2 and SSSE3 Implementation of FIR Processing Code (Contd.)

Optimized for SSE2

    movups xmm1, xmmword ptr[eax+ecx+12]
    mulps xmm1, xmmword ptr[ebx+4*ecx+48]
    addps xmm0, xmm1
    add ecx, 16
    cmp ecx, 4*TAP
    jl inner_loop
    mov eax, dword ptr[output]
    movaps xmmword ptr[eax], xmm0

Optimized for SSSE3

    movaps xmm2, xmm1
    palignr xmm2, xmm3, 12
    mulps xmm2, xmmword ptr[ebx+4*ecx+48]
    addps xmm0, xmm2
    add ecx, 16
    cmp ecx, 4*TAP
    jl inner_loop
    mov eax, dword ptr[output]
    movaps xmmword ptr[eax], xmm0

5.4 DATA MOVEMENT CODING TECHNIQUES

In general, better performance can be achieved if data is pre-arranged for SIMD computation (see Section 4.5, “Improving Memory Utilization”). This may not always be possible.

This section covers techniques for gathering and arranging data for more efficient SIMD computation.

5.4.1 Unsigned Unpack

MMX technology provides several instructions that are used to pack and unpack data in the MMX registers. SSE2 extends these instructions so that they operate on 128-bit source and destinations.

The unpack instructions can be used to zero-extend an unsigned number. Example 5-4 assumes the source is a packed-word (16-bit) data type.
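
The zero-extension idea can be sketched in C with SSE2 intrinsics. This is an illustrative sketch, not the manual's Example 5-4: the function name zero_extend_u16_to_u32 and the array-based interface are assumptions made for the illustration. It relies on _mm_unpacklo_epi16 and _mm_unpackhi_epi16, which correspond to the PUNPCKLWD and PUNPCKHWD instructions. Interleaving each source word with a word from an all-zero register places zeros in the upper half of every resulting doubleword.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: zero-extend eight unsigned 16-bit words to eight
   32-bit doublewords by interleaving them with zero words. */
static void zero_extend_u16_to_u32(const uint16_t src[8], uint32_t dst[8])
{
    /* Unaligned load of the eight source words. */
    __m128i v    = _mm_loadu_si128((const __m128i *)src);
    __m128i zero = _mm_setzero_si128();

    /* PUNPCKLWD: interleave words 0..3 of v with zero words, so each
       32-bit element holds the source word in its low half and 0 in
       its high half. */
    __m128i lo = _mm_unpacklo_epi16(v, zero);

    /* PUNPCKHWD: the same for words 4..7. */
    __m128i hi = _mm_unpackhi_epi16(v, zero);

    _mm_storeu_si128((__m128i *)&dst[0], lo);
    _mm_storeu_si128((__m128i *)&dst[4], hi);
}
```

With this approach a source word such as 0xFFFF becomes 0x0000FFFF, because the upper half comes from the zero register and not from the sign bit. The unaligned loads and stores are a simplification; aligned variants could be used when the buffers are known to be 16-byte aligned.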
