13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


APPLICATION PERFORMANCE TOOLS

Example 10-6. Auto-Generated Code of Data Conversion

Compiler Switch QxW

    $B1$2:
        xor       eax, eax
        pxor      xmm0, xmm0
    $B1$3:
        movd      xmm1, _src[eax]
        punpcklbw xmm1, xmm0
        punpcklwd xmm1, xmm0
        movdqa    _dst[eax*4], xmm1
        add       eax, 4
        cmp       eax, 1024
        jb        $B1$3

Compiler Switch QxT

    $B1$2:
        movdqa    xmm0, _2il0fl2t$1DD
        xor       eax, eax
    $B1$3:
        movd      xmm1, _src[eax]
        pshufb    xmm1, xmm0
        movdqa    _dst[eax*4], xmm1
        add       eax, 4
        cmp       eax, 1024
        jb        $B1$3
        …
    _2il0fl2t$1DD 0ffffff00H,0ffffff01H,0ffffff02H,0ffffff03H

Example 10-7. Un-aligned Data Operation

    __declspec(align(16)) float src[1024], dst[1024];
    for(i = 2; i < 1024-2; i++)
        dst[i] = src[i-2] - src[i-1] - src[i+2];

Intel Compiler can use PALIGNR to generate code to avoid penalties associated with unaligned loads.
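The two instructions that the examples above rely on can be modeled in portable C to make their effect easier to follow. The sketch below is an illustration only, not compiler output: pshufb_model, palignr_model, widen4 and load_minus2 are hypothetical names, the byte layout of the mask assumes the little-endian dwords listed in Example 10-6, and load_minus2 assumes 4-byte floats with i a multiple of 4 and at least 4. It shows how the PSHUFB mask widens four bytes to four dwords in one step, and one way two aligned 16-byte chunks could be combined with PALIGNR to obtain src[i-2] through src[i+1] without an unaligned load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB on one 16-byte register (no intrinsics).
   A control byte with bit 7 set yields 0; otherwise its low 4 bits
   select a byte of src. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                  const uint8_t ctl[16])
{
    uint8_t out[16];
    for (int i = 0; i < 16; i++)
        out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    memcpy(dst, out, 16);
}

/* Mask from Example 10-6 (0ffffff00H..0ffffff03H) as little-endian bytes. */
const uint8_t widen_mask[16] = {
    0x00, 0xFF, 0xFF, 0xFF,  0x01, 0xFF, 0xFF, 0xFF,
    0x02, 0xFF, 0xFF, 0xFF,  0x03, 0xFF, 0xFF, 0xFF
};

/* Hypothetical helper: widen four bytes to four dwords, mirroring the
   MOVD + PSHUFB pair in the QxT column. */
void widen4(uint32_t out[4], const uint8_t in[4])
{
    uint8_t reg[16] = {0};            /* MOVD zero-fills the upper 12 bytes */
    memcpy(reg, in, 4);
    pshufb_model(reg, reg, widen_mask);
    for (int i = 0; i < 4; i++)       /* assemble little-endian dwords */
        out[i] = (uint32_t)reg[4 * i]
               | (uint32_t)reg[4 * i + 1] << 8
               | (uint32_t)reg[4 * i + 2] << 16
               | (uint32_t)reg[4 * i + 3] << 24;
}

/* Scalar model of PALIGNR: concatenate hi:lo (lo in the low 16 bytes),
   shift right by `shift` bytes, keep the low 16 bytes. */
void palignr_model(uint8_t dst[16], const uint8_t hi[16],
                   const uint8_t lo[16], unsigned shift)
{
    uint8_t cat[32], out[16];
    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    for (unsigned i = 0; i < 16; i++)
        out[i] = (shift + i < 32) ? cat[shift + i] : 0;
    memcpy(dst, out, 16);
}

/* Hypothetical helper: build src[i-2..i+1] from two aligned chunks.
   Assumes sizeof(float) == 4, i % 4 == 0 and i >= 4. */
void load_minus2(float out[4], const float *src, int i)
{
    uint8_t lo[16], hi[16], res[16];
    memcpy(lo, &src[i - 4], 16);      /* aligned chunk src[i-4..i-1] */
    memcpy(hi, &src[i], 16);          /* aligned chunk src[i..i+3]   */
    palignr_model(res, hi, lo, 8);    /* skip the 8 lowest bytes     */
    memcpy(out, res, 16);
}

/* Small self-check: each widened dword equals its source byte. */
void widen4_self_check(void)
{
    const uint8_t in[4] = { 0x12, 0x80, 0xFF, 0x00 };
    uint32_t out[4];
    widen4(out, in);
    for (int i = 0; i < 4; i++)
        assert(out[i] == in[i]);
}
```

In the model, widen4 does with one shuffle what the QxW column does with two unpack instructions against a zeroed register, which is consistent with the shorter loop body in the QxT column. The shift of 8 bytes in load_minus2 corresponds to the two-element offset of src[i-2] in Example 10-7; other offsets would use a different shift.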
