13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

OPTIMIZING FOR SIMD INTEGER APPLICATIONS

Example 5-21. Basic C Implementation of RGBA to BGRA Conversion

    void BGRA_RGBA_Convert(BGRA *source, RGBA *dest, int num_pixels)
    {
        for (int i = 0; i < num_pixels; i++) {
            dest[i].r = source[i].r;
            dest[i].g = source[i].g;
            dest[i].b = source[i].b;
            dest[i].a = source[i].a;
        }
    }

Example 5-22 and Example 5-23 show SSE2 code and SSSE3 code for pixel format conversion. In the SSSE3 example, PSHUFB replaces six SSE2 instructions.

Example 5-22. Color Pixel Format Conversion Using SSE2

    ; Optimized for SSE2
    mov     esi, src
    mov     edi, dest
    mov     ecx, iterations
    movdqa  xmm0, ag_mask    //{0,ff,0,ff,0,ff,0,ff,0,ff,0,ff,0,ff,0,ff}
    movdqa  xmm5, rb_mask    //{ff,0,ff,0,ff,0,ff,0,ff,0,ff,0,ff,0,ff,0}
    mov     eax, remainder
    convert16Pixs:           // 16 pixels, 64 bytes per iteration
    movdqa  xmm1, [esi]      // xmm1 = [r3g3b3a3,r2g2b2a2,r1g1b1a1,r0g0b0a0]
    movdqa  xmm2, xmm1
    movdqa  xmm7, xmm1       // xmm7 abgr
    psrld   xmm2, 16         // xmm2 00ab
    pslld   xmm1, 16         // xmm1 gr00
    por     xmm1, xmm2       // xmm1 grab
    pand    xmm7, xmm0       // xmm7 a0g0
    pand    xmm1, xmm5       // xmm1 0r0b
    por     xmm1, xmm7       // xmm1 argb
    movdqa  [edi], xmm1
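To make the SSE2 sequence easier to follow, the sketch below models a single 32-bit lane of XMM1 in portable C. It assumes, as the register comments in Example 5-22 suggest, that each pixel is loaded as a little-endian dword with r in byte 0 and a in byte 3, and that the masks are listed from byte 0 upward, so ag_mask corresponds to 0xFF00FF00 and rb_mask to 0x00FF00FF in every dword. The helper names `swap_rb_lane` and `convert_pixels` are hypothetical. The loop handles one pixel per step, whereas the SSE2 code applies the same steps to four pixels per 16-byte register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 32-bit lane of the SSE2 loop (hypothetical helper).
   The lane holds r in byte 0, g in byte 1, b in byte 2, a in byte 3,
   i.e. "abgr" when read from the high byte to the low byte. */
static uint32_t swap_rb_lane(uint32_t abgr)
{
    const uint32_t ag_mask = 0xFF00FF00u; /* assumed per-dword value: keeps bytes 3 and 1 */
    const uint32_t rb_mask = 0x00FF00FFu; /* assumed per-dword value: keeps bytes 2 and 0 */
    uint32_t x2 = abgr >> 16;             /* psrld xmm2, 16   -> 00ab */
    uint32_t x1 = abgr << 16;             /* pslld xmm1, 16   -> gr00 */
    x1 |= x2;                             /* por   xmm1, xmm2 -> grab */
    uint32_t x7 = abgr & ag_mask;         /* pand  xmm7, xmm0 -> a0g0 */
    x1 &= rb_mask;                        /* pand  xmm1, xmm5 -> 0r0b */
    return x1 | x7;                       /* por   xmm1, xmm7 -> argb */
}

/* Convert num_pixels 4-byte pixels from src to dst, swapping R and B.
   Bytes are packed into a lane explicitly, so the result does not
   depend on the host's endianness. */
static void convert_pixels(const uint8_t *src, uint8_t *dst, int num_pixels)
{
    assert(num_pixels >= 0);
    for (int i = 0; i < num_pixels; i++) {
        const uint8_t *s = src + 4 * i;
        uint32_t lane = (uint32_t)s[0] | ((uint32_t)s[1] << 8) |
                        ((uint32_t)s[2] << 16) | ((uint32_t)s[3] << 24);
        uint32_t out = swap_rb_lane(lane);
        uint8_t *d = dst + 4 * i;
        d[0] = (uint8_t)out;
        d[1] = (uint8_t)(out >> 8);
        d[2] = (uint8_t)(out >> 16);
        d[3] = (uint8_t)(out >> 24);
    }
}
```

For example, a pixel stored as the bytes 1, 2, 3, 4 comes out as 3, 2, 1, 4: red and blue trade places while green and alpha stay put. Assuming the BGRA and RGBA structures of Example 5-21 differ only in the positions of r and b, this matches the field-by-field copy.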
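The SSSE3 version relies on PSHUFB, which builds each destination byte i from source byte (control[i] & 0x0F), or writes zero when bit 7 of control[i] is set. The scalar model below shows why a single shuffle can replace the shift/mask/OR chain: one 16-byte control mask swaps bytes 0 and 2 of all four pixels in a register at once. The names `pshufb_model`, `swap_rb_ctrl`, and `convert_pixels_pshufb_model` are hypothetical, and this is a sketch of the instruction's semantics, not the code of Example 5-23.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB on one 16-byte register (hypothetical helper).
   A temporary buffer keeps the result correct even if dst == src. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctrl[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

/* Control mask that swaps bytes 0 and 2 of each 4-byte pixel. */
static const uint8_t swap_rb_ctrl[16] = {
    2, 1, 0, 3,   6, 5, 4, 7,   10, 9, 8, 11,   14, 13, 12, 15
};

/* Convert pixels four at a time (16 bytes per shuffle). This sketch
   leaves remainder handling out, so num_pixels must be a multiple of 4. */
static void convert_pixels_pshufb_model(const uint8_t *src, uint8_t *dst,
                                        int num_pixels)
{
    assert(num_pixels >= 0 && num_pixels % 4 == 0);
    for (int i = 0; i < num_pixels; i += 4)
        pshufb_model(dst + 4 * i, src + 4 * i, swap_rb_ctrl);
}
```

With SSSE3 intrinsics, the same control bytes would be placed in an `__m128i` and passed to `_mm_shuffle_epi8` (declared in `<tmmintrin.h>`). Because the mask is loaded once outside the loop, each iteration needs only the load, one shuffle, and the store.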