13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


CODING FOR SIMD ARCHITECTURES

moves can be emulated in SIMD computation by using masked compares and logicals, as shown in Example 4-19.

Example 4-19. Emulation of Conditional Moves

High-level code:

    short A[MAX_ELEMENT], B[MAX_ELEMENT], C[MAX_ELEMENT], D[MAX_ELEMENT],
          E[MAX_ELEMENT];
    int i;

    for (i = 0; i < MAX_ELEMENT; i++) {
        if (A[i] > B[i]) {
            C[i] = D[i];
        } else {
            C[i] = E[i];
        }
    }

Assembly code (assumes MAX_ELEMENT is a multiple of 4):

            xor     eax, eax
    top_of_loop:
            movq    mm0, [A + eax]
            pcmpgtw mm0, [B + eax]    ; Create compare mask (all ones where A > B)
            movq    mm1, [D + eax]
            pand    mm1, mm0          ; Drop elements where A <= B
            pandn   mm0, [E + eax]    ; Drop elements where A > B
            por     mm0, mm1          ; Combine into a single result
            movq    [C + eax], mm0
            add     eax, 8            ; 4 words (8 bytes) per iteration
            cmp     eax, MAX_ELEMENT*2
            jl      top_of_loop       ; jl, not jle: jle would run one extra iteration past the arrays

Note that this technique applies to both SIMD integer and SIMD floating-point code.

If there are multiple consumers of an instance of a register, group the consumers together as closely as possible. However, the consumers should not be scheduled too close to the producer, or they will stall waiting for its result.

4.6.1 SIMD Optimizations and Microarchitectures

Pentium M, Intel Core Solo and Intel Core Duo processors have a different microarchitecture from Intel NetBurst microarchitecture. The following sub-section discusses optimizing SIMD code targeting Intel Core Solo and Intel Core Duo processors.

The register-register variant of the following instructions has improved performance on Intel Core Solo and Intel Core Duo processors relative to Pentium M processors.
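The masked compare-and-logical technique of Example 4-19 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the manual: the function names `select_ref` and `select_masked` are hypothetical, and the sketch swaps the 64-bit MMX registers of the example for 128-bit SSE2 intrinsics from `<emmintrin.h>` (`_mm_cmpgt_epi16`, `_mm_and_si128`, `_mm_andnot_si128`, `_mm_or_si128`), processing eight words per iteration instead of four. It uses unaligned loads so that no array alignment is assumed, and falls back to the same mask trick in scalar code for the tail (or for the whole array when SSE2 is not available).

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical reference: the branchy high-level loop of Example 4-19. */
static void select_ref(const short *a, const short *b, const short *d,
                       const short *e, short *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (a[i] > b[i]) ? d[i] : e[i];
}

/* Hypothetical branch-free version: compare mask plus AND / ANDN / OR. */
static void select_masked(const short *a, const short *b, const short *d,
                          const short *e, short *c, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Signed word compare: 0xFFFF in lanes where a > b, else 0 (like pcmpgtw). */
        __m128i mask = _mm_cmpgt_epi16(va, vb);
        /* Keep d where a > b (like pand). */
        __m128i vd = _mm_and_si128(mask, _mm_loadu_si128((const __m128i *)(d + i)));
        /* Keep e where a <= b: (~mask) & e (like pandn). */
        __m128i ve = _mm_andnot_si128(mask, _mm_loadu_si128((const __m128i *)(e + i)));
        /* Merge the two halves (like por) and store. */
        _mm_storeu_si128((__m128i *)(c + i), _mm_or_si128(vd, ve));
    }
#endif
    /* Scalar tail: the same mask trick, one element at a time. */
    for (; i < n; i++) {
        int m = -(a[i] > b[i]);             /* -1 (all ones) or 0 */
        c[i] = (short)((d[i] & m) | (e[i] & ~m));
    }
}
```

Both functions produce the same output; the masked version simply replaces the data-dependent branch with a compare that yields an all-ones or all-zeros mask per element, so there is nothing for the branch predictor to mispredict.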
