13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


OPTIMIZING FOR SIMD FLOATING-POINT APPLICATIONS

Figure 6-4. Horizontal Add Using MOVHLPS/MOVLHPS

    Input registers:
        xmm0 = A1 A2 A3 A4
        xmm1 = B1 B2 B3 B4
        xmm2 = C1 C2 C3 C4
        xmm3 = D1 D2 D3 D4

    MOVLHPS / MOVHLPS:
        A1 A2 B1 B2      A3 A4 B3 B4
        C1 C2 D1 D2      C3 C4 D3 D4

    ADDPS:
        A1+A3 A2+A4 B1+B3 B2+B4
        C1+C3 C2+C4 D1+D3 D2+D4

    SHUFPS:
        A1+A3 B1+B3 C1+C3 D1+D3
        A2+A4 B2+B4 C2+C4 D2+D4

    ADDPS:
        A1+A2+A3+A4 B1+B2+B3+B4 C1+C2+C3+C4 D1+D2+D3+D4

Example 6-9. Horizontal Add Using MOVHLPS/MOVLHPS

    void horiz_add(Vertex_soa *in, float *out) {
        __asm {
            mov    ecx, in            // load structure addresses
            mov    edx, out
            movaps xmm0, [ecx]        // load A1 A2 A3 A4 => xmm0
            movaps xmm1, [ecx+16]     // load B1 B2 B3 B4 => xmm1
            movaps xmm2, [ecx+32]     // load C1 C2 C3 C4 => xmm2
            movaps xmm3, [ecx+48]     // load D1 D2 D3 D4 => xmm3
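The data flow in Figure 6-4 can also be sketched with SSE intrinsics. The following is a minimal illustration, not the manual's listing: the function name horiz_add_sketch and the flat 16-float input layout (A1..A4, B1..B4, C1..C4, D1..D4) are assumptions made for this example, and unaligned loads and stores are used so the caller does not need 16-byte-aligned buffers, whereas the assembly version uses MOVAPS and requires alignment.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: sums each group of four floats in `in`
 * (A, B, C, D) and writes the four sums to `out`. */
void horiz_add_sketch(const float in[16], float out[4])
{
    __m128 a = _mm_loadu_ps(in);       /* A1 A2 A3 A4 */
    __m128 b = _mm_loadu_ps(in + 4);   /* B1 B2 B3 B4 */
    __m128 c = _mm_loadu_ps(in + 8);   /* C1 C2 C3 C4 */
    __m128 d = _mm_loadu_ps(in + 12);  /* D1 D2 D3 D4 */

    /* MOVLHPS / MOVHLPS: pair low halves and high halves. */
    __m128 ab_lo = _mm_movelh_ps(a, b);  /* A1 A2 B1 B2 */
    __m128 ab_hi = _mm_movehl_ps(b, a);  /* A3 A4 B3 B4 */
    __m128 cd_lo = _mm_movelh_ps(c, d);  /* C1 C2 D1 D2 */
    __m128 cd_hi = _mm_movehl_ps(d, c);  /* C3 C4 D3 D4 */

    /* ADDPS: first level of partial sums. */
    __m128 ab = _mm_add_ps(ab_lo, ab_hi);  /* A1+A3 A2+A4 B1+B3 B2+B4 */
    __m128 cd = _mm_add_ps(cd_lo, cd_hi);  /* C1+C3 C2+C4 D1+D3 D2+D4 */

    /* SHUFPS: gather even-position and odd-position partial sums. */
    __m128 even = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(2, 0, 2, 0));
    __m128 odd  = _mm_shuffle_ps(ab, cd, _MM_SHUFFLE(3, 1, 3, 1));

    /* ADDPS: final sums, one per input group. */
    _mm_storeu_ps(out, _mm_add_ps(even, odd));
}
```

Calling horiz_add_sketch with the four vectors stored back to back yields one sum per vector in out[0] through out[3]. Only SSE instructions are involved, so no SSE3 horizontal-add support is assumed.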
