13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

APPLICATION PERFORMANCE TOOLS

Example 10-8. Auto-Generated Code to Avoid Unaligned Loads

Compiler Switch QxW

    $B2$2:
    movups xmm0, _src[eax+4]
    movaps xmm1, _src[eax]
    movaps xmm4, _src[eax+16]
    movsd  xmm3, _src[eax+20]
    subps  xmm1, xmm0
    subps  xmm1, _src[eax+16]
    movss  xmm2, _src[eax+28]
    movhps xmm2, _src[eax+32]
    movups _dst[eax+8], xmm1
    shufps xmm3, xmm2, 132
    subps  xmm4, xmm3
    subps  xmm4, _src[eax+32]
    movlps _dst[eax+24], xmm4
    movhps _dst[eax+32], xmm4
    add    eax, 32
    cmp    eax, 4064
    jb     $B2$2

Compiler Switch QxT

    $B2$2:
    movaps  xmm2, _src[eax+16]
    movaps  xmm0, _src[eax]
    movdqa  xmm3, _src[eax+32]
    movdqa  xmm1, xmm2
    palignr xmm3, xmm2, 4
    palignr xmm1, xmm0, 4
    subps   xmm0, xmm1
    subps   xmm0, _src[eax+16]
    movups  _dst[eax+8], xmm0
    subps   xmm2, xmm3
    subps   xmm2, _src[eax+32]
    movlps  _dst[eax+24], xmm2
    movhps  _dst[eax+32], xmm2
    add     eax, 32
    cmp     eax, 4064
    jb      $B2$2

A.2 INTEL® VTUNE PERFORMANCE ANALYZER

The Intel VTune Performance Analyzer is a powerful software-profiling tool for Microsoft Windows and Linux. The VTune analyzer helps you understand the performance characteristics of your software at all levels: system, application, microarchitecture.

The sections that follow describe the major features of the VTune analyzer and briefly explain how to use them. For more details on these features, run the VTune analyzer and see the online documentation.

All features are available for Microsoft Windows. On Linux, sampling and call graph are available.

A.2.1 Sampling

Sampling allows you to profile all active software on your system, including operating system, device driver, and application software. It works by occasionally interrupting the processor and collecting the instruction address, process ID, and thread ID. After the sampling activity completes, the VTune analyzer displays the data by process, thread, software module, function, or line of source. There are two methods for generating samples: time-based sampling and event-based sampling.
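
To make the sampling model concrete, the following Python sketch shows one way collected samples (instruction address, process ID, thread ID) could be rolled up by process, thread, module, and function. It is an illustration only and not the VTune analyzer's implementation: the `Sample` record, the `SYMBOLS` table, the module and function names, and the `resolve` and `summarize` helpers are all hypothetical, and the address-to-symbol lookup is deliberately simplified.

```python
from bisect import bisect_right
from collections import Counter
from typing import NamedTuple


class Sample(NamedTuple):
    """One profiling sample, as recorded at a sampling interrupt."""
    address: int  # instruction address at the time of the interrupt
    pid: int      # process ID
    tid: int      # thread ID


# Hypothetical symbol table: (start address, module, function), sorted by
# start address. A real tool would read this from debug information.
SYMBOLS = [
    (0x1000, "app.exe", "main"),
    (0x1400, "app.exe", "filter_loop"),
    (0x8000, "driver.sys", "isr_handler"),
]
STARTS = [start for start, _, _ in SYMBOLS]


def resolve(address):
    """Map an address to (module, function) using the nearest lower start.

    Simplification: symbol end addresses are not tracked, so any address
    above the last start is attributed to the last symbol.
    """
    idx = bisect_right(STARTS, address) - 1
    if idx < 0:
        return ("<unknown>", "<unknown>")
    _, module, function = SYMBOLS[idx]
    return (module, function)


def summarize(samples):
    """Count samples per process, thread, module, and function."""
    report = {
        "process": Counter(),
        "thread": Counter(),
        "module": Counter(),
        "function": Counter(),
    }
    for s in samples:
        module, function = resolve(s.address)
        report["process"][s.pid] += 1
        report["thread"][(s.pid, s.tid)] += 1
        report["module"][module] += 1
        report["function"][(module, function)] += 1
    return report


# Made-up samples for illustration; not measured data.
SAMPLES = [
    Sample(0x1410, 100, 1),
    Sample(0x1450, 100, 1),
    Sample(0x1008, 100, 2),
    Sample(0x8010, 4, 7),
]

if __name__ == "__main__":
    report = summarize(SAMPLES)
    for view, counts in report.items():
        print(view, counts.most_common())
```

Because every sample carries the same three fields, each view is just a different grouping key over the same data. The sample count for a function approximates the share of time (or of events, under event-based sampling) spent in it.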
