13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



CODING FOR SIMD ARCHITECTURES

This is because the instructions consist of two micro-ops instead of three. Relevant instructions are: unpcklps, unpckhps, packsswb, packuswb, packssdw, pshufd, shufps and shufpd.

Recommendation: When targeting code generation for Intel Core Solo and Intel Core Duo processors, favor instructions consisting of two μops over those with more than two μops.

Intel Core microarchitecture generally executes SIMD instructions more efficiently than previous microarchitectures in terms of latency and throughput, so many of the restrictions specific to Intel Core Duo and Intel Core Solo processors do not apply. The same is true of Intel Core microarchitecture relative to Intel NetBurst microarchitecture.

4.7 TUNING THE FINAL APPLICATION

The best way to tune your application once it is functioning correctly is to use a profiler that measures the application while it is running on a system. The VTune analyzer can help you determine where to make changes in your application to improve performance, and it can assist with the various phases required for optimized performance. See Appendix A.2, “Intel® VTune Performance Analyzer,” for details. After each optimization effort, measure performance again to confirm the gain and to see which changes account for the major improvements.
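The measure-after-every-change advice can be illustrated with a small timing harness. The following is only a minimal sketch in portable C, not a substitute for a profiler such as the VTune analyzer: the names sum_fn, sum_baseline, sum_unrolled and time_kernel are hypothetical and do not come from the manual, and the standard clock() function stands in for real profiling. It times a baseline kernel and a candidate rewrite under the same conditions so that the two can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel signature used for this sketch: sum n floats. */
typedef float (*sum_fn)(const float *data, size_t n);

/* Baseline version: a straightforward scalar loop. */
float sum_baseline(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Candidate version: four independent accumulators.
 * Reordering float additions can change rounding, so results may
 * differ slightly from the baseline on arbitrary input. */
float sum_unrolled(const float *data, size_t n)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    for (; i < n; ++i)      /* leftover elements when n is not a multiple of 4 */
        a0 += data[i];
    return (a0 + a1) + (a2 + a3);
}

/* Runs fn `repeats` times and returns the elapsed CPU time in seconds.
 * The last result is written to *result; the volatile sink is meant to
 * discourage the compiler from discarding the repeated calls. */
double time_kernel(sum_fn fn, const float *data, size_t n,
                   int repeats, float *result)
{
    assert(fn != NULL && result != NULL && repeats > 0);
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r)
        sink = fn(data, n);
    clock_t end = clock();
    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, call time_kernel once with sum_baseline and once with sum_unrolled on the same buffer and repeat count, check that the two results agree within the tolerance your application accepts, and compare the two times. Whether the candidate is actually faster depends on the compiler and the processor, which is why the measurement has to be repeated after each change.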
