13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

INDEX

PMAXSW, 5-28
PMAXUB, 5-28
PMINSW, 5-28
PMINUB, 5-28
PMOVMSKB, 5-14
PMULHUW, 5-28
PMULHW, 5-28
PMULUDQ, 5-30
PSADBW, 5-28
PSHUF, 5-15
PSHUFB, 5-21, 5-23
PSHUFLW, 5-17
PSLLDQ, 5-31
PSRLDQ, 5-31
PSUBQ, 5-30
PUNPCKHQDQ, 5-18
PUNPCKLQDQ, 5-18

simplified 3D geometry pipeline, 8-15
simplified clipping to an arbitrary signed range, 5-27
single vs multi-pass execution, 8-27
sleep transitions, 10-7
smart cache, 2-36
SoA format, 4-20
software write-combining, 8-30
spin-loops, 10-6
    optimization, 3-9
    PAUSE instruction, 3-9
    related information, 1-3
SSE, 2-48
SSE2, 2-48
SSE3, 2-49
SSSE3, 2-49
stack
    aligned EBP-based frames, D-4
    aligned ESP-based frames, D-3
    alignment 128-bit SIMD, 4-15
    alignment stack, 3-59
    dynamic alignment, 3-59
    frame optimizations, D-6
    inlined assembly & EBX, D-7
    Intel C++ Compiler support for, D-1
    overview, D-1
state transitions, 10-2
static branch prediction algorithm, 3-10
static power, 10-1
static prediction, 3-9
streaming stores, 8-7
    coherent requests, 8-9
    improving performance, 8-7
    non-coherent requests, 8-9
strip-mining, 4-22, 4-23, 8-24, 8-25
    prefetch considerations, 8-26
structures
    aligning, 3-56
suggestions, E-1
summary of coding rules, E-1
swizzling data. See data swizzling
system bus optimization, 7-23

T

tagging, B-2
tagging mechanisms
    execution_event, B-37
    front_end_event, B-37
    replay_event, B-35
time-based sampling, A-11
time-consuming innermost loops, 8-5
time-stamp counter, B-5
    non-sleep clock ticks, B-5
    RDTSC instruction, B-5
    sleep pin, B-5
TLB. See translation lookaside buffer
trace cache
    events, B-30
transcendental functions, 3-86
translation lookaside buffer, 8-32

U

unpack instructions, 5-10
UNPACKHPS instruction, 6-7
UNPACKLPS instruction, 6-7
UNPCKHPS instruction, 6-10
UNPCKLPS instruction, 6-10
unrolling loops
    benefits of, 3-15
    code examples, 3-16
    limitation of, 3-15
unsigned unpack, 5-6
using MMX code for copy, shuffling, 6-12

V

vector class library, 4-12
vectorized code
    auto generation, A-6
    automatic vectorization, 4-12
    high-level examples, A-6
    parallelism, 4-7
    SIMD architecture, 4-7
    switch options, A-4
vertical vs horizontal computation, 6-3

W

WaitForSingleObject(), 10-6
WaitMessage(), 10-6
weakly ordered stores, 8-7
WiFi, 10-7
WLAN, 10-7
workload characterization
    retirement throughput, A-11
