13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


INDEX

Numerics
64-bit mode
    arithmetic, 9-3
    coding guidelines, 9-1
    compiler settings, A-2
    CVTSI2SD instruction, 9-4
    CVTSI2SS instruction, 9-4
    default operand size, 9-1
    introduction, 2-45
    legacy instructions, 9-1
    multiplication notes, 9-2
    register usage, 9-2, 9-3
    REX prefix, 9-1
    sign-extension, 9-2
    software prefetch, 9-5

A
absolute difference of signed numbers, 5-20
absolute difference of unsigned numbers, 5-20
absolute value, 5-21
active power, 10-1
ADDSUBPD instruction, 6-17
ADDSUBPS instruction, 6-17, 6-19
algorithm to avoid changing the rounding mode, 3-82
alignment
    arrays, 3-56
    code, 3-12
    stack, 3-59
    structures, 3-56
Amdahl’s law, 7-2
AoS format, 4-20
application performance tools, A-1
arrays
    aligning, 3-56
assembler/compiler coding rules, E-1
assist, B-2
automatic vectorization, 4-12, 4-13

B
battery life
    guidelines for extending, 10-5
    mobile optimization, 10-1
    OS APIs, 10-6
    quality trade-offs, 10-5
bogus, non-bogus, retire, B-1
branch prediction
    choosing types, 3-13
    code examples, 3-8
    eliminating branches, 3-7
    optimizing, 3-6
    unrolling loops, 3-15
bus ratio, B-2

C
C4-state, 10-4
cache management
    blocking techniques, 8-22
    cache level, 8-5
    CLFLUSH instruction, 8-12
    coding guidelines, 8-1
    compiler choices, 8-2
    compiler intrinsics, 8-2
    CPUID instruction, 3-5, 8-37
    function leaf, 3-5
    optimizing, 8-1
    simple memory copy, 8-32
    smart cache, 2-36
    video decoder, 8-31
    video encoder, 8-31
    See also: optimizing cache utilization
call graph profiling, A-11
CD/DVD, 10-7
changing the rounding mode, 3-82
classes (C/C++), 4-11
CLFLUSH instruction, 8-12
clipping to an arbitrary signed range, 5-25
clipping to an arbitrary unsigned range, 5-27
clock ticks
    in performance metrics, B-6
    nominal CPI, B-3
    non-halted clock ticks, B-3
    non-halted CPI, B-3
    non-sleep clock ticks, B-3
    time-stamp counter, B-3
    See also: performance monitoring events
coding techniques, 4-7, 7-23
    64-bit guidelines, 9-1
    absolute difference of signed numbers, 5-20
    absolute difference of unsigned numbers, 5-20
    absolute value, 5-21
    clipping to an arbitrary signed range, 5-25
    clipping to an arbitrary unsigned range, 5-27
    conserving power, 10-7
    data in segment, 3-63
    generating constants, 5-19
    interleaved pack with saturation, 5-8
    interleaved pack without saturation, 5-10
    latency and throughput, C-1
    methodologies, 4-8
    non-interleaved unpack, 5-10
    optimization options, A-2
    rules, 3-5, E-1
    signed unpack, 5-7
    simplified clip to arbitrary signed range, 5-26
    sleep transitions, 10-7
    suggestions, 3-5, E-1
