13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



INTEL® 64 AND IA-32 PROCESSOR ARCHITECTURES

µops. The remaining two decoders each decode a one-µop instruction in each clock cycle.

The front end can issue multiple µops per cycle, in original program order, to the out-of-order core.

The Intel Pentium M processor incorporates sophisticated branch prediction hardware to support the out-of-order core. The branch prediction hardware includes dynamic prediction and branch target buffers.

The Intel Pentium M processor has enhanced dynamic branch prediction hardware. Branch target buffers (BTB) predict the direction and target of branches based on an instruction's address.

The Pentium M processor includes two techniques to reduce the execution time of certain operations:

• ESP folding: This eliminates the ESP manipulation µops in stack-related instructions such as PUSH, POP, CALL and RET. It increases decode, rename and retirement throughput. ESP folding also increases execution bandwidth by eliminating µops which would have required execution resources.
• Micro-op (µop) fusion: Some of the most frequent pairs of µops derived from the same instruction can be fused into a single µop. The following categories of fused µops have been implemented in the Pentium M processor:
  - "Store address" and "store data" µops are fused into a single "Store" µop. This holds for all types of store operations, including integer, floating-point, MMX technology, and Streaming SIMD Extensions (SSE and SSE2) operations.
  - A load µop in most cases can be fused with a successive execution µop. This holds for integer, floating-point and MMX technology loads and for most kinds of successive execution operations. Note that SSE loads cannot be fused.

2.3.2 Data Prefetching

The Intel Pentium M processor supports three prefetching mechanisms:

• The first mechanism is a hardware instruction fetcher and is described in the previous section.
• The second mechanism automatically fetches data into the second-level cache. The implementation of automatic hardware prefetching in the Pentium M processor family is basically similar to that described for NetBurst microarchitecture. The trigger threshold distance for each relevant processor model is shown in Table 2-6.
• The third mechanism is a software mechanism that fetches data into the caches using the prefetch instructions.
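
The software prefetch mechanism can be illustrated with a short C sketch. This is not taken from the manual: it assumes a GCC- or Clang-compatible compiler and uses the `__builtin_prefetch` builtin, which such compilers typically lower to a prefetch instruction on x86 targets. The function name `sum_with_prefetch` and the value of `PREFETCH_DISTANCE` are hypothetical; a suitable distance depends on the processor and the loop body and would need to be found by measurement.

```c
#include <stddef.h>

/* Hypothetical tuning value (an assumption, not from the manual):
 * how many elements ahead of the current one to request. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that elements further ahead will be read soon.
 * The prefetch is only a hint; the result is the same with or without it. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read;
             * third argument 3 = high temporal locality. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

For a simple sequential scan like this one, the automatic hardware prefetcher described above may already cover the access pattern, so an explicit hint is more likely to help with irregular or strided accesses. Issuing one hint per element, as this sketch does, is also redundant within a cache line; a tuned loop would usually issue one per line.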
