13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

INTEL® 64 AND IA-32 PROCESSOR ARCHITECTURES

2.5.1.1 Replicated Resources

The architectural state is replicated for each logical processor. The architectural state consists of the registers that the operating system and application code use to control program behavior and to store data for computations. This state includes the eight general-purpose registers, the control registers, machine state registers, debug registers, and others. A few resources are exceptions to this replication, most notably the memory type range registers (MTRRs) and the performance monitoring resources. For a complete list of the architectural state and its exceptions, see the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volumes 3A & 3B.

Other resources, such as instruction pointers and register renaming tables, are replicated so that the execution and state changes of the two logical processors can be tracked simultaneously. The return stack predictor is replicated to improve branch prediction of return instructions.

In addition, a few buffers (for example, the 2-entry instruction streaming buffers) are replicated to reduce complexity.

2.5.1.2 Partitioned Resources

Several buffers are shared by limiting each logical processor to half of the entries. These are referred to as partitioned resources. Reasons for this partitioning include:
• Operational fairness
• Allowing operations from one logical processor to bypass operations of the other logical processor, which may have stalled

For example, a cache miss, a branch misprediction, or instruction dependencies may prevent a logical processor from making forward progress for some number of cycles.
The partitioning prevents the stalled logical processor from blocking the forward progress of the other logical processor.

In general, the buffers for staging instructions between major pipe stages are partitioned. These buffers include the µop queues after the execution trace cache; the queues after the register rename stage; the reorder buffer, which stages instructions for retirement; and the load and store buffers.

In the case of the load and store buffers, partitioning also makes it easier to maintain memory ordering for each logical processor and to detect memory ordering violations.

2.5.1.3 Shared Resources

Most resources in a physical processor, including the caches and all the execution units, are fully shared to improve their dynamic utilization. Some shared resources that are linearly addressed, such as the DTLB, include a logical processor ID bit to distinguish whether an entry belongs to one logical processor or the other.
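The difference between partitioned and shared resources described above can be illustrated with a small behavioral model. The sketch below is a toy Python simulation under stated assumptions, not a description of the actual hardware. The class names `StagingBuffer` and `TaggedTLB`, the helper `fill_counts`, the 8-entry buffer size, and the page numbers are all hypothetical choices made for illustration. The sketch contrasts a partitioned staging buffer, in which a stalled logical processor can fill at most half of the entries, with a fully shared one. It also models a shared, linearly addressed TLB whose entries are tagged with a logical processor ID.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class StagingBuffer:
    """Toy model (hypothetical) of a buffer between pipe stages, e.g. a uop queue.

    If `partitioned` is True, each logical processor (LP) may hold at most
    total_entries // num_lps entries (half, for two LPs). If False, entries
    are handed out first come, first served until the buffer is full.
    """

    def __init__(self, total_entries: int, partitioned: bool = True, num_lps: int = 2) -> None:
        self.total_entries = total_entries
        self.partitioned = partitioned
        self.per_lp_limit = total_entries // num_lps
        self.queues: Dict[int, Deque[str]] = {lp: deque() for lp in range(num_lps)}

    def occupancy(self) -> int:
        return sum(len(q) for q in self.queues.values())

    def allocate(self, lp: int, uop: str) -> bool:
        """Try to insert a uop for logical processor `lp`; False means it must stall."""
        if self.occupancy() >= self.total_entries:
            return False  # no free entries left for anyone
        if self.partitioned and len(self.queues[lp]) >= self.per_lp_limit:
            return False  # this LP has used up its share
        self.queues[lp].append(uop)
        return True

    def release(self, lp: int) -> Optional[str]:
        """Free the oldest entry held by `lp` (in order, like retirement)."""
        return self.queues[lp].popleft() if self.queues[lp] else None


class TaggedTLB:
    """Toy model (hypothetical) of a shared, linearly addressed TLB.

    Each entry is keyed by (LP ID, linear page), so the same linear page
    used by both LPs occupies two distinct entries instead of aliasing.
    """

    def __init__(self) -> None:
        self.entries: Dict[Tuple[int, int], int] = {}

    def fill(self, lp: int, linear_page: int, physical_page: int) -> None:
        self.entries[(lp, linear_page)] = physical_page

    def lookup(self, lp: int, linear_page: int) -> Optional[int]:
        return self.entries.get((lp, linear_page))


def fill_counts(partitioned: bool) -> Tuple[int, int]:
    """LP0 is stalled (say, on a cache miss) and keeps allocating without
    releasing anything; LP1 then tries to allocate three uops.
    Returns how many entries each LP obtained."""
    buf = StagingBuffer(total_entries=8, partitioned=partitioned)
    lp0 = sum(buf.allocate(0, f"lp0-uop{i}") for i in range(10))
    lp1 = sum(buf.allocate(1, f"lp1-uop{i}") for i in range(3))
    return lp0, lp1


if __name__ == "__main__":
    print("partitioned:", fill_counts(True))    # (4, 3): LP1 still makes progress
    print("fully shared:", fill_counts(False))  # (8, 0): stalled LP0 blocks LP1

    tlb = TaggedTLB()
    tlb.fill(0, 0x1000, 0xA000)
    tlb.fill(1, 0x1000, 0xB000)
    print(hex(tlb.lookup(0, 0x1000)), hex(tlb.lookup(1, 0x1000)))  # 0xa000 0xb000
```

With partitioning, the stalled logical processor 0 can claim only its four entries, so logical processor 1 still obtains all three of the entries it requests. With a fully shared buffer, logical processor 0 takes all eight entries and logical processor 1 stalls as well. In the TLB model, the logical processor ID in the key is what keeps the two translations for the same linear page apart.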
