Intel® 64 and IA-32 Architectures Optimization Reference Manual

MULTICORE AND HYPER-THREADING TECHNOLOGY

Example 8-10. Adding a Pseudo-random Offset to the Stack Pointer in the Entry Function

    #include <malloc.h>   // _alloca (Microsoft C runtime library)

    long GetMod7Krandom128X(void);  // Application-defined: returns a pseudo-random
                                    // multiple of 128 that is less than 7K.
    void do_foo(void);              // Application-defined worker function.

    int main(void)
    {
        // A pseudo-random number that is a multiple of 128 and less than 7K.
        long myOffset = GetMod7Krandom128X();

        // Use the runtime library routine to reposition the stack pointer.
        _alloca(myOffset);

        // The rest of the application code follows. Stack accesses in descendant
        // functions (e.g., do_foo) are less likely to cause data cache
        // evictions because of the stack offsets.
        do_foo();
        return 0;
    }

8.7 FRONT-END OPTIMIZATION

In the Intel NetBurst microarchitecture family of processors, instructions are decoded into μops, and sequences of μops called traces are stored in the Execution Trace Cache. The Trace Cache is the primary subsystem in the front end of the processor that delivers μop traces to the execution engine. Optimization guidelines for front-end operation in single-threaded applications are discussed in Chapter 3.

For dual-core processors where the second-level unified cache (for data and code) is duplicated for each core (Pentium Processor Extreme Edition, Pentium D processor), there are no special front-end optimization considerations for having two processor cores in one physical processor.

For dual-core processors where the second-level unified cache is shared by two processor cores (Intel Core Duo processor and processors based on Intel Core microarchitecture), multi-threaded software should account for the larger code working set that results from two threads fetching code from the unified cache, as part of front-end and cache optimization.

For quad-core processors based on Intel Core microarchitecture, the considerations that apply to Intel Core 2 Duo processors also apply.

The next two sub-sections discuss guidelines for optimizing the operation of the Execution Trace Cache on processors supporting HT Technology.
