Intel® 64 and IA-32 Architectures Optimization Reference Manual


MULTICORE AND HYPER-THREADING TECHNOLOGY

Example 8-9. Adding an Offset to the Stack Pointer of Three Threads (Contd.)

    Stack_offset = 2048;
    ID_Thread2 = CreateThread(Func_thread_entry, &Stack_offset);
    Stack_offset = 3072;
    ID_Thread3 = CreateThread(Func_thread_entry, &Stack_offset);
}

8.6.4.2 Per-instance Stack Offset

Each instance of an application runs in its own linear address space, but the address layout of data for stack segments is identical for both instances. When the instances are running in lock step, stack accesses are likely to cause excessive evictions of cache lines in the first-level data cache on some early implementations of HT Technology in IA-32 processors.

Although this situation (two copies of an application running in lock step) is seldom an objective for multithreaded software or a multiprocessor platform, it can happen at an end user's direction. One solution is to allow each application instance to add a suitable linear address offset for its stack. Once this offset is added at start-up, a buffer of linear addresses is established even when two copies of the same application are executing on two logical processors in the same physical processor package. This buffer space has negligible impact on running dissimilar applications and on executing multiple copies of the same application.

However, the buffer space does enable the first-level data cache to be shared cooperatively when two copies of the same application are executing on the two logical processors in a physical processor package.

To establish a suitable stack offset for two instances of the same application running on two logical processors in the same physical processor package, the stack pointer can be adjusted in the entry function of the application using the technique shown in Example 8-10. The size of the stack offsets should also be a multiple of a reference offset that may depend on the characteristics of the application's data access pattern.
One way to determine the per-instance value of the stack offset is to choose a pseudo-random number that is also a multiple of the reference offset or of 128 bytes. Usually, this per-instance pseudo-random offset can be less than 7 KByte. Example 8-10 provides a code fragment for adjusting the stack pointer in an application entry function.

User/Source Coding Rule 35. (M impact, L generality) Add a per-instance stack offset when two instances of the same application are executing in lock step, to avoid memory accesses that are offset by multiples of 64 KByte or 1 MByte, when targeting Intel processors supporting HT Technology.
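The per-instance offset described above can be sketched in C roughly as follows. This is a minimal illustration, not the manual's Example 8-10. It assumes a POSIX-like system where alloca() is available from <alloca.h> (alloca() is not part of ISO C), and the names pick_stack_offset, run_with_stack_offset, and app_main, and the constants REFERENCE_OFFSET and MAX_STACK_OFFSET, are all hypothetical. The sketch picks a pseudo-random multiple of 128 bytes below 7 KByte, reserves that much stack in a wrapper, and only then calls the application's real entry point, so that the entry point's frames are displaced by the chosen offset.

```c
#include <alloca.h>  /* alloca() is not ISO C; glibc and the BSDs declare it here */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed values taken from the text: a 128-byte reference offset and an
   upper bound of 7 KByte for the per-instance offset. */
#define REFERENCE_OFFSET 128u
#define MAX_STACK_OFFSET (7u * 1024u)

/* Hypothetical helper: returns a pseudo-random multiple of REFERENCE_OFFSET
   in [REFERENCE_OFFSET, MAX_STACK_OFFSET). Different seeds (for example,
   different process IDs) make two instances likely to pick different values.
   Note: this reseeds the global rand() state. */
static size_t pick_stack_offset(unsigned seed)
{
    const size_t slots = MAX_STACK_OFFSET / REFERENCE_OFFSET; /* 56 */
    srand(seed);
    /* 1 + rand() % 55 is in [1, 55], so the result is in [128, 7040]. */
    return (size_t)(1 + rand() % (int)(slots - 1)) * REFERENCE_OFFSET;
}

/* Hypothetical wrapper: reserves `offset` bytes of stack with alloca() and
   then calls `entry`, so every frame that `entry` creates is displaced by
   roughly `offset` bytes relative to an instance that used no offset. */
static int run_with_stack_offset(int (*entry)(int, char **),
                                 int argc, char **argv, size_t offset)
{
    assert(offset > 0 && offset < MAX_STACK_OFFSET);

    volatile char *pad = alloca(offset);
    pad[0] = 0;           /* touch both ends so the pad is really reserved */
    pad[offset - 1] = 0;

    /* Calling through a volatile function pointer keeps the compiler from
       inlining `entry`; an inlined body could keep its locals in this
       function's fixed frame, outside the offset region. */
    int (*volatile call)(int, char **) = entry;
    int rc = call(argc, argv);
    (void)pad[0];         /* keep the pad live until `entry` has returned */
    return rc;
}

/* Hypothetical stand-in for the application's real entry point. */
static int app_main(int argc, char **argv)
{
    int local = 0;
    (void)argv;
    printf("app_main: argc=%d, local at %p\n", argc, (void *)&local);
    return 0;
}
```

A program would do its real work in app_main and start it from main with something like `return run_with_stack_offset(app_main, argc, argv, pick_stack_offset((unsigned)getpid()));`, where getpid() is declared in <unistd.h>. Seeding from the process ID is only one assumed choice of per-instance randomness. With 55 possible slots, and assuming the outputs are roughly uniform, two instances still pick the same offset about 1 time in 55, in which case they behave as if no offset had been added.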
