13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



MULTICORE AND HYPER-THREADING TECHNOLOGY

8.6.4.1 Per-thread Stack Offset

To prevent private stack accesses in concurrent threads from thrashing the first-level data cache, an application can use a per-thread stack offset for each of its threads. The size of each offset should be a multiple of a common base offset. The optimum choice of this common base offset may depend on the memory access characteristics of the threads, but it should be a multiple of 128 bytes.

One effective technique for choosing a per-thread stack offset in an application is to add an equal amount of stack offset each time a new thread is created in a thread pool.[7] Example 8-9 shows a code fragment that implements per-thread stack offset for three threads using a reference offset of 1024 bytes.

User/Source Coding Rule 34. (H impact, M generality) Adjust the private stack of each thread in an application so that the spacing between these stacks is not a multiple of 64 KBytes or 1 MByte, to prevent unnecessary cache line evictions (when using Intel processors supporting HT Technology).

Example 8-9. Adding an Offset to the Stack Pointer of Three Threads

    void Func_thread_entry(DWORD *pArg)
    {
        DWORD StackOffset = *pArg;
        DWORD var1; // Local variables at this scope may not benefit
        DWORD var2; // from the stack pointer adjustment that follows.

        // Call runtime library routine to offset stack pointer.
        _alloca(StackOffset);

        // Managing per-thread stack offset to create three threads:
        // * Code for the thread function
        // * Stack accesses within descendant functions (do_foo1, do_foo2)
        //   are less likely to cause data cache evictions because of the
        //   stack offset.
        do_foo1();
        do_foo2();
    }

    main()
    {
        DWORD Stack_offset, ID_Thread1, ID_Thread2, ID_Thread3;

        Stack_offset = 1024;
        // Stack offset between parent thread and the first child thread.
        ID_Thread1 = CreateThread(Func_thread_entry, &Stack_offset); // Call OS thread API.

[7] For parallel applications written to run with OpenMP, the OpenMP runtime library in Intel® KAP/Pro Toolset automatically provides the stack offset adjustment for each thread.
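
The arithmetic behind the per-thread offset and Coding Rule 34 can be sketched in portable C. This is a minimal illustration, not code from the manual: the helper names `stack_offset_for_thread` and `spacing_may_alias`, the constants, and the sample addresses used in the note below are all hypothetical. The first helper adds an equal amount of offset per created thread. The second reports whether the spacing between two stack addresses is a nonzero multiple of 64 KBytes; every multiple of 1 MByte is also a multiple of 64 KBytes, so one check covers both cases named in the rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants for this sketch; they mirror the numbers in the text. */
#define BASE_GRANULE_BYTES 128u            /* base offset should be a multiple of 128 bytes */
#define ALIAS_SPACING_BYTES (64u * 1024u)  /* 64 KBytes; 1 MByte is a multiple of this */

/*
 * Hypothetical helper: stack offset for the thread with the given
 * zero-based creation index. Each new thread receives one more
 * multiple of the common base offset than the thread before it.
 */
size_t stack_offset_for_thread(size_t thread_index, size_t base_offset)
{
    /* The base offset must be a nonzero multiple of 128 bytes. */
    assert(base_offset != 0 && base_offset % BASE_GRANULE_BYTES == 0);
    return (thread_index + 1) * base_offset;
}

/*
 * Hypothetical helper: returns 1 when two stack addresses are spaced
 * by a nonzero multiple of 64 KBytes (the case Rule 34 warns about),
 * and 0 otherwise.
 */
int spacing_may_alias(uintptr_t stack_a, uintptr_t stack_b)
{
    uintptr_t distance = stack_a > stack_b ? stack_a - stack_b
                                           : stack_b - stack_a;
    return distance != 0 && distance % ALIAS_SPACING_BYTES == 0;
}
```

With a base offset of 1024 bytes, thread indices 0, 1 and 2 yield offsets of 1024, 2048 and 3072 bytes. As an illustration with made-up addresses, two stacks starting 1 MByte apart (0x10000000 and 0x10100000) are flagged by `spacing_may_alias`; after lowering them by 1024 and 2048 bytes respectively, the spacing is no longer a multiple of 64 KBytes and the check returns 0.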
