13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


MULTICORE AND HYPER-THREADING TECHNOLOGY

Example 8-4. Spin-wait Loop and PAUSE Instructions

(a) An un-optimized spin-wait loop experiences a performance penalty when exiting the loop. It consumes execution resources without contributing computational work.

    do {
        // This loop can run faster than the speed of memory access,
        // other worker threads cannot finish modifying sync_var until
        // outstanding loads from the spinning loops are resolved.
    } while( sync_var != constant_value);

(b) Inserting the PAUSE instruction in a fast spin-wait loop prevents a performance penalty to the spinning thread and the worker thread.

    do {
        _asm pause
        // Ensure this loop is de-pipelined, i.e. preventing more than one
        // load request to sync_var to be outstanding,
        // avoiding performance penalty when the worker thread updates
        // sync_var and the spinning thread exiting the loop.
    } while( sync_var != constant_value);

(c) A spin-wait loop using a "test, test-and-set" technique to determine the availability of the synchronization variable. This technique is recommended when writing spin-wait loops to run on Intel 64 and IA-32 architecture processors.

    Spin_Lock:
        CMP lockvar, 0      ; // Check if lock is free.
        JE Get_Lock
        PAUSE               ; // Short delay.
        JMP Spin_Lock
    Get_Lock:
        MOV EAX, 1
        XCHG EAX, lockvar   ; // Try to get lock.
        CMP EAX, 0          ; // Test if successful.
        JNE Spin_Lock
    Critical_Section:
        MOV lockvar, 0      ; // Release lock.

User/Source Coding Rule 20. (M impact, H generality) Insert the PAUSE instruction in fast spin loops and keep the number of loop repetitions to a minimum to improve overall system performance.

On processors that use the Intel NetBurst microarchitecture core, the penalty of exiting from a spin-wait loop can be avoided by inserting a PAUSE instruction in the loop. In spite of the name, the PAUSE instruction improves performance by introducing a slight delay in the loop and effectively causing the memory read requests to
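
The "test, test-and-set" technique of Example 8-4 (c) can also be written in C. The following is a minimal sketch, not code from this manual: it assumes a C11 compiler with `<stdatomic.h>`, and the names `spin_lock_t`, `cpu_relax`, `spin_try_lock`, `spin_lock`, and `spin_unlock` are hypothetical, chosen only for illustration. On x86 targets built with GCC or Clang, the `_mm_pause()` intrinsic emits the PAUSE instruction; on other targets the sketch falls back to a no-op.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>            /* _mm_pause() emits the PAUSE instruction */
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)     /* assumption: no-op fallback off x86 */
#endif

/* Hypothetical lock type: 0 = free, 1 = held (mirrors lockvar in 8-4 (c)). */
typedef struct { atomic_int lockvar; } spin_lock_t;

static inline void spin_lock_init(spin_lock_t *l)
{
    atomic_init(&l->lockvar, 0);
}

/* "Test-and-set" step: one atomic exchange, like XCHG EAX, lockvar.
 * Returns true if the previous value was 0, i.e. the lock was acquired. */
static inline bool spin_try_lock(spin_lock_t *l)
{
    return atomic_exchange_explicit(&l->lockvar, 1,
                                    memory_order_acquire) == 0;
}

static inline void spin_lock(spin_lock_t *l)
{
    for (;;) {
        /* "Test" step: spin on a plain load, pausing between reads,
         * so the waiting thread does not issue exchanges while the
         * lock is visibly held. */
        while (atomic_load_explicit(&l->lockvar,
                                    memory_order_relaxed) != 0)
            cpu_relax();

        /* The lock looked free: now try the atomic exchange. */
        if (spin_try_lock(l))
            return;
    }
}

/* Release: store 0, like MOV lockvar, 0. */
static inline void spin_unlock(spin_lock_t *l)
{
    atomic_store_explicit(&l->lockvar, 0, memory_order_release);
}
```

A caller would bracket its critical section with `spin_lock(&lk)` and `spin_unlock(&lk)`. In line with Coding Rule 20, the inner read-only loop is the place where PAUSE is issued, and the atomic exchange is attempted only after the lock has been observed free.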
