03.03.2013 Views

Intel® Architecture Instruction Set Extensions Programming Reference

Intel® Architecture Instruction Set Extensions Programming Reference

Intel® Architecture Instruction Set Extensions Programming Reference

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

INTEL® TRANSACTIONAL SYNCHRONIZATION EXTENSIONS<br />

• An XRELEASE prefixed instruction must restore the value of the elided lock to the value it had before the lock<br />

acquisition. This allows hardware to safely elide locks by not adding them to the write-set. The data size and<br />

data address of the lock release (XRELEASE prefixed) instruction must match that of the lock acquire<br />

(XACQUIRE prefixed) and the lock must not cross a cache line boundary.<br />

• Software should not write to the elided lock inside a transactional HLE region with any instruction other than an<br />

XRELEASE prefixed instruction, otherwise it may cause a transactional abort. In addition, recursive locks<br />

(where a thread acquires the same lock multiple times without first releasing the lock) may also cause a transactional<br />

abort. Note that software can observe the result of the elided lock acquire inside the critical section.<br />

Such a read operation will return the value of the write to the lock.<br />

The processor automatically detects violations to these guidelines, and safely transitions to a non-transactional<br />

execution without elision. Since Intel TSX detects conflicts at the granularity of a cache line, writes to data collocated<br />

on the same cache line as the elided lock may be detected as data conflicts by other logical processors eliding<br />

the same lock.<br />

8.3.4 Transactional Nesting<br />

Both HLE and RTM support nested transactional regions. However, a transactional abort restores state to the operation<br />

that started transactional execution: either the outermost XACQUIRE prefixed HLE eligible instruction or the<br />

outermost XBEGIN instruction. The processor treats all nested transactions as one monolithic transaction.<br />

8.3.4.1 HLE Nesting and Elision<br />

Programmers can nest HLE regions up to an implementation specific depth of MAX_HLE_NEST_COUNT. Each logical<br />

processor tracks the nesting count internally but this count is not available to software. An XACQUIRE prefixed HLEeligible<br />

instruction increments the nesting count, and an XRELEASE prefixed HLE-eligible instruction decrements it.<br />

The logical processor enters transactional execution when the nesting count goes from zero to one. The logical<br />

processor attempts to commit only when the nesting count becomes zero. A transactional abort may occur if the<br />

nesting count exceeds MAX_HLE_NEST_COUNT.<br />

In addition to supporting nested HLE regions, the processor can also elide multiple nested locks. The processor<br />

tracks a lock for elision beginning with the XACQUIRE prefixed HLE eligible instruction for that lock and ending with<br />

the XRELEASE prefixed HLE eligible instruction for that same lock. The processor can, at any one time, track up to<br />

a MAX_HLE_ELIDED_LOCKS number of locks. For example, if the implementation supports a<br />

MAX_HLE_ELIDED_LOCKS value of two and if the programmer nests three HLE identified critical sections (by<br />

performing XACQUIRE prefixed HLE eligible instructions on three distinct locks without performing an intervening<br />

XRELEASE prefixed HLE eligible instruction on any one of the locks), then the first two locks will be elided, but the<br />

third won't be elided (but will be added to the transaction’s write-set). However, the execution will still continue<br />

transactionally. Once an XRELEASE for one of the two elided locks is encountered, a subsequent lock acquired<br />

through the XACQUIRE prefixed HLE eligible instruction will be elided.<br />

The processor attempts to commit the HLE execution when all elided XACQUIRE and XRELEASE pairs have been<br />

matched, the nesting count goes to zero, and the locks have satisfied the requirements described earlier. If execution<br />

cannot commit atomically, then execution transitions to a non-transactional execution without elision as if the<br />

first instruction did not have an XACQUIRE prefix.<br />

8.3.4.2 RTM Nesting<br />

Programmers can nest RTM regions up to an implementation specific MAX_RTM_NEST_COUNT. The logical<br />

processor tracks the nesting count internally but this count is not available to software. An XBEGIN instruction<br />

increments the nesting count, and an XEND instruction decrements it. The logical processor attempts to commit<br />

only if the nesting count becomes zero. A transactional abort occurs if the nesting count exceeds<br />

MAX_RTM_NEST_COUNT.<br />

8.3.4.3 Nesting HLE and RTM<br />

HLE and RTM provide two alternative software interfaces to a common transactional execution capability. The<br />

behavior when HLE and RTM are nested together—HLE inside RTM or RTM inside HLE—is implementation specific.<br />

However, in all cases, the implementation will maintain HLE and RTM semantics. An implementation may choose to<br />

8-4 Ref. # 319433-014

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!