03.03.2013 Views

Intel® Architecture Instruction Set Extensions Programming Reference


APPLICATION PROGRAMMING MODEL

Upper bits of YMM registers (255:128) can be read and written by many instructions with a VEX.256 prefix.

XSAVE and XRSTOR may be used to save and restore the upper bits of the YMM registers.

2.5 MEMORY ALIGNMENT

Memory alignment requirements on VEX-encoded instructions differ from those on non-VEX-encoded instructions. Memory alignment applies to non-VEX-encoded SIMD instructions in three categories:

• Explicitly aligned SIMD load and store instructions accessing 16 bytes of memory (e.g. MOVAPD, MOVAPS, MOVDQA, etc.). These instructions always require the memory address to be aligned on a 16-byte boundary.

• Explicitly unaligned SIMD load and store instructions accessing 16 bytes or less of data from memory (e.g. MOVUPD, MOVUPS, MOVDQU, MOVQ, MOVD, etc.). These instructions do not require the memory address to be aligned on a 16-byte boundary.

• The vast majority of arithmetic and data processing instructions in legacy SSE (non-VEX-encoded SIMD instructions) support memory access semantics. When these instructions access 16 bytes of data from memory, the memory address must be aligned on a 16-byte boundary.
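
The three legacy SSE categories above can be summarized as a small predicate. The following C sketch is only a model of the rules as stated in this section, not a description of how hardware decodes instructions; the enum `sse_mem_class` and the function `legacy_sse_alignment_ok` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three categories listed above. */
enum sse_mem_class {
    SSE_EXPLICIT_ALIGNED,   /* e.g. MOVAPD, MOVAPS, MOVDQA */
    SSE_EXPLICIT_UNALIGNED, /* e.g. MOVUPD, MOVUPS, MOVDQU, MOVQ, MOVD */
    SSE_ARITHMETIC          /* arithmetic/data processing with a memory operand */
};

/* Returns true when addr satisfies the alignment rule stated in the text
 * for a legacy (non-VEX) SSE access of access_bytes bytes (1..16). */
bool legacy_sse_alignment_ok(enum sse_mem_class cls, uintptr_t addr,
                             size_t access_bytes)
{
    assert(access_bytes > 0 && access_bytes <= 16);
    switch (cls) {
    case SSE_EXPLICIT_ALIGNED:
        /* Always requires a 16-byte boundary. */
        return (addr % 16) == 0;
    case SSE_EXPLICIT_UNALIGNED:
        /* No alignment requirement. */
        return true;
    case SSE_ARITHMETIC:
        /* Only full 16-byte accesses must be 16-byte aligned. */
        return access_bytes < 16 || (addr % 16) == 0;
    }
    return false;
}
```

For example, under this model a 16-byte arithmetic operand at address 0x1004 violates the rule, while an 8-byte operand at the same address does not.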

Most arithmetic and data processing instructions encoded using the VEX prefix and performing memory accesses have more flexible memory alignment requirements than instructions that are encoded without the VEX prefix. Specifically,

• With the exception of explicitly aligned 16- or 32-byte SIMD load/store instructions, most VEX-encoded arithmetic and data processing instructions operate in a flexible environment regarding memory address alignment: a VEX-encoded instruction with 32-byte or 16-byte load semantics supports unaligned load operation by default. Memory arguments for most instructions with a VEX prefix operate normally without causing #GP(0) at any byte-granularity alignment (unlike legacy SSE instructions). The instructions that have explicit memory alignment requirements are listed in Table 2-4.
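
The VEX rule can be modeled in the same spirit. In the C sketch below, the parameter `requires_explicit_alignment` is a hypothetical flag standing in for membership in Table 2-4 (whose contents are not reproduced here), and `vex_alignment_gp` is an illustrative name. The function considers alignment only and ignores every other cause of #GP(0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the alignment rule for VEX-encoded 16- or 32-byte accesses.
 * requires_explicit_alignment: hypothetical stand-in for "this
 * instruction is listed in Table 2-4".
 * Returns true when the access would raise #GP(0) for alignment reasons. */
bool vex_alignment_gp(bool requires_explicit_alignment, uintptr_t addr,
                      size_t access_bytes)
{
    assert(access_bytes == 16 || access_bytes == 32);
    if (!requires_explicit_alignment) {
        /* Most VEX-encoded instructions accept any byte alignment. */
        return false;
    }
    /* Explicitly aligned forms need a boundary equal to the access size. */
    return (addr % access_bytes) != 0;
}
```

Under this model, a 32-byte access at an address that is only 16-byte aligned faults for an explicitly aligned instruction but not for an ordinary VEX-encoded arithmetic instruction.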

Software may see performance penalties when unaligned accesses cross cacheline boundaries, so reasonable attempts to align commonly used data sets should continue to be pursued.
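
Whether a given access crosses a cacheline boundary can be checked with simple address arithmetic. The C sketch below is illustrative only: `crosses_cache_line` is a hypothetical helper, and the line size is passed in as a parameter because this section does not state one (64 bytes is a common value, used here as an assumption in the example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when the byte range [addr, addr + access_bytes) spans
 * more than one cache line of line_bytes bytes (a power of two). */
bool crosses_cache_line(uintptr_t addr, size_t access_bytes,
                        size_t line_bytes)
{
    assert(access_bytes > 0);
    assert(line_bytes > 0 && (line_bytes & (line_bytes - 1)) == 0);
    uintptr_t first_line = addr / line_bytes;
    uintptr_t last_line = (addr + access_bytes - 1) / line_bytes;
    return first_line != last_line;
}
```

With an assumed 64-byte line, a 32-byte access at 0x1020 stays within one line, whereas the same access at 0x1030 spans two lines and is the kind of access that may incur the penalty described above.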

Atomic memory operation in Intel 64 and IA-32 architectures is guaranteed only for a subset of memory operand sizes and alignment scenarios. The guaranteed atomic operations are described in Section 7.1.1 of the IA-32 Intel® Architecture Software Developer’s Manual, Volume 3A. AVX and FMA instructions do not introduce any new guaranteed atomic memory operations.

AVX and FMA instructions will generate an #AC(0) fault on misaligned 4- or 8-byte memory references in Ring-3 when CR0.AM=1. 16- and 32-byte memory references will not generate an #AC(0) fault. See Table 2-3 for details.

Certain AVX instructions always require 16- or 32-byte alignment (see the complete list of such instructions in Table 2-4). These instructions will generate #GP(0) if the memory address is not aligned to a 16-byte boundary (for 16-byte loads and stores) or a 32-byte boundary (for 32-byte loads and stores).
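
One way for software to satisfy the 32-byte requirement is to allocate buffers on a 32-byte boundary. The following C sketch uses the C11 standard function `aligned_alloc`; the helper name `alloc_ymm_aligned` and the macro `YMM_ALIGN` are hypothetical, and the approach is one possible choice rather than a method prescribed by this document.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define YMM_ALIGN 32 /* hypothetical name: alignment in bytes for 32-byte accesses */

/* Allocates at least `bytes` bytes on a 32-byte boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the request is rounded up. Returns NULL on failure; release the
 * buffer with free(). */
void *alloc_ymm_aligned(size_t bytes)
{
    if (bytes > SIZE_MAX - (YMM_ALIGN - 1)) {
        return NULL; /* rounding up would overflow */
    }
    size_t rounded = (bytes + YMM_ALIGN - 1) / YMM_ALIGN * YMM_ALIGN;
    if (rounded == 0) {
        rounded = YMM_ALIGN;
    }
    void *p = aligned_alloc(YMM_ALIGN, rounded);
    assert(p == NULL || ((uintptr_t)p % YMM_ALIGN) == 0);
    return p;
}
```

A buffer obtained this way starts on a 32-byte boundary, so it also satisfies the 16-byte requirement; individual elements inside the buffer are aligned only if their offsets are multiples of the access size.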

Ref. # 319433-014 2-9
