13.07.2015

Intel® 64 and IA-32 Architectures Optimization Reference Manual

CODING FOR SIMD ARCHITECTURES

purpose instruction LDDQU that can avoid cache line splits is discussed in Section 5.7.1.1, “Supplemental Techniques for Avoiding Cache Line Splits.”

4.4.1.1 Using Padding to Align Data

However, when SIMD data is accessed with SIMD operations, access can often be improved simply by changing the declaration. For example, consider the declaration of a structure that represents a point in space plus an attribute:

typedef struct {short x,y,z; char a;} Point;
Point pt[N];

Assume we will be performing a number of computations on X, Y, Z in three of the four elements of a SIMD word; see Section 4.5.1, “Data Structure Layout,” for an example. Even if the first element in array PT is aligned, the second element will start 7 bytes later and not be aligned (3 shorts at two bytes each plus a single byte = 7 bytes). This count assumes a packed structure; mainstream compilers already round sizeof(Point) up to 8 so that the shorts in every array element stay 2-byte aligned, but that implicit padding is left unstated in the source.

By adding the padding variable PAD, the structure is explicitly 8 bytes, and if the first element is aligned to 8 bytes (64 bits), all following elements will also be aligned. The sample declaration follows:

typedef struct {short x,y,z; char a; char pad;} Point;
Point pt[N];

4.4.1.2 Using Arrays to Make Data Contiguous

In the following code,

for (i=0; i