13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

Example 3-36. Decomposing an Array (Contd.)

    struct { /* 200 bytes */
        char b, d;
    } hybrid_struct_of_array_bd[100];

The efficiency of such optimizations depends on usage patterns. If the elements of the structure are all accessed together but the access pattern of the array is random, then ARRAY_OF_STRUCT avoids unnecessary prefetch even though it wastes memory.

However, if the access pattern of the array exhibits locality (for example, if the array index is being swept through) then processors with hardware prefetchers will prefetch data from STRUCT_OF_ARRAY, even if the elements of the structure are accessed together.

When the elements of the structure are not accessed with equal frequency, such as when element A is accessed ten times more often than the other entries, then STRUCT_OF_ARRAY not only saves memory, but it also prevents fetching unnecessary data items B, C, D, and E.

Using STRUCT_OF_ARRAY also enables the use of the SIMD data types by the programmer and the compiler.

Note that STRUCT_OF_ARRAY can have the disadvantage of requiring more independent memory stream references. This can require the use of more prefetches and additional address generation calculations. It can also have an impact on DRAM page access efficiency. An alternative, HYBRID_STRUCT_OF_ARRAY, blends the two approaches. In this case, only 2 separate address streams are generated and referenced: 1 for HYBRID_STRUCT_OF_ARRAY_ACE and 1 for HYBRID_STRUCT_OF_ARRAY_BD.
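The three layouts discussed above can be sketched in C as follows. This is a minimal illustration rather than the manual's own listing: the type names (aos_elem, soa, hybrid_ace, hybrid_bd) and the helper functions are hypothetical, the field types (int for a, c, e and char for b, d) are assumed from the hybrid_struct_of_array_bd fragment, and exact structure sizes depend on the compiler's padding.

```c
#include <assert.h>
#include <stddef.h>

#define N 100 /* element count, matching the [100] arrays in Example 3-36 */

/* Array of structures: the five fields of one element sit side by side. */
struct aos_elem {
    int a, c, e;
    char b, d;
};

/* Structure of arrays: one contiguous array (one address stream) per field. */
struct soa {
    int a[N], c[N], e[N];
    char b[N], d[N];
};

/* Hybrid: two streams, grouping fields that are assumed to be used together. */
struct hybrid_ace {
    int a, c, e;
};
struct hybrid_bd {
    char b, d;
};

/* Reads only field a, but each step advances by sizeof(struct aos_elem),
   so the unused fields travel through the cache alongside a. */
long sum_a_aos(const struct aos_elem *v, size_t n)
{
    long total = 0;
    assert(v != NULL);
    for (size_t i = 0; i < n; i++)
        total += v[i].a;
    return total;
}

/* Reads only field a from a dense int array; b, c, d and e are never touched. */
long sum_a_soa(const struct soa *s, size_t n)
{
    long total = 0;
    assert(s != NULL && n <= N);
    for (size_t i = 0; i < n; i++)
        total += s->a[i];
    return total;
}

/* Reads a, c and e together from a single stream; the b/d stream is not read. */
long sum_ace_hybrid(const struct hybrid_ace *v, size_t n)
{
    long total = 0;
    assert(v != NULL);
    for (size_t i = 0; i < n; i++)
        total += (long)v[i].a + v[i].c + v[i].e;
    return total;
}
```

All three functions compute their sums correctly regardless of layout; the difference lies only in which bytes are pulled through the cache and how many independent address streams a loop generates. Which layout performs best depends on the usage pattern, as the text describes, and should be confirmed by measurement.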
The second alternative (the hybrid layout) also prevents fetching unnecessary data, assuming that (1) the variables A, C and E are always used together, and (2) the variables B and D are always used together, but not at the same time as A, C and E.

The hybrid approach ensures:

• Simpler/fewer address generations than STRUCT_OF_ARRAY
• Fewer streams, which reduces DRAM page misses
• Fewer prefetches due to fewer streams
• Efficient cache line packing of data elements that are used concurrently

Assembly/Compiler Coding Rule 53. (H impact, M generality) Try to arrange data structures such that they permit sequential access.

If the data is arranged into a set of streams, the automatic hardware prefetcher can prefetch data that will be needed by the application, reducing the effective memory latency. If the data is accessed in a non-sequential manner, the automatic hardware prefetcher cannot prefetch the data. The prefetcher can recognize up to eight ...
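As a rough illustration of Coding Rule 53, the sketch below sums the same matrix in two traversal orders. The function names and the flat row-major buffer are assumptions made for this example, not part of the manual. C stores arrays in row-major order, so the first function touches adjacent addresses, while the second jumps by a full row on every access. Whether a particular hardware prefetcher can still follow such a stride is processor dependent and is not claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential: the inner loop walks adjacent ints in memory,
   forming a single ascending address stream. */
long sum_row_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Non-sequential: consecutive accesses are cols * sizeof(int) bytes apart,
   so each access may land on a different cache line. */
long sum_column_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both functions return the same value, so the loop order can be chosen purely on memory-access grounds. When the algorithm naturally iterates by column, storing the data transposed is one way to restore sequential access.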
