13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



GENERAL OPTIMIZATION GUIDELINES

Example 3-41. Using DCU Hardware Prefetch

Original code:

    mov ebx, DWORD PTR [First]
    xor eax, eax
    scan_list:
    mov eax, [ebx+4]
    mov ecx, 60
    do_some_work_1:
    add eax, eax
    and eax, 6
    sub ecx, 1
    jnz do_some_work_1
    mov eax, [ebx+64]
    mov ecx, 30
    do_some_work_2:
    add eax, eax
    and eax, 6
    sub ecx, 1
    jnz do_some_work_2
    mov ebx, [ebx]
    test ebx, ebx
    jnz scan_list

Modified sequence that benefits from prefetch:

    mov ebx, DWORD PTR [First]
    xor eax, eax
    scan_list:
    mov eax, [ebx+4]
    mov eax, [ebx+4]
    mov eax, [ebx+4]
    mov ecx, 60
    do_some_work_1:
    add eax, eax
    and eax, 6
    sub ecx, 1
    jnz do_some_work_1
    mov eax, [ebx+64]
    mov ecx, 30
    do_some_work_2:
    add eax, eax
    and eax, 6
    sub ecx, 1
    jnz do_some_work_2
    mov ebx, [ebx]
    test ebx, ebx
    jnz scan_list

The additional instructions that load data from one member in the modified sequence can trigger the DCU hardware prefetch mechanisms to prefetch data in the next cache line, enabling the work on the second member to complete sooner.

Software can gain from the first-level data cache prefetchers in two cases:

• If data is not in the second-level cache, the first-level data cache prefetcher enables an early trigger of the second-level cache prefetcher.
• If data is in the second-level cache and not in the first-level data cache, then the first-level data cache prefetcher triggers earlier bring-up of the sequential cache line to the first-level data cache.

In some situations, software should pay attention to a potential side effect: triggering unnecessary DCU hardware prefetches. If a large data structure with many members spanning many cache lines is accessed in such a way that only a few of its members are actually referenced, but there are multiple paired accesses to the same cache line, the DCU hardware prefetcher can trigger fetching of cache lines that are not needed. In Example , references to the “Pts” array and “AltPts” will trigger DCU
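
For readers who work in C rather than assembly, the list scan in Example 3-41 can be approximated as below. This is a minimal sketch under stated assumptions, not code from the manual: the `struct node` layout, the 64-byte line size, the `sink` variable, the node counter, and the helper `link_nodes` are all hypothetical and exist only for illustration. The repeated loads go through a `volatile` pointer so that the compiler does not fold them into one. Whether they actually trigger the DCU prefetcher depends on the processor and on the code the compiler generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed line size, matching the +64 offset in the assembly */

/* Hypothetical node layout: link first, one member in the first cache line,
   and a second member 64 bytes into the node. */
struct node {
    struct node *next;  /* plays the role of [ebx] */
    uint32_t first;     /* plays the role of [ebx+4] on a 32-bit target */
    unsigned char pad[CACHE_LINE - sizeof(struct node *) - sizeof(uint32_t)];
    uint32_t second;    /* plays the role of [ebx+64] */
};

_Static_assert(offsetof(struct node, second) == CACHE_LINE,
               "second member must sit 64 bytes into the node");

/* Illustration only: keeps the compiler from discarding the work loops. */
static volatile uint32_t sink;

/* Same arithmetic as do_some_work_1 / do_some_work_2: add eax, eax; and eax, 6. */
static uint32_t do_some_work(uint32_t x, int iterations)
{
    while (iterations-- > 0) {
        x += x;
        x &= 6;
    }
    return x;
}

/* Original sequence: one load per member. Returns the number of nodes visited. */
size_t scan_list_original(const struct node *p)
{
    size_t visited = 0;
    for (; p != NULL; p = p->next) {
        sink = do_some_work(p->first, 60);
        sink = do_some_work(p->second, 30);
        ++visited;
    }
    return visited;
}

/* Modified sequence: three loads of the first member before the work starts. */
size_t scan_list_extra_loads(const struct node *p)
{
    size_t visited = 0;
    for (; p != NULL; p = p->next) {
        const volatile uint32_t *first = &p->first;
        uint32_t x = *first; /* mov eax, [ebx+4] */
        x = *first;          /* repeated load, kept alive by volatile */
        x = *first;          /* repeated load, kept alive by volatile */
        sink = do_some_work(x, 60);
        sink = do_some_work(p->second, 30);
        ++visited;
    }
    return visited;
}

/* Hypothetical helper: chains an array of nodes into a singly linked list. */
void link_nodes(struct node *nodes, size_t n)
{
    assert(nodes != NULL && n > 0);
    for (size_t i = 0; i < n; ++i) {
        nodes[i].first = (uint32_t)i;
        nodes[i].second = (uint32_t)(i + 1);
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
}
```

Both functions visit the same nodes and do the same arithmetic, so any difference between them is purely in timing. Measuring that difference requires cache-line-aligned nodes and a list large enough to miss in the first-level data cache, neither of which this sketch arranges.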
