Intel® 64 and IA-32 Architectures Optimization Reference Manual
OPTIMIZING CACHE USAGE

9.7.3.1 Cache Sharing Using Deterministic Cache Parameters

Improving cache locality is an important part of software optimization. For example, a cache blocking algorithm can be designed to optimize block size at runtime for single-processor implementations and a variety of multiprocessor execution environments (including processors supporting HT Technology, or multicore processors). The basic technique is to place an upper limit on the block size: it should be less than the size of the target cache level divided by the number of logical processors serviced by that level of cache. This technique is applicable to multithreaded application programming. The technique can also benefit single-threaded applications that are part of a multi-tasking workload.

9.7.3.2 Cache Sharing in Single-Core or Multicore

Deterministic cache parameters are useful for managing a shared cache hierarchy in multithreaded applications in more sophisticated situations. A given cache level may be shared by the logical processors in a processor core, or it may be implemented to be shared by the logical processors in a physical processor package. Using the deterministic cache parameter leaf and the initial APIC_ID associated with each logical processor in the platform, software can extract information on the number and the topological relationship of logical processors sharing a cache level.

See also: Section 8.9.1, "Using Shared Execution Resources in a Processor Core."

9.7.3.3 Determine Prefetch Stride

The prefetch stride (see the description of CPUID.01H.EBX) provides the length of the region that the processor will prefetch with the PREFETCHh instructions (PREFETCHT0, PREFETCHT1, PREFETCHT2, and PREFETCHNTA). Software uses this length as the stride when prefetching into a particular level of the cache hierarchy, as identified by the instruction used.
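As an illustration (not taken from the manual), a traversal loop can issue a prefetch one stride ahead of the data it is about to touch. The sketch below assumes a 64-byte stride and uses the GCC/Clang `__builtin_prefetch` builtin, which compiles to a PREFETCHh instruction on x86 targets; the function name and prefetch distance are illustrative:

```c
#include <stddef.h>

#define PREFETCH_STRIDE 64 /* assumed stride; real code should query CPUID */

/* Sum a byte buffer, prefetching one stride ahead for reading (second
 * argument 0) with high temporal locality (third argument 3), which on
 * x86 maps to PREFETCHT0 (prefetch into the L1 data cache). */
static unsigned long sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;
    for (size_t i = 0; i < len; i++) {
        if (i + PREFETCH_STRIDE < len)
            __builtin_prefetch(&buf[i + PREFETCH_STRIDE], 0, 3);
        total += buf[i];
    }
    return total;
}
```

The prefetch is a hint only; it never faults and does not change the result, so the loop is correct even when the stride assumption is wrong, merely slower.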
The prefetch size is relevant for cache types of Data Cache (1) and Unified Cache (3); it should be ignored for other cache types. Software should not assume that the coherency line size is the prefetch stride. If the prefetch stride field is zero, software should assume a default prefetch stride of 64 bytes. Software should use the following algorithm to determine which prefetch size to use, depending on whether the deterministic cache parameter mechanism or the legacy mechanism is supported:
• If a processor supports the deterministic cache parameters and provides a non-zero prefetch size, then that prefetch size is used.
• If a processor supports the deterministic cache parameters but does not provide a prefetch size, then the default size for each level of the cache hierarchy is 64 bytes.
• If a processor does not support the deterministic cache parameters but provides a legacy prefetch size descriptor (0xF0 for 64 bytes, 0xF1 for 128 bytes), that size is the prefetch size for all levels of the cache hierarchy.
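The selection rules above can be sketched as a pure function. This is an illustrative decoding helper, not code from the manual: the parameter names are hypothetical, and the caller is assumed to have already read the relevant CPUID leaves or legacy cache descriptors.

```c
#include <stdbool.h>

/* Returns the prefetch stride in bytes per the three rules above.
 * has_det_params:    deterministic cache parameters are supported
 * det_prefetch_size: prefetch size reported there (0 if none reported)
 * legacy_descriptor: legacy prefetch descriptor byte (0xF0, 0xF1, or 0)
 */
static unsigned prefetch_stride_bytes(bool has_det_params,
                                      unsigned det_prefetch_size,
                                      unsigned legacy_descriptor)
{
    if (has_det_params) {
        /* A non-zero reported size wins; otherwise default to 64 bytes. */
        return det_prefetch_size != 0 ? det_prefetch_size : 64;
    }
    if (legacy_descriptor == 0xF0)
        return 64;
    if (legacy_descriptor == 0xF1)
        return 128;
    return 64; /* no information available: fall back to 64 bytes */
}
```

Note that when the deterministic mechanism is supported, the legacy descriptor is ignored entirely, matching the precedence order of the bullets.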
