Intel® 64 and IA-32 Architectures Optimization Reference Manual
MULTICORE AND HYPER-THREADING TECHNOLOGY

• Scheduling two functional-decomposition threads to use shared execution resources cooperatively.
• Scheduling pairs of memory-intensive threads and compute-intensive threads to maximize processor scaling and avoid resource contentions in the same core.

An example using the 3-level hierarchy and the relationships between the initial APIC_ID and the affinity mask to manage thread affinity binding is shown in Example 8-12. The example builds a lookup table so that the thread scheduling sequence is mapped to an array of affinity masks, and threads are scheduled first to the primary logical processor of each processor core. The example is also suited to scheduling two memory-intensive threads to run on separate cores and scheduling two compute-intensive threads on separate cores.

User/Source Coding Rule 38. (M impact, L generality) Consider using thread affinity to optimize sharing resources cooperatively in the same core and subscribing dedicated resources in separate processor cores.

Some multicore processor implementations may have a shared cache topology that is not uniform across cache levels. The deterministic cache parameter leaf of CPUID reports such cache-sharing topology. The 3-level hierarchy and the relationships between the initial APIC_ID and the affinity mask can also be used to manage such a topology.

Example 8-13 illustrates the steps of discovering sibling logical processors in a physical package that share a target cache level. The algorithm assumes initial APIC IDs are assigned in a manner that respects bit-field boundaries, with respect to the modular boundary of the subset of logical processors sharing that cache level. Software can query the number of logical processors sharing a cache in hardware using the deterministic cache parameter leaf of CPUID. By comparing the relevant bits of the initial APIC_IDs, one can construct a mask to represent the sibling logical processors that share the same cache.

Note that the bit-field boundary of the cache-sharing topology is not necessarily the same as the core boundary; some cache levels can be shared across core boundaries.

Example 8-12. Assembling a Lookup Table to Manage Affinity Masks and Schedule Threads to Each Core First

AFFINITYMASK LuT[64];  // Lookup table that maps the thread scheduling
                       // sequence index to the affinity mask to use.
int index = 0;         // Index into the scheduling sequence.
int j = 0;
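The fragment of Example 8-12 above is only the opening of the manual's lookup-table example. The sketch below is not that example; it is a minimal illustration of the same idea using the Linux affinity API (pthread_setaffinity_np and cpu_set_t) instead of the AFFINITYMASK type shown above. It assumes a hypothetical topology of NUM_CORES cores with LP_PER_CORE logical processors per core and OS CPU numbering that is core-major (core 0 LP 0, core 0 LP 1, core 1 LP 0, ...); a real implementation would derive the mapping from the initial APIC_IDs as Example 8-12 does. The names build_affinity_lut and bind_to_sequence_index are hypothetical.

/* Sketch only: precompute affinity masks so that scheduling-sequence
 * indices land on the primary logical processor of each core before any
 * sibling logical processor is used.  Topology and OS CPU numbering are
 * assumptions, not values discovered from APIC_IDs. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NUM_CORES    4               /* assumed core count              */
#define LP_PER_CORE  2               /* assumed logical processors/core */
#define NUM_LP       (NUM_CORES * LP_PER_CORE)

static cpu_set_t LuT[NUM_LP];        /* sequence index -> affinity mask */

static void build_affinity_lut(void)
{
    for (int seq = 0; seq < NUM_LP; seq++) {
        int core   = seq % NUM_CORES;          /* walk cores round-robin */
        int lp     = seq / NUM_CORES;          /* then move to siblings  */
        int os_cpu = core * LP_PER_CORE + lp;  /* assumed OS numbering   */
        CPU_ZERO(&LuT[seq]);
        CPU_SET(os_cpu, &LuT[seq]);
    }
}

/* Bind the calling thread according to its position in the scheduling
 * sequence; intended to be called once at the start of each worker thread. */
static int bind_to_sequence_index(int seq)
{
    return pthread_setaffinity_np(pthread_self(),
                                  sizeof(cpu_set_t),
                                  &LuT[seq % NUM_LP]);
}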
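For the sibling-discovery discussion above (the subject of Example 8-13), the following sketch shows one way the cache-sharing bit field can be derived from the deterministic cache parameter leaf (CPUID leaf 04H), whose EAX[25:14] field reports the maximum number of addressable IDs for logical processors sharing the cache, minus one. It uses GCC's <cpuid.h> helper __get_cpuid_count; the helper names mask_width, cache_sharing_select_mask, and share_cache are hypothetical, and enumeration of every cache subleaf and full error handling are omitted. This is not the manual's exact algorithm.

/* Sketch only: derive the APIC_ID bit field that distinguishes logical
 * processors sharing a given cache, then compare initial APIC_IDs
 * (CPUID.01H:EBX[31:24]) above that field to identify siblings. */
#include <cpuid.h>
#include <stdint.h>

/* Smallest power-of-two bit width that can encode 'count' IDs. */
static unsigned mask_width(unsigned count)
{
    unsigned width = 0;
    while ((1u << width) < count)
        width++;
    return width;
}

/* Returns a mask selecting the APIC_ID bits *above* the cache-sharing
 * field for the given CPUID.04H subleaf, or 0 if the subleaf is invalid. */
static uint32_t cache_sharing_select_mask(unsigned subleaf)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid_count(4, subleaf, &eax, &ebx, &ecx, &edx))
        return 0;
    if ((eax & 0x1f) == 0)              /* cache type 0: no such cache */
        return 0;

    unsigned max_sharing = ((eax >> 14) & 0xfff) + 1;  /* EAX[25:14] + 1 */
    return ~((1u << mask_width(max_sharing)) - 1);
}

/* Two logical processors share the cache if their initial APIC_IDs match
 * under the selection mask. */
static int share_cache(uint32_t apic_a, uint32_t apic_b, uint32_t select_mask)
{
    return (apic_a & select_mask) == (apic_b & select_mask);
}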
