GetLogicalProcessorInformation - AMD Developer Central


This article explains how applications can use OS APIs and CPUID on a system with AMD processors to discover the number of logical and physical processors, the number of cores, and the association between cores and physical processors. Sample code that can be ported across different operating systems is provided.

Tracy Carver 7/9/2009

The advent of multi-core x86 processors has increased interest in software parallelization. Introductory discussions of parallelization on AMD64 processors often begin by describing AMD’s Direct Connect Architecture, for example in Frequently Asked Questions: NUMA, SMP and AMD’s Direct Connect Architecture. To sum up, the Direct Connect Architecture means that each physical AMD processor is a NUMA node. Each processor has one or more physical CPU cores, and those cores are directly connected through a high-speed memory controller to a physical bank of memory. Latency is lowest when accessing local memory and somewhat higher when accessing remote memory. At the OS level, each core is seen as a “logical processor.” Questions that may arise from this are:

1. Which “logical processor” corresponds with which core?
2. Which “logical processor” or core is associated with each physical processor?
3. How many physical processors are in the system?

These questions also affect licensing considerations, as some software is licensed by the number of CPU cores, and some is licensed by the number of physical processors. This is discussed further in Making Multi-Cores Count: An ISV Licensing Primer.

On current and legacy operating systems that run on x86-based processors, there is no common set of APIs that lets applications discover the topology of a system. In general, each operating system has APIs to discover how many logical processors exist, as well as to affinitize a thread to one or more logical processors. The example source program provided here shows how to combine these APIs with the CPUID instruction to answer the above questions. The program is expected to work on Linux®, Solaris, and Windows® operating systems. You should be able to build it with gcc, cc, and Visual Studio. For convenience, we also supply a 32-bit console binary of the program compiled for Windows.
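
As a rough illustration of the first kind of API mentioned above, the following sketch queries the number of logical processors on Windows and on Unix-like systems. It is not taken from the sample program; the function name logical_processor_count is hypothetical, and the sketch assumes that sysconf(_SC_NPROCESSORS_ONLN) is available on the Unix side (it is on Linux and Solaris).

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: returns the number of logical processors the OS
 * reports as online, or 1 if the query fails. */
int logical_processor_count(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);               /* fills in dwNumberOfProcessors */
    return (int)info.dwNumberOfProcessors;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;          /* sysconf returns -1 on error */
#endif
}
```

The count reflects what the OS exposes to the calling process, so restrictions such as processor sets can make it smaller than the number of processors physically installed.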

• Download the sample program

A basic understanding of CPUID and BIOS functionality is useful before proceeding. Please refer to the CPUID Specification and the BIOS and Kernel Developer’s Guide for AMD Family 10h Processors for background.

This example is provided for illustrative purposes only. It is limited by the capabilities of the operating system and the permissions of the user running the program. For example, the use of processor sets on a Unix-based system is likely to change the output by restricting the set of available logical processors. In addition, this applies to current and near-future AMD processors only, and we make no assertions as to what it would do on future CPUs from AMD.

Of course, developers have a variety of other options to discover this kind of information. Newer versions of Windows (or older Windows versions with service packs) provide the GetLogicalProcessorInformation API. On Linux, one could write code to parse /proc/cpuinfo or possibly newer additions to the virtual /proc filesystem. Solaris provides the psrinfo and prtdiag utilities. The advantage of this example is portability across different operating systems. This is useful if you develop on one platform and deploy on another, or if you work on multiple platforms.
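
For comparison, here is a minimal sketch of the Linux-only alternative: counting distinct "physical id" values in text that has the layout of /proc/cpuinfo. The function name count_physical_ids and the MAX_PACKAGES limit are assumptions made for this illustration, and the caller is assumed to have read the file into a NUL-terminated buffer.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PACKAGES 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: counts distinct "physical id" values in text laid
 * out like /proc/cpuinfo (one "key : value" pair per line). */
int count_physical_ids(const char *text)
{
    int seen[MAX_PACKAGES];
    int count = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        int id;
        /* Whitespace in the format matches the tab before the colon. */
        if (sscanf(line, "physical id : %d", &id) == 1) {
            int i = 0;
            while (i < count && seen[i] != id)
                i++;
            if (i == count && count < MAX_PACKAGES)
                seen[count++] = id;     /* first time this ID appears */
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return count;
}
```

This works only where the kernel exposes a "physical id" field, which is one reason the CPUID-based approach described next is more portable.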

How does this program work? First, a call is made to an OS routine to supply the number of logical processors. Then the running thread pins itself to each logical processor in succession, and while pinned, it invokes CPUID a number of times. An array of structs is populated with the information from CPUID, one struct per logical processor. Then the array is scanned to create a map of physical processors.
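
The pinning step might look roughly like the following on Linux. This is a sketch, not the sample program's code: it assumes glibc's sched_getaffinity and sched_setaffinity (which require _GNU_SOURCE to be defined before any header is included), and the names pin_to_cpu, for_each_allowed_cpu, and count_visit are hypothetical. Windows and Solaris need different calls, such as SetThreadAffinityMask and processor_bind.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for cpu_set_t and sched_setaffinity */
#endif
#include <sched.h>

typedef void (*per_cpu_fn)(int cpu, void *ctx);

/* Pins the calling thread to one logical processor; returns 0 on success. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Example callback: counts how many processors were visited. */
void count_visit(int cpu, void *ctx)
{
    (void)cpu;
    ++*(int *)ctx;
}

/* Visits every logical processor in the thread's current affinity mask,
 * calling fn while pinned to it, then restores the original mask.
 * Returns the number of processors visited (0 if the mask is unreadable). */
int for_each_allowed_cpu(per_cpu_fn fn, void *ctx)
{
    cpu_set_t original;
    int visited = 0;

    if (sched_getaffinity(0, sizeof(original), &original) != 0)
        return 0;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &original))
            continue;                   /* not available to this thread */
        if (pin_to_cpu(cpu) == 0) {
            if (fn != NULL)
                fn(cpu, ctx);           /* e.g. run CPUID and store results */
            visited++;
        }
    }
    sched_setaffinity(0, sizeof(original), &original);
    return visited;
}
```

In a real enumeration the callback would execute CPUID and fill in the struct for that logical processor; count_visit only demonstrates the calling convention.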

While affinitized to each core, we need to use CPUID to find out these crucial fields:

1. LocalApicId – Each logical processor has its own APIC ID that uniquely identifies it within the system. The APIC ID contains an identifier for the physical processor as well as an identifier for the core within that processor. The core identifier always occupies the least significant bits of the APIC ID.
   Obtain this with CPUID function Fn0000_0001_EBX. Set EAX to 0000_0001 before calling CPUID, and the value is returned in bits 31:24 of the EBX register.

2. ApicIdCoreIdSize – On current “non-legacy” processors, this is the number of least significant bits in the APIC ID that indicate the CPU core ID within a processor. On a legacy processor, this value is 0.
   Obtain this with extended CPUID function Fn8000_0008_ECX, which provides physical core count information. Set EAX to 8000_0008 before calling CPUID, and the value is returned in bits 15:12 of the ECX register.
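
The two field extractions above can be sketched as follows. The bit positions come from the text; the function names are hypothetical, and the CPUID call itself assumes the __get_cpuid helper from the <cpuid.h> header shipped with gcc and clang on x86. Visual Studio offers the __cpuid intrinsic in <intrin.h> instead, which this sketch does not cover.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* LocalApicId: bits 31:24 of EBX returned by CPUID Fn0000_0001. */
unsigned local_apic_id_from_ebx(uint32_t ebx)
{
    return (ebx >> 24) & 0xFFu;
}

/* ApicIdCoreIdSize: bits 15:12 of ECX returned by CPUID Fn8000_0008. */
unsigned core_id_size_from_ecx(uint32_t ecx)
{
    return (ecx >> 12) & 0xFu;
}

/* Reads both fields on whichever logical processor the thread is running
 * on. Returns 1 on success, 0 if CPUID or the leaf is unavailable. */
int read_apic_fields(unsigned *apic_id, unsigned *core_id_size)
{
#ifdef HAVE_GNU_CPUID
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(0x00000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    *apic_id = local_apic_id_from_ebx(ebx);

    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return 0;
    *core_id_size = core_id_size_from_ecx(ecx);
    return 1;
#else
    (void)apic_id;
    (void)core_id_size;
    return 0;                           /* not an x86 gcc/clang build */
#endif
}
```

Keeping the bit extraction separate from the CPUID call makes the field layout easy to check with known register values, independent of the machine the code runs on.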

In the 8 bits available in LocalApicId, a specific number of least significant bits are allocated to identifying individual cores, and the remaining most significant bits identify the physical processor. The task then is to figure out how many bits to use for which piece. The physical processor ID is obtained by shifting LocalApicId to the right by the number of bits specified by ApicIdCoreIdSize, which leaves only the upper bits. If ApicIdCoreIdSize is zero, then we’re on a legacy processor, where the CPUID Specification calls for legacy methods to derive the size of the core field.
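
As a worked illustration of the split, the sketch below separates an 8-bit LocalApicId into its two parts. The struct and function names are hypothetical, and the legacy case (ApicIdCoreIdSize equal to zero on a multi-core part) is deliberately not handled here.

```c
/* Hypothetical result type for this sketch. */
struct apic_location {
    unsigned package_id;   /* physical processor ID (upper bits) */
    unsigned core_id;      /* core ID within the processor (lower bits) */
};

/* Splits an 8-bit LocalApicId using the core field width reported in
 * ApicIdCoreIdSize. Assumes a non-legacy processor. */
struct apic_location split_apic_id(unsigned local_apic_id,
                                   unsigned core_id_size)
{
    struct apic_location loc;
    unsigned core_mask = (1u << core_id_size) - 1u;

    loc.package_id = (local_apic_id & 0xFFu) >> core_id_size;
    loc.core_id = local_apic_id & core_mask;
    return loc;
}
```

For example, with a 2-bit core field, APIC ID 5 (binary 101) splits into physical processor 1 and core 1.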

The physical processor ID retrieved in this way may not start at 0, because the physical processor bits in the LocalApicId may be shifted up to account for IOAPIC devices. This is determined by BIOS and is discussed in the CPUID specification.
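
Because the IDs need not start at 0 or be contiguous, the scan that builds the map of physical processors should not use the ID as an array index. One way to handle this, sketched with hypothetical names (cpu_record, distinct_packages) rather than the sample program's own, is to collect the distinct IDs as they are encountered:

```c
#include <stddef.h>

/* Hypothetical per-logical-processor record for this sketch. */
struct cpu_record {
    int logical_id;        /* OS logical processor number */
    unsigned apic_id;      /* LocalApicId read while pinned to it */
};

/* Collects the distinct physical processor IDs found in cpus[0..n-1] into
 * out[] (at most out_cap entries) and returns how many were stored. */
size_t distinct_packages(const struct cpu_record *cpus, size_t n,
                         unsigned core_id_size,
                         unsigned *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned pkg = cpus[i].apic_id >> core_id_size;
        size_t j = 0;

        while (j < found && out[j] != pkg)
            j++;                        /* look for an earlier occurrence */
        if (j == found && found < out_cap)
            out[found++] = pkg;         /* first time this ID is seen */
    }
    return found;
}
```

With eight logical processors whose APIC IDs run from 4 to 11 and a 2-bit core field, this yields two physical processors with IDs 1 and 2 rather than 0 and 1.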

Sample output of this program from a Linux system with two Quad-Core AMD Opteron processors:

# ./enum
( ./enum –more gives you more details. )
Physical Processor ID 0 has 4 cores
as logical processors 0 1 2 3
Physical Processor ID 1 has 4 cores
as logical processors 4 5 6 7
Number of active logical processors: 8
Number of active physical processors: 2
Number of cores per processor: 4
Number of threads per processor core: 1

Now that you don’t have to use different methods across different operating systems to discover this data, you should be able to save a little time in your day-to-day work.

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Links to third party sites are for convenience only, and no endorsement is implied.
