What Every Programmer Should Know About Memory

and must wait to access memory, despite the use of CPU caches. If multiple hyper-threads, cores, or processors access memory at the same time, the wait times for memory access are even longer. This is also true for DMA operations.

There is more to accessing memory than concurrency, however. Access patterns themselves also greatly influence the performance of the memory subsystem, especially with multiple memory channels. In section 2.2 we will cover more details of RAM access patterns.

On some more expensive systems, the Northbridge does not actually contain the memory controller. Instead the Northbridge can be connected to a number of external memory controllers (in the following example, four of them).

Figure 2.2: Northbridge with External Controllers (diagram labels: CPU1, CPU2, Northbridge, memory controllers MC1 to MC4, four RAM blocks, Southbridge, PCI-E, SATA, USB)

The advantage of this architecture is that more than one memory bus exists and therefore total available bandwidth increases. This design also supports more memory. Concurrent memory access patterns reduce delays by simultaneously accessing different memory banks. This is especially true when multiple processors are directly connected to the Northbridge, as in Figure 2.2. For such a design, the primary limitation is the internal bandwidth of the Northbridge, which is phenomenal for this architecture (from Intel). 4

Using multiple external memory controllers is not the only way to increase memory bandwidth. Another increasingly popular way is to integrate memory controllers into the CPUs and attach memory to each CPU. This architecture was made popular by SMP systems based on AMD’s Opteron processor. Figure 2.3 shows such a system. Intel will have support for the Common System Interface (CSI) starting with the Nehalem processors; this is basically the same approach: an integrated memory controller with the possibility of local memory for each processor.

With an architecture like this there are as many memory banks available as there are processors. On a quad-CPU machine the memory bandwidth is quadrupled without the need for a complicated Northbridge with enormous bandwidth. Having a memory controller integrated into the CPU has some additional advantages; we will not dig deeper into this technology here.

Figure 2.3: Integrated Memory Controller (diagram labels: CPU1 to CPU4, four RAM blocks, Southbridge, PCI-E, SATA, USB)

4 For completeness it should be mentioned that such a memory controller arrangement can be used for other purposes such as “memory RAID” which is useful in combination with hotplug memory.

There are disadvantages to this architecture, too. First of all, because the machine still has to make all the memory of the system accessible to all processors, the memory is not uniform anymore (hence the name NUMA, Non-Uniform Memory Architecture, for such an architecture). Local memory (memory attached to a processor) can be accessed with the usual speed. The situation is different when memory attached to another processor is accessed. In this case the interconnects between the processors have to be used. To access memory attached to CPU2 from CPU1 requires communication across one interconnect. When the same CPU accesses memory attached to CPU4, two interconnects have to be crossed.
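The hop counts described above can be reproduced with a small breadth-first search. This is only a sketch: the `LINKS` table is an assumed wiring of the four CPUs in Figure 2.3, chosen to agree with the two distances the text states (one interconnect from CPU1 to CPU2, two from CPU1 to CPU4), and the function name `interconnect_hops` is made up for illustration.

```python
from collections import deque

# Assumed interconnect layout for the four CPUs of Figure 2.3. The text only
# states CPU1->CPU2 (one hop) and CPU1->CPU4 (two hops); the remaining links
# are an assumption consistent with those two statements.
LINKS = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 4],
    4: [2, 3],
}


def interconnect_hops(links, start):
    """Return {cpu: interconnects crossed from `start`} using breadth-first search."""
    hops = {start: 0}  # memory attached to the starting CPU is local
    queue = deque([start])
    while queue:
        cpu = queue.popleft()
        for neighbor in links[cpu]:
            if neighbor not in hops:
                hops[neighbor] = hops[cpu] + 1
                queue.append(neighbor)
    return hops


if __name__ == "__main__":
    for cpu, count in sorted(interconnect_hops(LINKS, 1).items()):
        print(f"memory attached to CPU{cpu}: {count} interconnect(s) from CPU1")
```

Under this assumed layout every CPU sees the same picture: its own memory, two neighbors one interconnect away, and one CPU two interconnects away.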

Each such communication has an associated cost. We talk about “NUMA factors” when we describe the extra time needed to access remote memory. The example architecture in Figure 2.3 has two levels for each CPU: immediately adjacent CPUs and one CPU which is two interconnects away. With more complicated machines the number of levels can grow significantly. There are also machine architectures (for instance IBM’s x445 and SGI’s Altix series) where there is more than one type of connection. CPUs are organized into nodes; within a node the time to access the memory might be uniform or have only small NUMA factors. The connection between nodes can be very expensive, though, and the NUMA factor can be quite high.
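To see how NUMA factors feed into overall cost, one can weight the factor of each level by the share of accesses that land there. The sketch below does this; the factor values and the access mix are purely hypothetical placeholders (the text gives no numbers), and `average_access_cost` is an illustrative helper rather than an established API.

```python
def average_access_cost(access_share, numa_factor):
    """Return the average access cost relative to a purely local access.

    access_share: {level: fraction of all accesses served at that level}
    numa_factor:  {level: cost of one access relative to local memory}
    """
    if abs(sum(access_share.values()) - 1.0) > 1e-9:
        raise ValueError("access shares must sum to 1")
    return sum(share * numa_factor[level] for level, share in access_share.items())


# Hypothetical values, for illustration only.
# Level 0 = local memory, 1 = one interconnect away, 2 = two interconnects away.
HYPOTHETICAL_FACTORS = {0: 1.0, 1: 1.5, 2: 2.0}
HYPOTHETICAL_SHARES = {0: 0.7, 1: 0.2, 2: 0.1}

if __name__ == "__main__":
    cost = average_access_cost(HYPOTHETICAL_SHARES, HYPOTHETICAL_FACTORS)
    print(f"average cost relative to local access: {cost:.2f}")
```

The point of the calculation is that even a modest share of remote accesses raises the average, which is why keeping data on the memory local to the CPU that uses it matters.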

Commodity NUMA machines exist today and will likely play an even greater role in the future. It is expected that, from late 2008 on, every SMP machine will use NUMA. The costs associated with NUMA make it important to recognize when a program is running on a NUMA machine. In section 5 we will discuss more machine architectures and some technologies the Linux kernel provides for these programs.
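As one possible way to recognize a NUMA machine from a program, the sketch below counts the `nodeN` directories that Linux exposes under `/sys/devices/system/node`. It assumes a Linux system with sysfs mounted; the function name `numa_node_ids` is made up for this example, and it is not necessarily the mechanism discussed in section 5.

```python
import os
import re


def numa_node_ids(sysfs_node_dir="/sys/devices/system/node"):
    """Return the sorted NUMA node numbers found under the sysfs node directory.

    An empty list means the directory is missing or unreadable, for example on
    a non-Linux system or a kernel built without NUMA support.
    """
    try:
        entries = os.listdir(sysfs_node_dir)
    except OSError:
        return []
    ids = []
    for name in entries:
        # Keep only entries named node0, node1, ...; skip files such as "online".
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_node_ids()
    if len(nodes) > 1:
        print(f"NUMA machine with {len(nodes)} memory nodes: {nodes}")
    else:
        print("single memory node, or NUMA information not available")
```

The directory is a parameter so the function can be pointed at a test directory instead of the live system.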

Beyond the technical details described in the remainder of this section, there are several additional factors which influence the performance of RAM. They are not controllable by software, which is why they are not covered in this section. The interested reader can learn about some of these factors in section 2.1. They are really only needed to get a more complete picture of RAM technology and possibly to make better decisions when purchasing computers.

What Every Programmer Should Know About Memory, Version 1.0
