COPYRIGHT 2008, PRINCETON UNIVERSITY PRESS

• Multiple cores.
• Fast central registers.
• Very large, fast memories.
• Very fast communication among functional units.
• Vector, video, or array processors.
• Software that integrates the above effectively.

As a simple example, it makes little sense to have a CPU of incredibly high speed coupled to a memory system and software that cannot keep up with it (the present state of affairs).

14.2 Memory Hierarchy

An idealized model of computer architecture is a CPU sequentially executing a stream of instructions and reading from a continuous block of memory. To illustrate, in Figure 14.1 we see a vector A[ ] and an array M[ ][ ] loaded in memory and about to be processed. The real world is more complicated than this. First, matrices are not stored in blocks but rather in linear order. For instance, in Fortran they are stored in column-major order:

M(1,1) M(2,1) M(3,1) M(1,2) M(2,2) M(3,2) M(1,3) M(2,3) M(3,3),

while in Java and C they are stored in row-major order:

M(0,0) M(0,1) M(0,2) M(1,0) M(1,1) M(1,2) M(2,0) M(2,1) M(2,2).

Second, the values for the matrix elements may not even be in the same physical place. Some may be in RAM, some on the disk, some in cache, and some in the CPU. To give some of these words more meaning, in Figures 14.2 and 14.3 we show simple models of the memory architecture of a high-performance computer. This hierarchical arrangement arises from an effort to balance speed and cost, with fast, expensive memory supplemented by slow, less expensive memory. The memory architecture may include the following elements:

CPU: Central processing unit, the fastest part of the computer. The CPU consists of a number of very high-speed memory units called registers containing the instructions sent to the hardware to do things like fetch, store, and operate on data. There are usually separate registers for instructions, addresses, and operands (current data).
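The Fortran and Java/C storage orders listed earlier follow from simple offset formulas. The Python sketch below is illustrative only: `column_major_offset`, `row_major_offset`, and `storage_order` are hypothetical helper names, not part of any library. It sorts the nine elements of a 3 × 3 matrix by their computed position in linear memory, assuming 1-based column-major indexing for Fortran and 0-based row-major indexing for C and Java, as described in the text.

```python
# Hypothetical helpers illustrating linear storage of a square matrix.
# Fortran: 1-based indices, column-major (first index varies fastest).
# C and Java (per the text): 0-based indices, row-major (last index varies fastest).


def column_major_offset(i, j, nrows):
    """Offset of Fortran element M(i,j) from M(1,1), in elements."""
    return (j - 1) * nrows + (i - 1)


def row_major_offset(i, j, ncols):
    """Offset of C/Java element M[i][j] from M[0][0], in elements."""
    return i * ncols + j


def storage_order(n, fortran):
    """Return labels of an n x n matrix sorted by position in linear memory."""
    if fortran:
        pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
        pairs.sort(key=lambda p: column_major_offset(p[0], p[1], n))
    else:
        pairs = [(i, j) for i in range(n) for j in range(n)]
        pairs.sort(key=lambda p: row_major_offset(p[0], p[1], n))
    return [f"M({i},{j})" for i, j in pairs]


if __name__ == "__main__":
    print(" ".join(storage_order(3, fortran=True)))   # Fortran, column-major
    print(" ".join(storage_order(3, fortran=False)))  # C/Java, row-major
```

Running it prints the two orderings given in the text. The formulas also show why loop order matters: in row-major storage, incrementing the second index moves one element forward in memory, whereas in column-major storage it jumps a whole column of `nrows` elements.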
In many cases the CPU also contains some specialized parts for accelerating the processing of floating-point numbers.

Cache (high-speed buffer): A small, very fast bit of memory that holds instructions, addresses, and data in their passage between the very fast CPU registers and the slower RAM. This is seen in the next level down the pyramid in Figure 14.3. The main memory is also called dynamic RAM (DRAM), while the
