12.07.2015 Views

COPYRIGHT 2008, PRINCETON UNIVERSITY PRESS

[Figure 14.5 labels: Dual CPU Core Chip; CPU Core and L1 Caches (two); Bus Interface and L2 Caches]

Figure 14.5 Left: A generic view of the Intel core-2 dual-core processor, with CPU-local level-1 caches and a shared, on-die level-2 cache (courtesy of D. Schmitz). Right: The AMD Athlon 64 X2 3600 dual-core CPU (Wikimedia Commons).

chips, multicore chips use fewer transistors per CPU and are thus simpler to make and cooler to run.

Parallelism is built into a multicore chip because each core can run a different task. However, since the cores usually share the same communication channel and level-2 cache, there is the possibility of a communication bottleneck if both CPUs use the bus at the same time. Usually the user need not worry about this, but the writers of compilers and software must so that your code will run in parallel. As indicated in our MPI tutorial in Appendix D, modern Intel compilers make use of each multiple core and even have MPI treat each core as a separate processor.

14.6 CPU Design: Vector Processor

Often the most demanding part of a scientific computation involves matrix operations. On a classic (von Neumann) scalar computer, the addition of two vectors of physical length 99 to form a third ultimately requires 99 sequential additions (Table 14.2). There is actually much behind-the-scenes work here. For each element i there is the fetch of a(i) from its location in memory, the fetch of b(i) from its location in memory, the addition of the numerical values of these two elements in a CPU register, and the storage in memory of the sum in c(i). This fetching uses up time and is wasteful in the sense that the computer is being told again and again to do the same thing.

When we speak of a computer doing vector processing, we mean that there are hardware components that perform mathematical operations on entire rows or columns of matrices as opposed to individual elements.
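The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration of the bookkeeping described above, not a model of any real processor: the function names and the counters are made up for this example, and SECTION_LENGTH is a hypothetical hardware vector length chosen for illustration, not a value taken from the text.

```python
# Sketch of the scalar vs. vector views of c = a + b for vectors of length 99.
# SECTION_LENGTH is a hypothetical hardware vector length (an assumption).

N = 99
SECTION_LENGTH = 64  # assumption for illustration only


def scalar_add(a, b):
    """Add element by element, counting the steps a scalar CPU repeats."""
    c = [0.0] * len(a)
    counts = {"fetches": 0, "additions": 0, "stores": 0}
    for i in range(len(a)):
        x = a[i]                 # fetch a(i) from memory
        y = b[i]                 # fetch b(i) from memory
        counts["fetches"] += 2
        s = x + y                # add the two values in a "register"
        counts["additions"] += 1
        c[i] = s                 # store the sum in c(i)
        counts["stores"] += 1
    return c, counts


def sectioned_add(a, b, section_length=SECTION_LENGTH):
    """Add whole sections at once, counting one vector operation per section."""
    c = []
    vector_ops = 0
    for start in range(0, len(a), section_length):
        stop = start + section_length
        # One "vector instruction" stands in for the whole slice.
        c.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
        vector_ops += 1
    return c, vector_ops


a = [float(i) for i in range(N)]
b = [2.0 * i for i in range(N)]

c_scalar, counts = scalar_add(a, b)
c_vector, vector_ops = sectioned_add(a, b)
```

For the 99-element case the scalar version records 198 fetches, 99 additions, and 99 stores, each requested one at a time, while the sectioned version is told what to do only twice under the assumed section length of 64. Both produce the same sums; the difference lies in how often the machine must be instructed.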
(This hardware can also handle single-subscripted matrices, that is, mathematical vectors.) In the vector
