Computer Organisation and Design (2014)



Preface

learning more about floating-point arithmetic. Some will skip parts of Chapter 3, either because they don't need them or because they offer a review. However, we introduce the running example of matrix multiply in this chapter, showing how subword parallelism offers a fourfold improvement, so don't skip Sections 3.6 to 3.8. Chapter 4 explains pipelined processors. Sections 4.1, 4.5, and 4.10 give overviews, and Section 4.12 gives the next performance boost for matrix multiply for those with a software focus. Those with a hardware focus, however, will find that this chapter presents core material; they may also, depending on their background, want to read Appendix C on logic design first. The last chapter, on multicores, multiprocessors, and clusters, is mostly new content and should be read by everyone. It was significantly reorganized in this edition to make the flow of ideas more natural and to include much more depth on GPUs, warehouse-scale computers, and the hardware-software interface of network interface cards that are key to clusters.

The first of the six goals for this fifth edition was to demonstrate the importance of understanding modern hardware to get good performance and energy efficiency with a concrete example. As mentioned above, we start with subword parallelism in Chapter 3 to improve matrix multiply by a factor of 4. We double performance in Chapter 4 by unrolling the loop to demonstrate the value of instruction-level parallelism. Chapter 5 doubles performance again by optimizing for caches using blocking. Finally, Chapter 6 demonstrates a speedup of 14 from 16 processors by using thread-level parallelism. All four optimizations in total add just 24 lines of C code to our initial matrix multiply example.
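To make the cache-blocking step concrete, here is a minimal sketch in C of a tiled double-precision matrix multiply of the kind the chapters build toward. The function names (`dgemm_blocked`, `do_block`) and the tile size `BLOCK` are illustrative assumptions, not the book's exact code; matrices are assumed square and stored in row-major order.

```c
#include <stddef.h>

/* Assumed tile size; in practice it is tuned so that three
 * BLOCK x BLOCK tiles fit comfortably in the cache. */
#define BLOCK 32

/* Multiply one BLOCK x BLOCK tile of C += A * B, clamped to n. */
static void do_block(size_t n, size_t si, size_t sj, size_t sk,
                     const double *A, const double *B, double *C)
{
    for (size_t i = si; i < si + BLOCK && i < n; ++i)
        for (size_t j = sj; j < sj + BLOCK && j < n; ++j) {
            double cij = C[i * n + j];
            for (size_t k = sk; k < sk + BLOCK && k < n; ++k)
                cij += A[i * n + k] * B[k * n + j];
            C[i * n + j] = cij;
        }
}

/* Blocked DGEMM: iterate over tiles so each tile of A, B, and C
 * is reused from the cache instead of being refetched from memory. */
void dgemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t si = 0; si < n; si += BLOCK)
        for (size_t sj = 0; sj < n; sj += BLOCK)
            for (size_t sk = 0; sk < n; sk += BLOCK)
                do_block(n, si, sj, sk, A, B, C);
}
```

The outer three loops walk over tiles rather than elements; this is the restructuring that lets Chapter 5's version double performance without changing the arithmetic performed.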

The second goal was to help readers separate the forest from the trees by identifying eight great ideas of computer architecture early and then pointing out all the places they occur throughout the rest of the book. We use (hopefully) easy-to-remember margin icons and highlight the corresponding word in the text to remind readers of these eight themes. There are nearly 100 citations in the book. No chapter has fewer than seven examples of great ideas, and no idea is cited fewer than five times. Performance via parallelism, pipelining, and prediction are the three most popular great ideas, followed closely by Moore's Law. The processor chapter (4) is the one with the most examples, which is not a surprise since it probably received the most attention from computer architects. The one great idea found in every chapter is performance via parallelism, which is a pleasant observation given the recent emphasis on parallelism in the field and in editions of this book.

The third goal was to recognize the generation change in computing from the PC era to the PostPC era by this edition with our examples and material. Thus, Chapter 1 dives into the guts of a tablet computer rather than a PC, and Chapter 6 describes the computing infrastructure of the cloud. We also feature the ARM, which is the instruction set of choice in the personal mobile devices of the PostPC era, as well as the x86 instruction set that dominated the PC era and (so far) dominates cloud computing.

The fourth goal was to spread the I/O material throughout the book rather than have it in its own chapter, much as we spread parallelism throughout all the chapters in the fourth edition. Hence, I/O material in this edition can be found in
