03.03.2013 Views

Uniprocessor Computer Architecture MP Example: Intel Pentium Pro ...


Chapter 2: Computer-System Structures<br />

■ Last lecture: why study operating systems?<br />

■ Purpose of this lecture: general knowledge of the structure of a computer system and understanding technology trends<br />

■ Key issues in a computer system<br />

✦ General System Architecture (CPU, $s, MM, disk, bus, IO devices and controllers), Uni vs. Multi Processors<br />

✦ I/O Structure (IO interrupts, IO methods, HW support, e.g., DMA)<br />

✦ Storage Structure (CPU regs, $, MM, disk)<br />

✦ Storage Hierarchy (why? expensive vs. cheap; small vs. large)<br />

✦ Hardware Protection (user/system, IO protection, Mem protection)<br />

Uniprocessor Computer Architecture<br />

Example: SUN Enterprise<br />

[Figure: SUN Enterprise block diagram. Labels: CPU/mem cards (P, $, $2, Mem ctrl, Bus interface/switch); I/O cards (Bus interface, SBUS ×3, 2 FiberChannel, 100bT, SCSI); Gigaplane bus (256 data, 41 address, 83 MHz)]<br />

✦ 16 cards of either type: processors + memory, or I/O<br />

✦ All memory accessed over bus, so symmetric multiproc. (SMP)<br />

✦ Higher bandwidth, higher latency bus<br />

Hmm … this looks like a Computer System?<br />

[Figure: "The System". Labels: Algorithms; Programming Languages; Compiler; Runtime, Operating System; Hardware; Technology, Architecture]<br />

✦ Figure by courtesy of Anant Agarwal, MIT<br />

MP Example: Intel Pentium Pro Quad<br />

[Figure: Pentium Pro Quad block diagram. Labels: P-Pro modules (CPU, Interrupt controller, 256-KB L2 $, Bus interface); P-Pro bus (64-bit data, 36-bit address, 66 MHz); PCI bridges, PCI buses, PCI I/O cards; Memory controller, MIU, 1-, 2-, or 4-way interleaved DRAM]<br />

✦ All coherence and multiprocessing glue in processor module<br />

✦ Highly integrated, targeted at high volume<br />

✦ Multiprocessor Example: Cray T3E<br />

[Figure: Cray T3E node. Labels: P, $, Mem, Mem ctrl and NI, Switch (XY, Z), External I/O]<br />

• Multiprocessor system<br />

• Scale up to 1024 processors, 480MB/s links<br />
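The bus parameters quoted for the Gigaplane bus and the P-Pro bus can be turned into a rough peak-bandwidth figure. The sketch below assumes one full-width data transfer per bus clock, which is an idealization and not something the slides state; the function name is made up for illustration.

```python
def peak_bus_bandwidth_mb_s(data_width_bits: int, clock_mhz: float) -> float:
    """Peak MB/s, assuming one full-width transfer per bus clock (idealized)."""
    bytes_per_transfer = data_width_bits / 8
    # bytes/transfer * (clock_mhz * 1e6 transfers/s) / 1e6 bytes per MB
    return bytes_per_transfer * clock_mhz


gigaplane = peak_bus_bandwidth_mb_s(256, 83)  # SUN Enterprise Gigaplane bus
ppro_bus = peak_bus_bandwidth_mb_s(64, 66)    # Pentium Pro bus

print(f"Gigaplane: {gigaplane:.0f} MB/s")     # prints "Gigaplane: 2656 MB/s"
print(f"P-Pro bus: {ppro_bus:.0f} MB/s")      # prints "P-Pro bus: 528 MB/s"
```

Sustained bandwidth on a real bus would be lower, since arbitration and address cycles also take time.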


Let’s look at trends, 1st Technology Trends<br />

[Figure: performance (log scale, 0.1 to 100) of supercomputers, mainframes, minicomputers, and microprocessors, 1965 to 1995]<br />

The natural building block for multiprocessors is now also about the fastest!<br />

Clock Frequency Growth Rate<br />

[Figure: clock rate (MHz, log scale from 0.1 to 1,000) versus year, 1970 to 2005; labeled points include i4004, i8008, i8080, i8086, i80286, i80386, Pentium100, and R10000]<br />

• 30% per year<br />

Architectural Trends: Bus-based MPs<br />

• Micro on a chip makes it natural to connect many to shared memory<br />

– dominates server and enterprise market, moving down to desktop<br />

• Faster processors began to saturate bus, then bus technology advanced<br />

– today, range of sizes for bus-based systems, desktop to large servers<br />

[Figure: number of processors (0 to 70) in bus-based systems versus year, 1984 to 1998; plotted systems include Sequent B8000 and B2100, Symmetry21 and Symmetry81, Power, SGI PowerSeries, SGI Challenge, SGI PowerChallenge/XL, SS690MP 120 and 140, SS10, SS20, SS1000, SS1000E, SE10, SE30, SE60, SE70, Sun SC2000, SC2000E, Sun E6000, Sun E10000, CRAY CS6400, AS2100, AS8400, HP K400, and P-Pro]<br />

General Technology Trends<br />

• Microprocessor performance increases 50% - 100% per year<br />

• Transistor count doubles every 3 years<br />

• DRAM size quadruples every 3 years<br />

• Huge investment per generation is carried by huge commodity market<br />
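The "every 3 years" figures above can be converted to per-year rates for comparison with the other trends. A minimal sketch, assuming steady compound growth; the function name is only illustrative.

```python
def annual_growth_rate(factor: float, years: float) -> float:
    """Per-year growth rate that compounds to `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


transistor_rate = annual_growth_rate(2, 3)  # doubles every 3 years
dram_rate = annual_growth_rate(4, 3)        # quadruples every 3 years

print(f"doubling every 3 years    = {transistor_rate:.1%} per year")
print(f"quadrupling every 3 years = {dram_rate:.1%} per year")
```

Under this assumption, doubling every 3 years is about 26% per year and quadrupling every 3 years is about 59% per year.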

[Figure: integer and FP performance (scale 0 to 180), 1987 to 1992, for Sun 4 260, MIPS M/120, MIPS M2000, IBM RS6000 540, HP 9000 750, and DEC alpha]<br />

• Not that single-processor performance is plateauing, but that<br />

parallelism is a natural way to improve it.<br />

Transistor Count Growth Rate<br />

[Figure: transistors per chip (log scale from 1,000 to 100,000,000) versus year, 1970 to 2005; labeled points include i4004, i8008, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, and R10000]<br />

• 100 million transistors on chip by early 2000’s A.D.<br />

• Transistor count grows much faster than clock rate<br />

- 40% per year, order of magnitude more contribution in 2 decades<br />
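To see what the two quoted rates mean over two decades, the growth can be compounded directly. This is a back-of-the-envelope sketch that assumes the 30% and 40% rates hold constant for 20 years; the helper name is illustrative.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total growth factor after compounding `rate_per_year` for `years`."""
    return (1.0 + rate_per_year) ** years


clock = compound_growth(0.30, 20)        # clock rate at 30% per year
transistors = compound_growth(0.40, 20)  # transistor count at 40% per year

print(f"clock rate:       x{clock:.0f}")
print(f"transistor count: x{transistors:.0f}")
print(f"ratio:            x{transistors / clock:.1f}")
```

With these inputs the clock rate grows by a factor of roughly 190 and the transistor count by roughly 840, so the extra transistors leave considerable room for parallelism beyond what faster clocks alone provide.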

Bus Bandwidth<br />

[Figure: shared bus bandwidth (MB/s, log scale from 10 to 100,000) versus year, 1984 to 1998; plotted systems include Sequent B8000 and B2100, Symmetry81/21, Power, SGI PowerSeries, SGI Challenge, SGI PowerCh XL, SS690MP 120 and 140, SS10, SE10, SS20, SS1000, SS1000E, SE70/SE30, SE60, SC2000, SC2000E, CS6400, HPK400, AS2100, AS8400, P-Pro, Sun E6000, and Sun E10000]<br />


Phases in VLSI Generation<br />

[Figure: transistors per chip (log scale from 1,000 to 100,000,000) versus year, 1970 to 2005, divided into three phases: bit-level parallelism, instruction-level, thread-level (?); labeled points include i4004, i8008, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, and R10000]<br />

■ How good is instruction-level parallelism?<br />

■ Thread-level needed in microprocessors?<br />

Computer-System Operation<br />

■ I/O devices and the CPU can execute concurrently.<br />

■ Each device controller is in charge of a particular device type.<br />

■ Each device controller has a local buffer.<br />

■ CPU moves data from/to main memory to/from local buffers.<br />

■ I/O is from the device to local buffer of controller.<br />

■ Device controller informs CPU that it has finished its operation by causing an interrupt.<br />

Interrupt Handling<br />

■ The operating system preserves the state of the CPU by<br />

storing registers and the program counter.<br />

■ Determines which type of interrupt has occurred:<br />

✦ polling<br />

✦ vectored interrupt system<br />

■ Separate segments of code determine what action should<br />

be taken for each type of interrupt<br />
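The two ways of determining the interrupt type can be contrasted in a small simulation. This is only a sketch: the `Device` class, the interrupt numbers, and the handler strings are hypothetical, and real hardware does this lookup in circuitry, not in Python.

```python
from typing import Callable, Dict, List, Optional


class Device:
    """Hypothetical device with a 'needs service' flag."""

    def __init__(self, name: str, irq: int) -> None:
        self.name = name
        self.irq = irq        # interrupt number used by the vectored scheme
        self.pending = False


def find_source_by_polling(devices: List[Device]) -> Optional[Device]:
    """Polling: ask each device in turn whether it raised the interrupt."""
    for device in devices:
        if device.pending:
            return device
    return None


def dispatch_vectored(irq: int, vector: Dict[int, Callable[[], str]]) -> str:
    """Vectored: the interrupt number indexes directly into a handler table."""
    return vector[irq]()


disk = Device("disk", irq=1)
keyboard = Device("keyboard", irq=2)
vector = {
    disk.irq: lambda: "disk handler ran",
    keyboard.irq: lambda: "keyboard handler ran",
}

keyboard.pending = True
polled = find_source_by_polling([disk, keyboard])
print(polled.name)                              # prints "keyboard"
print(dispatch_vectored(keyboard.irq, vector))  # prints "keyboard handler ran"
```

Polling cost grows with the number of devices, while the vectored lookup is a single table access.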


Economics<br />

■ Commodity microprocessors not only fast but CHEAP<br />

• Development cost is tens of millions of dollars (5-100 typical)<br />

• BUT, many more are sold compared to supercomputers<br />

✦ Crucial to take advantage of the investment, and use the commodity<br />

building block<br />

✦ Exotic parallel architectures no more than special-purpose<br />

■ Multiprocessors being pushed by software vendors (e.g. database) as<br />

well as hardware vendors<br />

■ Standardization by Intel makes small, bus-based SMPs commodity<br />

■ Desktop: few smaller processors versus one larger one? Multiprocessor<br />

on a chip is here.<br />

Common Functions of Interrupts<br />

■ Interrupt transfers control to the interrupt service routine<br />

generally, through the interrupt vector, which contains the<br />

addresses of all the service routines.<br />

■ Interrupt architecture must save the address of the<br />

interrupted instruction.<br />

■ Incoming interrupts are disabled while another interrupt is<br />

being processed to prevent a lost interrupt.<br />

■ A trap is a software-generated interrupt caused either by<br />

an error or a user request.<br />

■ An operating system is interrupt driven.<br />
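The points above (dispatch through the vector, saving the interrupted address, and disabling incoming interrupts during handling) can be combined in a toy model. Everything here is an assumption made for illustration: the `SketchCPU` class, the choice to queue interrupts that arrive while disabled, and the handler names.

```python
class SketchCPU:
    """Toy CPU model; not a description of any real interrupt architecture."""

    def __init__(self, vector):
        self.pc = 0
        self.interrupts_enabled = True
        self.vector = vector   # interrupt number -> service routine
        self.deferred = []     # interrupts that arrived while disabled
        self.log = []

    def interrupt(self, irq):
        if not self.interrupts_enabled:
            self.deferred.append(irq)    # hold it so it is not lost
            return
        saved_pc = self.pc               # save address of interrupted instruction
        self.interrupts_enabled = False  # disable further interrupts
        self.vector[irq](self)           # transfer control via the vector
        self.interrupts_enabled = True
        self.pc = saved_pc               # resume the interrupted program
        while self.deferred:
            self.interrupt(self.deferred.pop(0))


def disk_handler(cpu):
    cpu.pc = 0x100
    cpu.log.append("disk start")
    cpu.interrupt(2)                     # timer fires while the disk handler runs
    cpu.log.append("disk end")


def timer_handler(cpu):
    cpu.pc = 0x200
    cpu.log.append("timer")


cpu = SketchCPU({1: disk_handler, 2: timer_handler})
cpu.pc = 42
cpu.interrupt(1)
print(cpu.log)   # prints ['disk start', 'disk end', 'timer']
print(cpu.pc)    # prints 42
```

The timer interrupt is serviced only after the disk handler finishes, and the program counter returns to where the user program was interrupted.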

[Figure: Interrupt Time Line For a Single Process Doing Output]<br />


I/O Structure<br />

■ After I/O starts, control returns to user program only upon<br />

I/O completion.<br />

✦ Wait instruction idles the CPU until the next interrupt<br />

✦ Wait loop (contention for memory access).<br />

✦ At most one I/O request is outstanding at a time, no<br />

simultaneous I/O processing.<br />

■ After I/O starts, control returns to user program without<br />

waiting for I/O completion.<br />

✦ System call – request to the operating system to allow user<br />

to wait for I/O completion.<br />

✦ Device-status table contains entry for each I/O device<br />

indicating its type, address, and state.<br />

✦ Operating system indexes into I/O device table to determine<br />

device status and to modify table entry to include interrupt.<br />
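A device-status table for the asynchronous case might look like the following. This is a sketch under assumptions: the field names, the per-device wait queue, and the two helper functions are hypothetical and are not taken from the slides.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class DeviceEntry:
    kind: str                      # device type
    address: int                   # device address
    state: str = "idle"            # "idle" or "busy"
    current: Optional[str] = None  # request being serviced
    waiting: Deque[str] = field(default_factory=deque)


def start_io(table: Dict[str, DeviceEntry], device: str, request: str) -> None:
    """Start the request if the device is idle, otherwise queue it."""
    entry = table[device]
    if entry.state == "idle":
        entry.state, entry.current = "busy", request
    else:
        entry.waiting.append(request)


def on_interrupt(table: Dict[str, DeviceEntry], device: str) -> Optional[str]:
    """Completion interrupt: update the entry and start the next request."""
    entry = table[device]
    finished = entry.current
    if entry.waiting:
        entry.current = entry.waiting.popleft()
    else:
        entry.state, entry.current = "idle", None
    return finished


table = {"disk": DeviceEntry(kind="disk", address=0x1F0)}
start_io(table, "disk", "read A")
start_io(table, "disk", "write B")   # queued: the disk is busy
print(on_interrupt(table, "disk"))   # prints "read A"
print(table["disk"].current)         # prints "write B"
```

Control returns to the caller right after `start_io`, which is what lets several requests be outstanding at once.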

[Figure: Two I/O Methods: synchronous and asynchronous]<br />

[Figure: Device-Status Table]<br />

Storage Structure<br />

■ Main memory – the only large storage medium that the CPU can access directly.<br />

■ Secondary storage – extension of main memory that provides large nonvolatile storage capacity.<br />

■ Magnetic disks – rigid metal or glass platters covered with magnetic recording material<br />

✦ Disk surface is logically divided into tracks, which are subdivided into sectors.<br />

✦ The disk controller determines the logical interaction between the device and the computer.<br />

[Figure: Moving-Head Disk Mechanism]<br />

Direct Memory Access Structure<br />

■ Used for high-speed I/O devices able to transmit information at close to memory speeds.<br />

■ Device controller transfers blocks of data from buffer storage directly to main memory without CPU intervention.<br />

■ Only one interrupt is generated per block, rather than one interrupt per byte.<br />
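The saving from DMA's one interrupt per block can be made concrete with a little arithmetic. The transfer size and the 512-byte block size below are assumptions chosen for illustration.

```python
def interrupts_per_byte_io(total_bytes: int) -> int:
    """Interrupt-driven I/O without DMA: one interrupt for every byte."""
    return total_bytes


def interrupts_with_dma(total_bytes: int, block_bytes: int) -> int:
    """DMA: one interrupt per block (the last block may be partial)."""
    return -(-total_bytes // block_bytes)  # ceiling division


transfer = 4096  # bytes to move (assumed)
block = 512      # DMA block size in bytes (assumed)

print(interrupts_per_byte_io(transfer))      # prints 4096
print(interrupts_with_dma(transfer, block))  # prints 8
```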


Storage Hierarchy<br />

■ Storage systems organized in hierarchy.<br />

✦ Speed<br />

✦ Cost<br />

✦ Volatility<br />

■ Caching – copying information into faster storage system;<br />

main memory can be viewed as a last cache for<br />

secondary storage.<br />

Caching<br />

■ Use of high-speed memory to hold recently-accessed<br />

data.<br />

■ Requires a cache management policy.<br />

■ Caching introduces another level in storage hierarchy.<br />

This requires data that is simultaneously stored in more<br />

than one level to be consistent.<br />

■ Caching is typically transparent to the OS<br />
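A small cache model shows both the management policy and the consistency requirement. This sketch picks least-recently-used replacement and write-through updates; both are assumptions, since the slides do not name a policy, and the class is hypothetical.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Small LRU cache in front of a slower backing store (a dict here)."""

    def __init__(self, backing: dict, capacity: int) -> None:
        self.backing = backing
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used entry first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.lines:
            self.hits += 1
            self.lines.move_to_end(key)  # mark as most recently used
            return self.lines[key]
        self.misses += 1
        value = self.backing[key]        # slow path: fetch from lower level
        self._install(key, value)
        return value

    def write(self, key, value) -> None:
        self.backing[key] = value        # write-through keeps both levels consistent
        self._install(key, value)

    def _install(self, key, value) -> None:
        self.lines[key] = value
        self.lines.move_to_end(key)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry


disk = {"A": 1, "B": 2, "C": 3}
cache = WriteThroughCache(disk, capacity=2)
cache.read("A")        # miss
cache.read("A")        # hit
cache.read("B")        # miss
cache.read("C")        # miss, evicts "A"
cache.write("B", 20)   # updates cache and backing store together
print(cache.hits, cache.misses)  # prints "1 3"
print(disk["B"])                 # prints 20
```

A write-back design would defer the update to the backing store, at the cost of a window in which the two levels disagree.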

[Figure: Storage-Device Hierarchy]<br />

[Figure: Migration of A From Disk to Register]<br />

Hardware Protection<br />

■ Dual-Mode Operation<br />

■ I/O Protection<br />

■ Memory Protection<br />

■ CPU Protection<br />

Dual-Mode Operation<br />

■ Sharing system resources requires operating system to<br />

ensure that an incorrect program cannot cause other<br />

programs to execute incorrectly.<br />

■ Provide hardware support to differentiate between at least two modes of operation.<br />

1. User mode – execution done on behalf of a user.<br />

2. Monitor mode (also kernel mode or system mode) –<br />

execution done on behalf of operating system.<br />



Dual-Mode Operation (Cont.)<br />

■ Mode bit added to computer hardware to indicate the<br />

current mode: monitor (0) or user (1).<br />

■ When an interrupt or fault occurs hardware switches to<br />

monitor mode.<br />

[Figure: mode transitions: an interrupt/fault switches from user to monitor mode; "set user mode" switches from monitor back to user mode]<br />

Privileged instructions can be issued only in monitor mode.<br />
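The mode bit and the privileged-instruction check can be sketched as follows. The class, the exception, and the instruction strings are hypothetical; the mode encoding (monitor = 0, user = 1) follows the slide.

```python
MONITOR, USER = 0, 1  # mode bit values as given above


class PrivilegeViolation(Exception):
    """Raised when user-mode code issues a privileged instruction."""


class DualModeCPU:
    def __init__(self) -> None:
        self.mode = MONITOR  # assumed: the machine starts in monitor mode

    def set_user_mode(self) -> None:
        self.mode = USER     # done by the OS before running user code

    def execute(self, instruction: str, privileged: bool) -> str:
        if privileged and self.mode != MONITOR:
            self.mode = MONITOR  # the fault switches hardware to monitor mode
            raise PrivilegeViolation(instruction)
        return f"executed {instruction}"


cpu = DualModeCPU()
print(cpu.execute("start_io", privileged=True))  # allowed in monitor mode

cpu.set_user_mode()
try:
    cpu.execute("start_io", privileged=True)
except PrivilegeViolation as fault:
    print(f"fault on {fault}; mode is now {cpu.mode}")
```

After the fault the operating system is back in control and decides what to do with the offending program.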

I/O Protection<br />

■ All I/O instructions are privileged instructions.<br />

■ Must ensure that a user program could never gain control of the computer in monitor mode (i.e., a user program that, as part of its execution, stores a new address in the interrupt vector).<br />

[Figure: Use of A System Call to Perform I/O]<br />

Memory Protection<br />

■ Must provide memory protection at least for the interrupt<br />

vector and the interrupt service routines.<br />

■ In order to have memory protection, add two registers<br />

that determine the range of legal addresses a program<br />

may access:<br />

✦ Base register – holds the smallest legal physical memory<br />

address.<br />

✦ Limit register – contains the size of the range<br />

■ Memory outside the defined range is protected.<br />
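The base and limit check amounts to one range comparison on every memory reference. The register values below are illustrative only, and the function is a software stand-in for what the hardware does.

```python
def is_legal_address(address: int, base: int, limit: int) -> bool:
    """Legal addresses run from base up to, but not including, base + limit."""
    return base <= address < base + limit


BASE = 300040    # smallest legal physical address (illustrative value)
LIMIT = 120900   # size of the legal range (illustrative value)

for address in (300039, 300040, 420939, 420940):
    verdict = "ok" if is_legal_address(address, BASE, LIMIT) else "trap to monitor"
    print(address, verdict)
```

Only the two middle addresses are legal; a reference outside the range would trap to the operating system.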

[Figure: Use of A Base and Limit Register]<br />

[Figure: Hardware Address Protection]<br />


Hardware Protection<br />

■ When executing in monitor mode, the operating system<br />

has unrestricted access to both monitor and user’s<br />

memory.<br />

■ The load instructions for the base and limit registers are<br />

privileged instructions.<br />

Network Structure<br />

■ Local Area Networks (LAN)<br />

■ Wide Area Networks (WAN)<br />

[Figure: Wide Area Network Structure]<br />

CPU Protection<br />

■ Timer – interrupts computer after specified period to ensure operating system maintains control.<br />

✦ Timer is decremented every clock tick.<br />

✦ When timer reaches the value 0, an interrupt occurs.<br />

■ Timer commonly used to implement time sharing.<br />

■ Timer also used to compute the current time.<br />

■ Load-timer is a privileged instruction.<br />
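Time sharing with a countdown timer can be sketched as a round-robin loop. The job names, tick counts, and quantum are made up for illustration, and the loop is an assumed simplification of what an operating system scheduler does.

```python
from collections import deque
from typing import Dict, List


def run_time_sharing(jobs: Dict[str, int], quantum: int) -> List[str]:
    """Round-robin: each job runs until the timer hits 0 or the job finishes."""
    ready = deque(jobs.items())  # (name, ticks of work remaining)
    schedule = []
    while ready:
        name, remaining = ready.popleft()
        timer = quantum          # OS loads the timer (a privileged operation)
        while timer > 0 and remaining > 0:
            remaining -= 1       # the job does one tick of work
            timer -= 1           # the timer is decremented every clock tick
        schedule.append(name)
        if remaining > 0:
            # Timer reached 0: the interrupt returns control to the OS,
            # which puts the unfinished job at the back of the queue.
            ready.append((name, remaining))
    return schedule


print(run_time_sharing({"A": 5, "B": 2}, quantum=3))  # prints ['A', 'B', 'A']
```

Because only the operating system can load the timer, a user program cannot extend its own time slice.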

[Figure: Local Area Network Structure]<br />
