Uniprocessor Computer Architecture MP Example: Intel Pentium Pro ...
Chapter 2: Computer-System Structures<br />
■ Last lecture: why study operating systems?<br />
■ Purpose of this lecture: general knowledge of the<br />
structure of a computer system and understanding<br />
technology trends<br />
■ Key issues in a computer system<br />
✦ General System Architecture (CPU, caches ($), main memory (MM), disk, bus, I/O<br />
devices and controllers), uni- vs. multiprocessors<br />
✦ I/O Structure (I/O interrupts, I/O methods, hardware support, e.g.,<br />
DMA)<br />
✦ Storage Structure (CPU registers, cache, main memory, disk)<br />
✦ Storage Hierarchy (why? expensive to cheap; small to large)<br />
✦ Hardware Protection (user/system modes, I/O protection, memory<br />
protection)<br />
Uniprocessor Computer Architecture<br />
Example: SUN Enterprise<br />
[Figure: CPU/mem cards (labels: P, $, $2, Mem ctrl, Bus interface/switch) and I/O cards (labels: Bus interface, SBUS ×3, 100bT, SCSI, 2 FiberChannel) attached to the Gigaplane bus (256 data, 41 address, 83 MHz)]<br />
✦ 16 cards of either type: processors + memory, or I/O<br />
✦ All memory accessed over bus, so symmetric multiprocessor (SMP)<br />
✦ Higher bandwidth, higher latency bus<br />
Hmm … this looks like a Computer System?<br />
[Figure: "The System"; labels: Algorithms, Programming Languages, Compiler, Runtime, Operating System, Architecture, Hardware, Technology]<br />
✦ Figure by courtesy of Anant Agarwal, MIT<br />
MP Example: Intel Pentium Pro Quad<br />
[Figure: four P-Pro modules (each with CPU, interrupt controller, 256-KB L2 $, and bus interface) on the P-Pro bus (64-bit data, 36-bit address, 66 MHz), together with two PCI bridges (each to a PCI bus with PCI I/O cards) and a memory controller with MIU to 1-, 2-, or 4-way interleaved DRAM]<br />
✦ All coherence and multiprocessing glue in processor module<br />
✦ Highly integrated, targeted at high volume<br />
Multiprocessor Example: Cray T3E<br />
[Figure: node with P, $, Mem, and "Mem ctrl and NI", attached to a switch with X, Y, and Z links; External I/O]<br />
•Multiprocessor system<br />
•Scale up to 1024 processors, 480MB/s links<br />
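The bus figures quoted for the two bus-based examples above (Gigaplane: 256 data bits at 83 MHz; P-Pro bus: 64 data bits at 66 MHz) can be turned into a rough peak bandwidth. The sketch below assumes one full-width data transfer per bus clock, which is an upper bound and not something the slides state; the function name is made up for illustration.

```python
def peak_bus_bandwidth_mb_per_s(data_width_bits: int, clock_mhz: float) -> float:
    """Upper-bound estimate: one full-width transfer per bus clock (assumption)."""
    bytes_per_transfer = data_width_bits / 8
    # MHz * bytes per transfer gives MB/s, taking 1 MB = 10**6 bytes.
    return bytes_per_transfer * clock_mhz


gigaplane = peak_bus_bandwidth_mb_per_s(256, 83)  # Sun Enterprise Gigaplane bus
ppro_bus = peak_bus_bandwidth_mb_per_s(64, 66)    # Pentium Pro bus

print(f"Gigaplane peak: {gigaplane:.0f} MB/s")
print(f"P-Pro bus peak: {ppro_bus:.0f} MB/s")
```

Sustained bandwidth on a real bus is lower, since arbitration, address cycles, and coherence traffic consume bus cycles too.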
Let’s look at trends; first, Technology Trends<br />
[Figure: performance (log scale, 0.1 to 100) by year (1965 to 1995) for supercomputers, mainframes, minicomputers, and microprocessors]<br />
The natural building block for multiprocessors is now also about the fastest!<br />
Clock Frequency Growth Rate<br />
• 30% per year<br />
[Figure: clock rate (MHz, log scale, 0.1 to 1,000) by year (1970 to 2005); labeled points: i4004, i8008, i8080, i8086, i80286, i80386, Pentium100, R10000]<br />
Architectural Trends: Bus-based MPs<br />
•Micro on a chip makes it natural to connect many to shared memory<br />
– dominates server and enterprise market, moving down to desktop<br />
•Faster processors began to saturate bus, then bus technology advanced<br />
– today, range of sizes for bus-based systems, desktop to large servers<br />
[Figure: number of processors (0 to 70) in bus-based systems by year (1984 to 1998); labeled points: Sequent B2100, Sequent B8000, Symmetry81, Symmetry21, SGI PowerSeries, SGI Challenge, SGI PowerChallenge/XL, CRAY CS6400, Sun E10000, Sun E6000, Sun SC2000, SC2000E, SS690MP 120, SS690MP 140, SS10, SS20, SS1000, SS1000E, SE10, SE30, SE60, SE70, AS2100, AS8400, HP K400, Power, P-Pro]<br />
General Technology Trends<br />
• Microprocessor performance increases 50% - 100% per year<br />
• Transistor count doubles every 3 years<br />
• DRAM size quadruples every 3 years<br />
• Huge investment per generation is carried by huge commodity market<br />
[Figure: integer and FP performance (scale 0 to 180) by year (1987 to 1992); labeled machines: Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC alpha]<br />
• Not that single-processor performance is plateauing, but that<br />
parallelism is a natural way to improve it.<br />
Transistor Count Growth Rate<br />
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) by year (1970 to 2005); labeled points: i4004, i8008, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000]<br />
• 100 million transistors on chip by early 2000’s A.D.<br />
• Transistor count grows much faster than clock rate<br />
- 40% per year, order of magnitude more contribution in 2 decades<br />
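To compare the two growth rates quoted above (30% per year for clock rate, 40% per year for transistor count), one can compound each over the same horizon. This is a minimal sketch that treats the quoted rates as exact and constant, which the slides do not claim; the function name is illustrative.

```python
def growth_factor(rate_per_year: float, years: int) -> float:
    """Total multiplicative growth after compounding a fixed annual rate."""
    return (1.0 + rate_per_year) ** years


YEARS = 20  # "2 decades", as on the slide

clock_growth = growth_factor(0.30, YEARS)       # clock rate, 30% per year
transistor_growth = growth_factor(0.40, YEARS)  # transistor count, 40% per year
ratio = transistor_growth / clock_growth

print(f"clock rate grows       x{clock_growth:,.0f}")
print(f"transistor count grows x{transistor_growth:,.0f}")
print(f"transistors outgrow clock by x{ratio:.1f}")
```

Printing the ratio lets the slide's rough "order of magnitude" remark be checked against whichever rates and horizon one assumes.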
Bus Bandwidth<br />
[Figure: shared bus bandwidth (MB/s, log scale, 10 to 100,000) by year (1984 to 1998); labeled points range from Sequent B2100 and Sequent B8000 at the low end, through SGI PowerSeries, SS690MP, SS1000, SC2000, AS2100, HP K400, P-Pro, CS6400, SGI Challenge, AS8400, and Sun E6000, up to Sun E10000]<br />
Phases in VLSI Generation<br />
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) by year (1970 to 2005), divided into three phases: bit-level parallelism, instruction-level parallelism, thread-level parallelism (?); labeled points: i4004, i8008, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000]<br />
■ How good is instruction-level parallelism?<br />
■ Thread-level needed in microprocessors?<br />
Computer-System Operation<br />
■ I/O devices and the CPU can execute concurrently.<br />
■ Each device controller is in charge of a particular device<br />
type.<br />
■ Each device controller has a local buffer.<br />
■ CPU moves data from/to main memory to/from local<br />
buffers<br />
■ I/O is from the device to local buffer of controller.<br />
■ Device controller informs CPU that it has finished its<br />
operation by causing an interrupt.<br />
Interrupt Handling<br />
■ The operating system preserves the state of the CPU by<br />
storing registers and the program counter.<br />
■ Determines which type of interrupt has occurred:<br />
✦ polling<br />
✦ vectored interrupt system<br />
■ Separate segments of code determine what action should<br />
be taken for each type of interrupt<br />
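The interrupt-handling steps above (save state, identify the interrupt by polling or through a vector, run the matching code segment) can be sketched as a small simulation. The interrupt numbers, handler names, and the dictionary standing in for CPU state are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interrupt numbers and handlers, for illustration only.
TIMER_IRQ, DISK_IRQ = 0, 1
log: List[str] = []


def timer_handler() -> None:
    log.append("timer tick handled")


def disk_handler() -> None:
    log.append("disk transfer complete")


# Vectored scheme: the interrupt number indexes a table of handler routines.
interrupt_vector: Dict[int, Callable[[], None]] = {
    TIMER_IRQ: timer_handler,
    DISK_IRQ: disk_handler,
}


def find_source_by_polling(pending: Dict[int, bool]) -> Optional[int]:
    """Polling scheme: ask each device in turn whether it raised the interrupt."""
    for irq in sorted(pending):
        if pending[irq]:
            return irq
    return None


def handle_interrupt(irq: int, cpu_state: Dict[str, int]) -> Dict[str, int]:
    saved = dict(cpu_state)  # preserve registers and program counter
    interrupt_vector[irq]()  # run the code segment for this interrupt type
    return saved             # saved state lets the interrupted program resume


state = {"pc": 0x4000, "r0": 7}
resumed = handle_interrupt(DISK_IRQ, state)
print(log, resumed)
```

With a vector the dispatch cost is a single table lookup, whereas polling grows with the number of devices that must be asked.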
Economics<br />
■ Commodity microprocessors not only fast but CHEAP<br />
• Development cost is tens of millions of dollars (5-100 typical)<br />
• BUT, many more are sold compared to supercomputers<br />
✦ Crucial to take advantage of the investment, and use the commodity<br />
building block<br />
✦ Exotic parallel architectures no more than special-purpose<br />
■ Multiprocessors being pushed by software vendors (e.g. database) as<br />
well as hardware vendors<br />
■ Standardization by Intel makes small, bus-based SMPs commodity<br />
■ Desktop: few smaller processors versus one larger one? Multiprocessor<br />
on a chip is here.<br />
Common Functions of Interrupts<br />
■ Interrupt transfers control to the interrupt service routine,<br />
generally through the interrupt vector, which contains the<br />
addresses of all the service routines.<br />
■ Interrupt architecture must save the address of the<br />
interrupted instruction.<br />
■ Incoming interrupts are disabled while another interrupt is<br />
being processed to prevent a lost interrupt.<br />
■ A trap is a software-generated interrupt caused either by<br />
an error or a user request.<br />
■ An operating system is interrupt driven.<br />
Interrupt Time Line For a Single Process Doing Output<br />
I/O Structure<br />
■ After I/O starts, control returns to user program only upon<br />
I/O completion.<br />
✦ Wait instruction idles the CPU until the next interrupt<br />
✦ Wait loop (contention for memory access).<br />
✦ At most one I/O request is outstanding at a time, no<br />
simultaneous I/O processing.<br />
■ After I/O starts, control returns to user program without<br />
waiting for I/O completion.<br />
✦ System call – request to the operating system to allow user<br />
to wait for I/O completion.<br />
✦ Device-status table contains entry for each I/O device<br />
indicating its type, address, and state.<br />
✦ Operating system indexes into I/O device table to determine<br />
device status and to modify table entry to include interrupt.<br />
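The asynchronous case above relies on a device-status table holding each device's type, address, and state. A minimal sketch of such a table follows; the device name, address, field names, and request strings are hypothetical, and a real kernel's table would hold much more.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict


@dataclass
class DeviceEntry:
    dev_type: str
    address: int
    state: str = "idle"
    pending: Deque[str] = field(default_factory=deque)  # queued requests


# Hypothetical table with a single disk device.
device_table: Dict[str, DeviceEntry] = {
    "disk0": DeviceEntry(dev_type="disk", address=0x1F0),
}


def start_io(device: str, request: str) -> None:
    """Asynchronous start: record the request and return without waiting."""
    entry = device_table[device]
    entry.pending.append(request)
    entry.state = "busy"


def complete_io(device: str) -> str:
    """Called from the interrupt handler: index the table, update the entry."""
    entry = device_table[device]
    finished = entry.pending.popleft()
    entry.state = "busy" if entry.pending else "idle"
    return finished


start_io("disk0", "read block 7")
start_io("disk0", "write block 9")  # a second request is outstanding at once
first = complete_io("disk0")
print(first, device_table["disk0"].state)
```

The per-device queue is what allows more than one outstanding request, in contrast to the synchronous method.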
Two I/O Methods<br />
[Figure: synchronous vs. asynchronous I/O]<br />
Device-Status Table<br />
Storage Structure<br />
■ Main memory – only large storage media that the CPU<br />
can access directly.<br />
■ Secondary storage – extension of main memory that<br />
provides large nonvolatile storage capacity.<br />
■ Magnetic disks – rigid metal or glass platters covered with<br />
magnetic recording material<br />
✦ Disk surface is logically divided into tracks, which are<br />
subdivided into sectors.<br />
✦ The disk controller determines the logical interaction<br />
between the device and the computer.<br />
Direct Memory Access Structure<br />
■ Used for high-speed I/O devices able to transmit<br />
information at close to memory speeds.<br />
■ Device controller transfers blocks of data from buffer<br />
storage directly to main memory without CPU<br />
intervention.<br />
■ Only one interrupt is generated per block, rather than<br />
one interrupt per byte.<br />
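The saving from one interrupt per block instead of one per byte is easy to quantify. The sketch below counts interrupts for a transfer under both schemes; the 4096-byte transfer and 512-byte block are illustrative numbers, not taken from the slides.

```python
def interrupts_per_byte_io(total_bytes: int) -> int:
    """Without DMA: the CPU is interrupted once for every byte moved."""
    return total_bytes


def interrupts_with_dma(total_bytes: int, block_size: int) -> int:
    """With DMA: one interrupt per completed block (last block may be partial)."""
    return -(-total_bytes // block_size)  # ceiling division


TRANSFER = 4096  # bytes, illustrative
BLOCK = 512      # bytes per DMA block, illustrative

without_dma = interrupts_per_byte_io(TRANSFER)
with_dma = interrupts_with_dma(TRANSFER, BLOCK)
print(f"{without_dma} interrupts without DMA, {with_dma} with DMA")
```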
Moving-Head Disk Mechanism<br />
Storage Hierarchy<br />
■ Storage systems organized in hierarchy.<br />
✦ Speed<br />
✦ Cost<br />
✦ Volatility<br />
■ Caching – copying information into faster storage system;<br />
main memory can be viewed as a last cache for<br />
secondary storage.<br />
Caching<br />
■ Use of high-speed memory to hold recently-accessed<br />
data.<br />
■ Requires a cache management policy.<br />
■ Caching introduces another level in storage hierarchy.<br />
This requires data that is simultaneously stored in more<br />
than one level to be consistent.<br />
■ Caching is typically transparent to the OS<br />
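As one possible cache management policy, the sketch below uses least-recently-used (LRU) replacement with write-through updates, so the cached copy and the backing store stay consistent. The slides do not name a policy; LRU, write-through, and the class and variable names are assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import Dict


class LRUCache:
    """Small fast store in front of a larger, slower backing store."""

    def __init__(self, capacity: int, backing: Dict[str, int]) -> None:
        self.capacity = capacity
        self.backing = backing
        self.lines: "OrderedDict[str, int]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> int:
        if key in self.lines:
            self.hits += 1
            self.lines.move_to_end(key)  # mark as most recently used
        else:
            self.misses += 1
            self.lines[key] = self.backing[key]  # copy into faster storage
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)   # evict least recently used
        return self.lines[key]

    def write(self, key: str, value: int) -> None:
        self.backing[key] = value  # write-through keeps both levels consistent
        if key in self.lines:
            self.lines[key] = value
            self.lines.move_to_end(key)


store = {"A": 1, "B": 2, "C": 3}
cache = LRUCache(capacity=2, backing=store)
for name in ["A", "B", "A", "C", "B"]:
    cache.read(name)
print(cache.hits, cache.misses, list(cache.lines))
```

Write-through is the simplest way to satisfy the consistency requirement; a write-back design would defer the update to the slower level and must track which entries are dirty.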
Storage-Device Hierarchy<br />
Migration of A From Disk to Register<br />
Hardware Protection<br />
■ Dual-Mode Operation<br />
■ I/O Protection<br />
■ Memory Protection<br />
■ CPU Protection<br />
Dual-Mode Operation<br />
■ Sharing system resources requires operating system to<br />
ensure that an incorrect program cannot cause other<br />
programs to execute incorrectly.<br />
■ Provide hardware support to differentiate between at least<br />
two modes of operations.<br />
1. User mode – execution done on behalf of a user.<br />
2. Monitor mode (also kernel mode or system mode) –<br />
execution done on behalf of operating system.<br />
Dual-Mode Operation (Cont.)<br />
■ Mode bit added to computer hardware to indicate the<br />
current mode: monitor (0) or user (1).<br />
■ When an interrupt or fault occurs hardware switches to<br />
monitor mode.<br />
[Figure: mode transitions between monitor and user; an interrupt/fault switches user to monitor, and "set user mode" switches monitor to user]<br />
Privileged instructions can be issued only in monitor mode.<br />
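The mode bit and the rule on privileged instructions can be sketched as a toy CPU model. The class, method names, and instruction names are hypothetical; the mode-bit values follow the slide (monitor = 0, user = 1).

```python
MONITOR, USER = 0, 1  # mode-bit values as given on the slide


class CPU:
    def __init__(self) -> None:
        self.mode = MONITOR  # assumption: the machine starts in monitor mode

    def set_user_mode(self) -> None:
        """Done by the operating system before handing control to a user program."""
        self.mode = USER

    def interrupt(self) -> None:
        """On any interrupt or fault, hardware switches to monitor mode."""
        self.mode = MONITOR

    def execute(self, name: str, privileged: bool) -> str:
        if privileged and self.mode == USER:
            self.interrupt()  # illegal attempt traps to the operating system
            return f"trap: {name} refused in user mode"
        return f"ok: {name}"


cpu = CPU()
cpu.set_user_mode()
ordinary = cpu.execute("add", privileged=False)
refused = cpu.execute("load_timer", privileged=True)
print(ordinary, "|", refused, "| mode bit now", cpu.mode)
```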
I/O Protection<br />
■ All I/O instructions are privileged instructions.<br />
■ Must ensure that a user program can never gain control<br />
of the computer in monitor mode (for example, a user program<br />
that, as part of its execution, stores a new address in the<br />
interrupt vector).<br />
Use of A System Call to Perform I/O<br />
Memory Protection<br />
■ Must provide memory protection at least for the interrupt<br />
vector and the interrupt service routines.<br />
■ In order to have memory protection, add two registers<br />
that determine the range of legal addresses a program<br />
may access:<br />
✦ Base register – holds the smallest legal physical memory<br />
address.<br />
✦ Limit register – contains the size of the range<br />
■ Memory outside the defined range is protected.<br />
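The base and limit check described above amounts to a single range comparison on every memory access. The sketch below uses illustrative register values that are not taken from the slides.

```python
def is_legal_address(address: int, base: int, limit: int) -> bool:
    """Legal range is [base, base + limit): base is the smallest legal
    physical address and limit is the size of the range."""
    return base <= address < base + limit


# Illustrative register contents, not from the slides.
BASE = 300040
LIMIT = 120900

for addr in (300039, 300040, 420939, 420940):
    verdict = "ok" if is_legal_address(addr, BASE, LIMIT) else "trap to monitor"
    print(addr, verdict)
```

In hardware the comparison happens on every access made in user mode, and a failed check traps to the operating system.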
Use of A Base and Limit Register<br />
Hardware Address Protection<br />
Hardware Protection<br />
■ When executing in monitor mode, the operating system<br />
has unrestricted access to both monitor and user’s<br />
memory.<br />
■ The load instructions for the base and limit registers are<br />
privileged instructions.<br />
Network Structure<br />
■ Local Area Networks (LAN)<br />
■ Wide Area Networks (WAN)<br />
Wide Area Network Structure<br />
CPU Protection<br />
■ Timer – interrupts computer after specified period to<br />
ensure operating system maintains control.<br />
✦ Timer is decremented every clock tick.<br />
✦ When timer reaches the value 0, an interrupt occurs.<br />
■ Timer commonly used to implement time sharing.<br />
■ Timer also used to compute the current time.<br />
■ Load-timer is a privileged instruction.<br />
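The timer mechanism above is what makes time sharing possible: the operating system loads the timer, the count drops on each tick, and reaching 0 returns control to the operating system. The sketch below simulates this with a round-robin queue; the round-robin policy, process names, and quantum are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List


def run_time_sharing(work: Dict[str, int], quantum: int) -> List[str]:
    """work maps a process name to the clock ticks it still needs.
    Returns the order in which processes were given the CPU."""
    ready = deque(work.items())
    schedule: List[str] = []
    while ready:
        name, remaining = ready.popleft()
        timer = quantum  # load-timer: done by the OS, a privileged operation
        while remaining > 0 and timer > 0:
            remaining -= 1  # the process runs for one clock tick
            timer -= 1      # timer is decremented every clock tick
        schedule.append(name)
        if remaining > 0:
            # Timer reached 0: the interrupt hands control back to the OS,
            # which puts the unfinished process at the back of the queue.
            ready.append((name, remaining))
    return schedule


order = run_time_sharing({"A": 5, "B": 2}, quantum=3)
print(order)
```

Because only the operating system may load the timer, a user program cannot extend its own time slice.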
Local Area Network Structure<br />