The PowerPC 604 RISC Microprocessor - eisber.net

Figure 1. Pipeline description: pipeline stages for branch, integer, load/store, and floating-point instructions.

• Completion. An instruction is said to be finished when it passes the execute stage. A finished instruction can be completed 1) if it does not cause an exception condition and 2) when all instructions that appear earlier in program order have completed. This is known as in-order completion.
• Write back. This stage writes the results of completed instructions to the architectural state (the state that is visible to the programmer). Bypass logic permits most instructions to complete and write back in one cycle.

Although some designs use even deeper pipelines to achieve higher clock frequencies than the 604 does, we felt that such a design point does not suit today's personal computers. It relies too heavily on a very large on-chip cache, a wide data bus, a fast memory system, or some combination of these to deliver its performance, which would make it less than competitive in today's cost-sensitive personal computer market.

Precise interrupts and register renaming. Most programmers expect a pipelined processor to behave like a nonpipelined processor, in which one instruction goes through the fetch to write-back stages before the next one begins. A processor meets that expectation if it supports precise interrupts, in which it stops at the first instruction that should not be processed. When it stops (to process an interrupt), the processor's state reflects the results of executing all instructions prior to the interrupt-causing instruction and none of the results of the interrupt-causing instruction or any later instruction. This is not a trivial problem to solve in a design with multiple out-of-order execution pipelines.
An earlier instruction executing after a later instruction can change the processor's state in a way that makes processing of the later instruction illegal. Sohi gives a general overview of the design issues and solutions.

The 604 uses a variant of the reorder buffer described by Smith and Pleszkun to implement precise interrupts. The 16-entry reorder buffer keeps track of instruction order as well as the state of the instructions. The dispatch stage assigns each instruction a reorder buffer entry as it is dispatched. When the instruction finishes execution, the execution unit records the instruction's execution status in the assigned reorder buffer entry. Since the reorder buffer is managed as a first-in/first-out queue, the order in which it examines entries matches the instruction flow sequence. To enforce in-order completion, all prior instructions in the reorder buffer must complete before an instruction can be considered for completion. The reorder buffer examines four entries every cycle to allow completion of up to four instructions per cycle.

Table 1. 604 execution timings.

Instruction                             Latency (cycles)   Throughput (cycles)
Most integer                            1                  1
Integer multiply (32x32)                4                  2
Integer multiply (others)                                  1
Integer divide                          20                 19
Integer load                            2                  1
Floating-point load                     3                  1
Store                                   3                  1
Floating-point multiply-add             3                  1
Single-precision floating-point divide  18                 18
Double-precision floating-point divide  31                 31

Unlike Smith and Pleszkun's reorder buffer, the 604's reorder buffer does not store instruction results. Temporary buffers hold them until the instructions that generated them complete. At that time, the write-back stage copies the results to the architectural registers. The 604 renames registers to achieve this: instead of being written directly to the specified registers, results are written to rename buffers and later copied to the specified registers. Since instructions can execute out of order, their results can also be produced and written out of order into the rename buffers.
The results are, however, copied from the buffers to the specified registers in program order. Register renaming minimizes architectural resource dependencies, namely output dependencies (write-after-write hazards) and antidependencies (write-after-read hazards), that would otherwise limit opportunities for out-of-order execution.

Figure 2 depicts the format of a rename buffer entry. The 604 contains a 12-entry rename buffer for the general-purpose registers (GPRs) that are used for 32-bit integer operations. The 604 allocates a GPR rename buffer entry upon dispatch of an instruction that modifies a GPR. The dispatch stage writes the instruction's destination register number to the Reg num field, sets the Rename valid bit, and clears the Result valid bit. When the instruction executes, the execution unit writes its result to the Result field and sets the Result valid bit. After the instruction completes, the write-back stage copies its result from the rename buffer entry to the GPR specified by the Reg num field, freeing the entry for reallocation. For a load-with-update instruction, which modifies two GPRs (one for the load data and another for the address), the 604 allocates two rename buffer entries.

October 1994 9

Figure 2. Rename buffer entry format: Rename valid | Reg num | Result | Result valid.

Register renaming complicates the process of locating the source operands for an instruction, since they can also reside in rename buffers. In dispatching an integer instruction, the dispatch stage searches for its source operands simultaneously in the GPR file and its rename buffer. If a source operand has not been renamed, the processor uses the value read from the GPR file. If a rename exists (indicated by an entry with Rename valid set and its Reg num field matching the source register number), the Result in the rename buffer is used. It is possible, however, that the result is not yet valid because the instruction that produces it has not yet executed. The dispatch stage still dispatches the instruction, since the reservation station will supply the operand when the result is produced; the dispatched instruction carries the rename buffer entry identifier in place of the operand. The GPR file and its rename buffer provide eight read ports for source operands to support dispatching four integer instructions each cycle.

The 604 also uses a rename buffer for the floating-point registers (FPRs) and another for the condition register (CR). The FPR rename buffer has eight 90-bit-wide entries, each holding a double-precision result with its data type and exception status.
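As a rough illustration of the GPR rename buffer life cycle described above (allocate at dispatch, look up sources, record the result at execution, copy it to the GPR after completion), here is a minimal Python sketch. It assumes a single 12-entry buffer with the four fields of Figure 2; the class and method names are hypothetical, and the allocation-order queue used to pick the newest rename when several entries map the same register is an assumption for illustration, not a detail given in the article. It models behavior only, not the 604's circuits or read ports.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple

NUM_GPRS = 32
RENAME_ENTRIES = 12  # size of the 604's GPR rename buffer


@dataclass
class RenameEntry:
    # The four fields of a rename buffer entry (Figure 2).
    rename_valid: bool = False
    reg_num: int = 0
    result: int = 0
    result_valid: bool = False


class GprRenameBuffer:
    """Hypothetical behavioral model of a GPR file plus its rename buffer."""

    def __init__(self) -> None:
        self.gprs: List[int] = [0] * NUM_GPRS
        self.entries = [RenameEntry() for _ in range(RENAME_ENTRIES)]
        # Assumption: indices of allocated entries, oldest (program order) first.
        self.alloc_order: Deque[int] = deque()

    def allocate(self, reg_num: int) -> Optional[int]:
        """Dispatch: claim a free entry for an instruction that writes reg_num."""
        for idx, e in enumerate(self.entries):
            if not e.rename_valid:
                e.rename_valid, e.reg_num, e.result_valid = True, reg_num, False
                self.alloc_order.append(idx)
                return idx
        return None  # no free entry: in this model, dispatch must stall

    def read_source(self, reg_num: int) -> Tuple[str, int]:
        """Dispatch: return ("value", v), or ("tag", idx) if the producer has not executed."""
        for idx in reversed(self.alloc_order):  # newest rename of reg_num wins
            e = self.entries[idx]
            if e.reg_num == reg_num:
                return ("value", e.result) if e.result_valid else ("tag", idx)
        return ("value", self.gprs[reg_num])  # not renamed: use the GPR file

    def write_result(self, idx: int, value: int) -> None:
        """Execute: record the result in the entry and mark it valid."""
        self.entries[idx].result = value
        self.entries[idx].result_valid = True

    def write_back_oldest(self) -> None:
        """Write back: after the oldest renaming instruction completes, copy its result."""
        idx = self.alloc_order.popleft()
        e = self.entries[idx]
        self.gprs[e.reg_num] = e.result
        e.rename_valid = e.result_valid = False  # entry is free for reallocation


if __name__ == "__main__":
    buf = GprRenameBuffer()
    buf.gprs[3] = 7
    tag = buf.allocate(3)        # dispatch an instruction that writes r3
    print(buf.read_source(3))    # ('tag', 0): a consumer receives the entry identifier
    buf.write_result(tag, 42)    # the producer executes
    print(buf.read_source(3))    # ('value', 42): forwarded from the rename buffer
    buf.write_back_oldest()      # the producer completes
    print(buf.gprs[3])           # 42
```

The loop in read_source is sequential only for clarity; the article describes the dispatch stage searching the GPR file and the rename buffer simultaneously.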
The FPR file and its rename buffer use three read ports for dispatching one floating-point instruction per cycle.

In addition to compare instructions, most integer and floating-point instructions can also generate negative, positive, zero, and overflow condition results. These 4-bit condition results are stored in one of the eight fields of the 32-bit CR. The 604 treats each field as a 4-bit register and applies register renaming using an eight-entry CR rename buffer.

Branch prediction and speculative execution. Because today's application software contains a high percentage of branch instructions, correctly predicting their outcomes is crucial to keeping the multiple instruction pipelines flowing and to achieving two to three times the execution rate of scalar processors. The 604 uses dynamic branch prediction in the fetch, decode, and dispatch stages to predict branch outcomes, and to correct wrong predictions, early.

The 604's speculative execution strategy complements its branch prediction mechanisms. The strategy is to fetch and execute beyond up to two unresolved branch instructions. The results of these speculatively executed instructions reside in rename buffers and in other temporary registers. If the prediction is correct, the write-back stage copies the results of the speculatively executed instructions to the specified registers after the instructions complete.

Upon detecting a branch misprediction, the 604 acts quickly to recover in one cycle. It selectively cancels the instructions that belong to the mispredicted path from the reservation stations, execution units, and memory queues. It also discards their results from the temporary buffers.
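The selective cancellation just described can be sketched in Python as below. The article does not say how the 604 identifies which instructions lie on the mispredicted path; this sketch assumes each in-flight instruction carries a hypothetical program-order sequence number (wraparound ignored), so that everything younger than the mispredicted branch is cancelled while older instructions stay in place. The queue names, the InFlight record, and the free_rename callback are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class InFlight:
    seq: int                           # hypothetical program-order sequence number
    op: str
    rename_idx: Optional[int] = None   # rename buffer entry it allocated, if any


def cancel_mispredicted(queues: Dict[str, List[InFlight]],
                        branch_seq: int,
                        free_rename: Callable[[int], None]) -> int:
    """Remove every instruction younger than the mispredicted branch.

    Older instructions, including ones still waiting to execute, are kept,
    so the cancellation is selective rather than a full pipeline flush.
    Returns the number of cancelled instructions.
    """
    cancelled = 0
    for queue in queues.values():
        survivors = []
        for ins in queue:
            if ins.seq > branch_seq:             # on the mispredicted path
                if ins.rename_idx is not None:
                    free_rename(ins.rename_idx)  # discard its temporary result
                cancelled += 1
            else:
                survivors.append(ins)
        queue[:] = survivors                     # update the queue in place
    return cancelled


if __name__ == "__main__":
    freed: List[int] = []
    queues = {
        "integer_reservation_station": [InFlight(1, "add", 0), InFlight(4, "sub", 2)],
        "load_store_queue": [InFlight(2, "lwz", 1), InFlight(5, "stw")],
    }
    n = cancel_mispredicted(queues, branch_seq=3, free_rename=freed.append)
    print(n, freed)                                    # 2 [2]
    print([i.op for q in queues.values() for i in q])  # ['add', 'lwz']
```

Comparing sequence numbers keeps the "younger than the branch" test to a single comparison in this sketch; another widely used approach is to tag each instruction with bits identifying the unresolved branches it depends on.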
In addition, the processor restores its previous state and starts executing from the correct path even before the mispredicted branch and the instructions preceding it have completed. Since the 604 detects a branch misprediction many cycles before the branch instruction completes, this fast recovery scheme helps maintain the performance of applications that have high data cache miss rates and branches that are difficult to predict.

Serialization. A serialization mechanism delays execution of certain instructions that would otherwise be expensive to execute speculatively in the 604's multiple-pipeline, out-of-order execution design. This mechanism holds back infrequently used instructions until they can safely execute, while permitting later instructions to execute. Examples include the move to and from special-purpose register instructions, the extended arithmetic instructions that read the carry bit, and the instructions that operate directly on the CR, which the PowerPC architecture provides for calculating complex branch conditions. This mechanism also controls store instructions, since stores are difficult to undo.

The dispatch stage sends a serialized instruction to the proper execution unit with an indication that it should not be executed. When all prior instructions have completed and written their results to the architectural state, the completion stage allows the serialized instruction to execute. Once the serialized instruction is dispatched, the dispatch stage continues to dispatch the following instructions so that they can execute before the serialized instruction. When the serialized instruction completes, the later instructions also complete as soon as they finish execution. This minimizes the penalty of serialized instructions.

Machine organization

Figure 3 shows the fetch address generation logic. Each cycle, the fetch stage selects one address from among the addresses generated in the different pipeline stages.
Since an address generated in a later stage belongs to an earlier instruction, it is selected in preference to an address from an earlier stage. The completion stage detects exception conditions and generates an exception handler address. This stage also

10 IEEE Micro