
The PowerPC 604 RISC Microprocessor - eisber.net


process reduction attempt are eliminated and the candidate clause is found efficiently. The decision tree may need program space exponential in the number of clauses and argument positions. Consequently, in [KS90] they choose decision graphs rather than decision trees to encode the possible traces of each predicate.

Hickey and Mudambi [HM89] were the first to apply decision trees as an indexing structure to Prolog. They compile a program as a whole and apply mode inference to determine which arguments are bound. The decision tree is compiled into switching instructions which can be combined with unification instructions and primitive tests, so equivalent unifications which occur in different clauses are evaluated only once. Reusing the result of such a unification requires consistent register use. A complete algorithm for generating indexing schemes is presented which takes into account the effects of aliasing and ensures consistent register use. They also show that the size of the switching tree is exponential in the worst case and that finding an optimal switching tree is NP-complete. For cases where the size of the switching tree is a problem, they also present a quadratic indexing algorithm. In general the size is no problem and the speedup is a factor of two.

Palmer and Naish [PN91] and Hans [Han92] also noticed the potential exponential size of decision trees. They compute the decision tree for each argument separately and store the set of applicable clauses for each argument. At run time the arguments are evaluated and the intersection of the applicable clause sets of each argument is computed. The disadvantage of this method is the high run time overhead.
Furthermore, the size of the clause sets is quadratic in the number of clauses, whereas decision trees are rarely exponential with respect to the number of arguments.

5.3 Stack Checking

Since a Prolog system has many stacks, the run-time checking of stack overflow can be very time consuming. There are two methods to reduce this overhead. The more effective one uses the memory management unit of the processor to perform the stack check. A write-protected page of memory is allocated between the stacks. Catching the resulting operating system trap can be used to present a more meaningful error message to the user. A problem with this scheme occurs in combination with garbage collection: the trap can occur at a point in the program where the internal state of the system is unclear, so that it is difficult to start garbage collection.

The second idea is to reduce the scattered overflow checks to one check per call. It is possible to compute at compile time the maximum number of cells allocated during a single call on the copy and the environment stack. If these stacks grow towards one another (possible only if no references are on the environment stack), both stacks can be tested with a single overflow check. The maximum use of the trail during a call cannot be determined at compile time.

5.4 Instruction Scheduling

Modern processors can issue instructions while preceding instructions are not yet finished and can issue several instructions in each cycle. It can happen that an instruction has to wait for the results of another instruction. Instruction scheduling tries to reorder instructions so that they can be executed in the shortest possible time.

The simplest instruction schedulers work on basic blocks. The most common technique is list scheduling [War90]. It is a heuristic method which yields nearly optimal results.
List scheduling encompasses a class of algorithms that schedule operations one at a time from a list of operations to be scheduled, using prioritization to resolve conflicts. If there is a conflict between instructions for a processor resource, it is resolved in favour of the instruction which lies on the longest execution path to the end of the basic block. A problem with basic block scheduling is that in Prolog basic blocks are small due to tag checks and dereferencing, so instruction scheduling relies on global program analysis to eliminate conditional instructions and increase basic block sizes. Just as important is alias analysis: loads and stores can be moved around freely only if they do not address the same memory location.

A technique called trace scheduling can be applied to schedule the instructions for a complete inference [Fis81]. A trace is a possible path through a section of code; in general it would be the path from the entry of a call to the exit of a call. Trace scheduling uses list scheduling, starting with the most frequent path and continuing with less frequent paths. During scheduling it can happen that an instruction has to be moved over a branch or join; in this case compensation code has to be inserted on the other path. In Prolog the less frequent path is often the branch to the backtracking code, and in such cases it is often not necessary to compensate for the moved instruction.

Acknowledgement

We express our thanks to Alexander Forst, Franz Puntigam and Jian Wang for their comments on earlier drafts of this paper.
