15.01.2013 Views

U. Glaeser

U. Glaeser

U. Glaeser

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

FIGURE 5.6<br />

Functional Units<br />

Defoe is a 64-bit architecture with the following functional units.<br />

• Two load/store units<br />

• Two simple ALUs that perform add, subtract, shift, and logical operations on 64-bit numbers and<br />

packed 32-, 16-, and 8-bit numbers; in addition, these units also support multiplication of packed<br />

16 and 8 bit numbers.<br />

• One complex ALU that can perform multiply and divide on 64 bit integers and packed 32-, 16-,<br />

and 8-bit integers<br />

• One branch unit that performs branch, call, and comparison operations<br />

There is no support for floating point. Figure 5.6 shows a simplified diagram of the Defoe architecture.<br />

Registers<br />

Defoe has a set of 64 programmer visible general purpose registers which are 64 bits wide. Register R0<br />

always contains 0. There is no support for register renaming like in a super scalar architecture.<br />

Predication<br />

Predicate registers are special 1 bit registers that specify a true or false value. There are 16 programmer visible<br />

predicate registers in the Defoe named PR0 to PR15. All operations in Defoe are predicated, i.e., each operation<br />

specifies a predicate register in the PR field. The instruction is always executed. Logically, if the predicate is<br />

false, the results are discarded or the side effects of the instruction do not happen. In practice, for reasons<br />

of efficiency, this may be implemented by computing a result, but writing back the old value of the target<br />

register. Predicate register 0 always contains the value 1 and cannot be altered. Specifying PR0 as the predicate<br />

performs unconditional operations. Comparison operations use predicate registers as their target register.<br />

Instruction Encoding<br />

Defoe is a 64-bit compressed VLIW architecture. By compressed, we mean that rather than use a fixed<br />

size MultiOp and waste slots by filling in NOPs when no suitable operation can be scheduled in a slot<br />

within a MultiOp we use variable length MultiOps. Individual operations are encoded as 32-bit words.<br />

A special stop bit in the 32-bit word indicates the end of an instruction word (Fig. 5.7). Common arithmetic<br />

operations also have an immediate mode, where a sign or zero extended 8 bit constant may be used as<br />

© 2002 by CRC Press LLC<br />

Simple<br />

Integer<br />

Defoe architecture.<br />

To L2<br />

Cache<br />

Simple<br />

Integer<br />

Complex<br />

Integer<br />

Load/<br />

Store<br />

64 entry Register File<br />

Predecode Buffer<br />

Dispersal Unit<br />

I-Cache<br />

D-Cache<br />

From L2<br />

Cache<br />

Load/<br />

Store<br />

Branch/<br />

Cmp<br />

16x<br />

Pred<br />

Score<br />

Board<br />

&<br />

Fetch

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!