04.11.2012 Views

1 Montgomery Modular Multiplication in Hard- ware

1 Montgomery Modular Multiplication in Hard- ware

1 Montgomery Modular Multiplication in Hard- ware

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

FEI KEMT<br />

C’<br />

carry cha<strong>in</strong> carry cha<strong>in</strong><br />

FA FA FA<br />

(a) carry-propagate adder<br />

C<br />

FA FA . . . FA<br />

(b) carry-save adder<br />

Figure 2 – 2 One level of the w-bit adder implemented as CPA and CSA with FAs<br />

f<strong>in</strong>al speed is dom<strong>in</strong>ated by the embedded memory access time or other critical path<br />

<strong>in</strong> the logic. The value wmax may differ between technologies due to the different<br />

rout<strong>in</strong>g and dist<strong>in</strong>ct physical layout (number of LEs <strong>in</strong> LAB). The question is if the<br />

wmax is <strong>in</strong> the range of allowed values for the on-chip memory width of available<br />

FPGAs. In this way we could store and also operate the variables with optimal<br />

word width and achieve the best Area-Time product.<br />

Carry-Save Adder Unit The whole computational complexity of both algo-<br />

rithms lies <strong>in</strong> two additions of three w-bit operands for comput<strong>in</strong>g Si+1. The<br />

propagation of the carry bits between the w adders is (<strong>in</strong> general) too slow. The<br />

implementation of the MWR2MM CSA <strong>in</strong> [108] uses redundant representation of<br />

<strong>in</strong>termediate sum S and carry-save adders [38]. The MWR2MM CSA w-bit PE<br />

architecture based on Full Adders (FAs) is depicted <strong>in</strong> Figure 2 – 3.<br />

In order to reduce the storage size and arithmetic hard<strong>ware</strong> complexity the vari-<br />

ables X, Y , and M are available <strong>in</strong> a non-redundant form. The <strong>in</strong>termediate <strong>in</strong>ternal<br />

sum S is received and generated <strong>in</strong> the redundant form as 1S and 2S. The advantage<br />

of redundant form lies <strong>in</strong> the <strong>in</strong>dependence of the latency from the word length w<br />

as there is no direct connection between the FAs. The output of the adders is valid<br />

right after appearance of the <strong>in</strong>put signals and the delay is given ma<strong>in</strong>ly by <strong>in</strong>ternal<br />

comb<strong>in</strong>ational logic of the FA.<br />

The process<strong>in</strong>g delay may <strong>in</strong>crease for larger w as a result of the broadcast<br />

problem only, it will not depend on the arithmetic operation itself. Conversion<br />

<strong>in</strong>to the normal non-redundant representation is only done at the very end of the<br />

MMM computation. The <strong>in</strong>termediate result of sum S may be further shifted to<br />

other MMM unit as operand X or Y for a new computation (e.g. next iteration<br />

of the modular exponentiation). The redundant representation of variables that<br />

requires twice as much memory as a non-redundant representation and a need for the<br />

transformation to/from redundant form have been considered as the ma<strong>in</strong> drawbacks<br />

27

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!