1 Montgomery Modular Multiplication in Hard- ware
1 Montgomery Modular Multiplication in Hard- ware
1 Montgomery Modular Multiplication in Hard- ware
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
FEI KEMT<br />
C’<br />
carry cha<strong>in</strong> carry cha<strong>in</strong><br />
FA FA FA<br />
(a) carry-propagate adder<br />
C<br />
FA FA . . . FA<br />
(b) carry-save adder<br />
Figure 2 – 2 One level of the w-bit adder implemented as CPA and CSA with FAs<br />
f<strong>in</strong>al speed is dom<strong>in</strong>ated by the embedded memory access time or other critical path<br />
<strong>in</strong> the logic. The value wmax may differ between technologies due to the different<br />
rout<strong>in</strong>g and dist<strong>in</strong>ct physical layout (number of LEs <strong>in</strong> LAB). The question is if the<br />
wmax is <strong>in</strong> the range of allowed values for the on-chip memory width of available<br />
FPGAs. In this way we could store and also operate the variables with optimal<br />
word width and achieve the best Area-Time product.<br />
Carry-Save Adder Unit The whole computational complexity of both algo-<br />
rithms lies <strong>in</strong> two additions of three w-bit operands for comput<strong>in</strong>g Si+1. The<br />
propagation of the carry bits between the w adders is (<strong>in</strong> general) too slow. The<br />
implementation of the MWR2MM CSA <strong>in</strong> [108] uses redundant representation of<br />
<strong>in</strong>termediate sum S and carry-save adders [38]. The MWR2MM CSA w-bit PE<br />
architecture based on Full Adders (FAs) is depicted <strong>in</strong> Figure 2 – 3.<br />
In order to reduce the storage size and arithmetic hard<strong>ware</strong> complexity the vari-<br />
ables X, Y , and M are available <strong>in</strong> a non-redundant form. The <strong>in</strong>termediate <strong>in</strong>ternal<br />
sum S is received and generated <strong>in</strong> the redundant form as 1S and 2S. The advantage<br />
of redundant form lies <strong>in</strong> the <strong>in</strong>dependence of the latency from the word length w<br />
as there is no direct connection between the FAs. The output of the adders is valid<br />
right after appearance of the <strong>in</strong>put signals and the delay is given ma<strong>in</strong>ly by <strong>in</strong>ternal<br />
comb<strong>in</strong>ational logic of the FA.<br />
The process<strong>in</strong>g delay may <strong>in</strong>crease for larger w as a result of the broadcast<br />
problem only, it will not depend on the arithmetic operation itself. Conversion<br />
<strong>in</strong>to the normal non-redundant representation is only done at the very end of the<br />
MMM computation. The <strong>in</strong>termediate result of sum S may be further shifted to<br />
other MMM unit as operand X or Y for a new computation (e.g. next iteration<br />
of the modular exponentiation). The redundant representation of variables that<br />
requires twice as much memory as a non-redundant representation and a need for the<br />
transformation to/from redundant form have been considered as the ma<strong>in</strong> drawbacks<br />
27