1 Montgomery Modular Multiplication in Hardware
FEI KEMT
from or writing to a single register in a specific ECM unit, the unit must be identified by a unique address prefix. Combined with this unit prefix, each register has a unique hardware address and can be addressed from outside the ECM unit. This is essential because the central control logic writes data to these registers before phase 1 starts and reads data from one of the registers after phase 2 has finished.
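The addressing scheme can be sketched by concatenating a unit prefix with a register index. This is a minimal illustration, not the thesis's actual address map: the field width `REG_BITS` and the helper names `hw_address` and `decode` are hypothetical, chosen only so that the 21 registers P1 to P21 fit into one field.

```python
# Hypothetical field width: 5 bits suffice to select one of the 21 registers P1..P21.
# The real address layout of the ECM unit is not given in the text.
REG_BITS = 5


def hw_address(unit: int, reg: int) -> int:
    """Concatenate the unit prefix with the register index (1..21)."""
    if unit < 0:
        raise ValueError("unit prefix must be non-negative")
    if not 1 <= reg <= 21:
        raise ValueError("register index must be in 1..21")
    return (unit << REG_BITS) | reg


def decode(addr: int) -> tuple[int, int]:
    """Split a hardware address back into (unit prefix, register index)."""
    return addr >> REG_BITS, addr & ((1 << REG_BITS) - 1)
```

For example, `hw_address(3, 21)` selects register P21 of unit 3; because every unit has its own prefix, the same register index in two different units never maps to the same address.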
Each register can contain n bits and is organised in $e = \lceil (n+1)/w \rceil$ words of size w (see Figure 4 – 2). Memory is accessed word-wise. Reasonable values are w = 4, 8, 16 or 32, as dictated by the word widths that the included multiplier requires.
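The word count $e = \lceil (n+1)/w \rceil$ and the word-wise storage can be illustrated in a few lines. This is a sketch under one stated assumption: word 0 is taken to hold the least significant w bits, which the figure suggests but the text does not state. The helper names are hypothetical.

```python
def num_words(n: int, w: int) -> int:
    """e = ceil((n + 1) / w), computed with integer arithmetic only."""
    return (n + w) // w  # equals ceil((n + 1) / w)


def to_words(x: int, n: int, w: int) -> list[int]:
    """Split x into e words of w bits; word 0 is assumed least significant."""
    e = num_words(n, w)
    if x < 0 or x >> (e * w):
        raise ValueError("value does not fit into e words of w bits")
    mask = (1 << w) - 1
    return [(x >> (i * w)) & mask for i in range(e)]


def from_words(words: list[int], w: int) -> int:
    """Reassemble the integer from its words (word 0 least significant)."""
    return sum(word << (i * w) for i, word in enumerate(words))
```

For instance, `num_words(8, 4)` gives 3: an 8-bit register plus the extra bit needs 9 bits, which do not fit into two 4-bit words.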
[Figure 4 – 2: registers P1 to P21, each e × w bits, stored as words 0 to e-1 of w bits each]
Figure 4 – 2 Organisation of the ECM unit's memory registers for 21 variables with e words of width w
The ALU performs the modular arithmetic on the n-bit register contents, i.e., modular multiplication, modular squaring, modular addition and subtraction.
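Modular addition and subtraction are the simplest of these operations. The sketch below shows the common textbook formulation, assuming both operands are already reduced into [0, N) for an n-bit modulus N (the symbol N is assumed here, it is not named in this passage): one ordinary addition or subtraction followed by at most one correction step. It is not necessarily the circuit used in the ECM unit.

```python
def mod_add(a: int, b: int, N: int) -> int:
    """(a + b) mod N for 0 <= a, b < N: one addition, one conditional subtraction."""
    s = a + b
    return s - N if s >= N else s


def mod_sub(a: int, b: int, N: int) -> int:
    """(a - b) mod N for 0 <= a, b < N: one subtraction, one conditional addition."""
    d = a - b
    return d + N if d < 0 else d
```

Because the inputs are reduced, the intermediate sum lies below 2N and the difference above -N, so a single correction always suffices; in hardware this amounts to one adder pass plus a comparison or sign check.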
4.2.3 Choice of the Arithmetic Algorithms<br />
Our main goal in designing the ECM unit was an area-time efficient implementation. All algorithms were chosen to achieve low area at relatively high speed. Low area consumption can be achieved with structures that allow a certain degree of pipelining and consequently do not require much memory. For the ECM, we have chosen a set of algorithms that appear very well suited to this purpose. The chosen algorithms are fully scalable, which makes it possible to analyse different unit parameters and their impact on the unit's performance.
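To illustrate how a word-oriented multiplier scales with the word width w, the sketch below implements word-serial Montgomery multiplication in radix $2^w$ with $R = 2^{we}$ and $e = \lceil (n+1)/w \rceil$. It assumes an odd modulus N < 2^n and operands reduced into [0, N); the function names and the software precomputation of $-N^{-1} \bmod 2^w$ and $R^2 \bmod N$ are assumptions for illustration. It only serialises the multiplier a; a fully scalable hardware design would additionally split b and N into w-bit words and pipeline that inner loop.

```python
def mont_mul(a: int, b: int, N: int, n: int, w: int) -> int:
    """Return a * b * R^(-1) mod N with R = 2**(w*e), e = ceil((n + 1) / w).

    Assumes N odd, N < 2**n and 0 <= a, b < N (Python 3.8+ for pow(x, -1, m)).
    """
    if N % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    W = 1 << w
    e = (n + w) // w                      # ceil((n + 1) / w) words
    n_prime = (-pow(N, -1, W)) % W        # -N^(-1) mod 2^w, precomputed once
    t = 0
    for i in range(e):
        a_i = (a >> (i * w)) & (W - 1)    # i-th w-bit word of a
        t += a_i * b
        q = ((t & (W - 1)) * n_prime) & (W - 1)  # makes t + q*N divisible by 2^w
        t = (t + q * N) >> w              # exact division by 2^w
    return t - N if t >= N else t         # t < 2N here, so one subtraction suffices


def mod_mul(a: int, b: int, N: int, n: int, w: int) -> int:
    """Ordinary a * b mod N using two Montgomery multiplications."""
    e = (n + w) // w
    r2 = pow(2, 2 * w * e, N)             # R^2 mod N, precomputed per modulus
    return mont_mul(mont_mul(a, b, N, n, w), r2, N, n, w)
```

Changing w only changes the number of loop iterations e and the width of each word product, which is the kind of parameter trade-off the scalable design is meant to explore. Squaring can reuse the same routine as `mont_mul(x, x, N, n, w)`.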
In the following, we briefly describe the algorithms for modular addition, subtraction, and multiplication to be implemented in the ALU. Squaring is done with the multiplication circuit since a separate hardware circuit for squaring would increase