1 Montgomery Modular Multiplication in Hardware
FEI KEMT
by the radix b changes to a check of the LSB. In Step 4, the division is replaced by a simple right shift operation.
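The following is a minimal Python sketch of a bit-serial radix-2 Montgomery multiplication that makes these two simplifications concrete. It assumes that Algorithm 1 – 3 follows the usual formulation: scan $X$ bit by bit, add $x_i Y$, add $M$ when the partial sum is odd, halve, and perform one conditional subtraction at the end. The function name mont_mul_radix2 is hypothetical. The reduction modulo $b = 2$ appears as a test of the LSB (s & 1), and the division by 2 appears as a right shift (s >>= 1).

```python
def mont_mul_radix2(x: int, y: int, m: int, k: int) -> int:
    """Radix-2 Montgomery product x * y * 2**(-k) mod m (illustrative sketch).

    Assumes m is odd, 2**(k-1) < m < 2**k and 0 <= x, y < m.
    """
    s = 0
    for i in range(k):
        x_i = (x >> i) & 1      # scan X one bit per iteration
        s += x_i * y            # S_i + x_i * Y
        q_i = s & 1             # "mod b" with b = 2 is just the LSB
        s += q_i * m            # odd M makes the sum even
        s >>= 1                 # exact division by 2 as a right shift
    if s >= m:                  # final conditional subtraction into [0, m)
        s -= m
    return s


m, k = 13, 4
r_inv = pow(2**k, -1, m)        # modular inverse of R = 2**k (Python 3.8+)
print(mont_mul_radix2(7, 9, m, k), 7 * 9 * r_inv % m)
```

With $M = 13$ and $k = 4$ (so $R = 16$), both printed values are 8, because $7 \cdot 9 \cdot 16^{-1} \bmod 13 = 8$.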
The formulation of the radix-2 algorithm was used as the starting point for the derivation of the scalable MMM design presented in [108,109]. We discuss the features of such a scalable architecture later. Before that, we take a closer look at the operations of the algorithm and consider modifications that make them better suited for efficient execution on the chosen FPGA hardware platform.
The decision whether to add the modulus $M$ to the temporary sum $S_{i+1}$ is based on the variable $q_i$, which is simple to compute. The test checks the LSB of the partial sum $S_{i+1} = S_i + x_i Y$ and stores it as the variable $q_i$ once the addition of $x_i Y$ is finished (see step 3 of Algorithm 1 – 3). The stored value decides whether $M$ is added in the following iteration of the loop.
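The sketch below splits each iteration into two half-steps. In the first, $x_i Y$ is added and the LSB of the partial sum is latched as $q_i$. In the second, the latched bit decides whether $M$ is added before the shift. The helper mmm_trace is hypothetical and only records the values. It places the addition of $M$ in the second half of the same loop pass; this is an assumption about how the "following" step maps onto the loop, and clock-level pipelining is not modelled.

```python
def mmm_trace(x: int, y: int, m: int, k: int) -> list:
    """Record (x_i, q_i, S_{i+1}) for each pass of the radix-2 loop (sketch).

    Half-step 1 adds x_i * Y and latches the LSB of the partial sum as q_i;
    half-step 2 uses the latched q_i to decide whether M is added before
    the shift. Assumes odd m and 2**(k-1) < m < 2**k.
    """
    s = 0
    trace = []
    for i in range(k):
        x_i = (x >> i) & 1
        partial = s + x_i * y          # half-step 1: S_i + x_i * Y
        q_i = partial & 1              # latch the LSB as q_i
        total = partial + q_i * m      # half-step 2: conditional + M
        assert total & 1 == 0          # M is odd, so the sum is now even
        s = total >> 1                 # halve by a right shift
        trace.append((x_i, q_i, s))
    return trace


for x_i, q_i, s in mmm_trace(7, 9, 13, 4):
    print(f"x_i={x_i} q_i={q_i} S={s}")
```

For $X = 7$, $Y = 9$, $M = 13$ and $k = 4$, the latched bits are $q = 1, 0, 1, 0$ and the recorded values of $S$ are 11, 10, 16 and 8. The assertion never fires, because adding an odd $M$ to an odd partial sum always yields an even number.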
However, the second condition (see step 6 of Algorithm 1 – 3) causes a problem for pipelined execution of the computations. After the loop of additions, multiplications and shifts, the comparison mentioned above and the subsequent conditional subtraction are still required. Without this final reduction step, the output of the inner multiplication loop can be an improper input for the subsequent multiplication. This may happen when the final value of S is greater than M ($S > M$). We intend to use the MMM in a series of multiplications, where the transformation into the Montgomery domain pays off compared with an expensive reduction, as was shown in Algorithm 1 – 1. Therefore, we analyse possibilities for omitting the final conditional step by modifying Algorithm 1 – 3, which makes the use of pipelined multipliers possible.
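The following sketch shows concretely why simply dropping the final subtraction breaks a chain of multiplications. The function mmm_no_final_sub is a hypothetical name for the same $k$-bit radix-2 loop with the comparison removed. With $M = 15$ and $k = 4$, the product of $X = Y = 14$ gives $S = 16$: the residue is correct, but $S > M$. When this value is fed back as $X$, its only set bit (16 = 10000 in binary) lies at position $k$, which the $k$-iteration loop never scans, so the next product is wrong.

```python
def mmm_no_final_sub(x: int, y: int, m: int, k: int) -> int:
    """Radix-2 MMM loop over k bits of x with the final subtraction removed.

    Assumes odd m with 2**(k-1) < m < 2**k. The result is congruent to
    x * y * 2**(-k) mod m; for inputs x, y < m it is only guaranteed < 2m.
    """
    s = 0
    for i in range(k):
        s += ((x >> i) & 1) * y     # add x_i * Y
        s += (s & 1) * m            # add M if the LSB is set
        s >>= 1                     # halve
    return s


m, k = 15, 4
r_inv = pow(2**k, -1, m)

s = mmm_no_final_sub(14, 14, m, k)
print(s, s % m == 14 * 14 * r_inv % m)   # correct residue, but S > M

# Feeding S back as X: bits 0..k-1 of 16 are all zero, bit k is ignored.
t = mmm_no_final_sub(s, 14, m, k)
print(t % m, s * 14 * r_inv % m)         # the two residues differ
```

The first line prints 16 True and the second prints 0 14, so the chained product is incorrect unless the final subtraction is performed or the algorithm is modified.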
Algorithm Modifications. The MMM algorithm (Algorithm 1 – 2) introduced earlier is extended further. Two variants of the algorithm are discussed and implemented. Both support a scalable, multiple-word oriented implementation, but they handle carry processing in different ways.
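The following sketch gives a numerical check of the idea behind subtraction-free variants. It is not Algorithm 1 – 4: the left-shifted operand and the carry handling are not modelled, and the function name mmm_free is hypothetical. The sketch assumes only the parameters given with the operand definitions of Algorithm 1 – 4, namely $R = 2^{k+3}$ and inputs below $2M$, and runs the radix-2 loop over $k + 3$ bits of $X$ with no final subtraction. The result equals $(XY + qM)/R$ with $q < R$, so it stays below $XY/R + M < 4M^2/(8M) + M = 1.5M$. Outputs can therefore be fed straight back as inputs.

```python
def mmm_free(x: int, y: int, m: int, k: int) -> int:
    """Radix-2 MMM without final subtraction, R = 2**(k + 3) (sketch).

    Assumes odd m with 2**(k-1) < m < 2**k and 0 <= x, y < 2m.
    Returns s with s = x * y * R**(-1) (mod m) and 0 <= s < 2m.
    """
    n = k + 3                       # R = 2**n; x < 2m < 2**(k+1) fits in n bits
    s = 0
    for i in range(n):
        s += ((x >> i) & 1) * y     # add x_i * Y
        s += (s & 1) * m            # add M if the LSB is set
        s >>= 1                     # halve
    return s                        # no comparison, no subtraction


m, k = 15, 4
r_inv = pow(2 ** (k + 3), -1, m)
a = 2 * m - 1                       # largest allowed input
for _ in range(10):                 # chain of squarings, outputs fed straight back
    b = mmm_free(a, a, m, k)
    assert b < 2 * m and b % m == a * a * r_inv % m
    a = b
print("chain stayed below 2M")
```

The same argument shows that any $R > 4M$ already keeps the result below $2M$. The sketch simply uses the value of $R$ stated in the text.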
In the modified Algorithm 1 – 4, we use the following input operands:

$$X = \sum_{i=0}^{k} x_i 2^{i} = (0, 0, x_k, x_{k-1}, \ldots, x_1, x_0) < 2M, \qquad (1.14)$$

$$\tilde{Y} = \sum_{i=0}^{k} y_i 2^{i+1} = (y_k, \ldots, y_1, y_0, 0) < 4M, \qquad (1.15)$$
where $R = 2^{k+3}$, $Y < 2M$, and $M$ is a $k$-bit number with $2^{k-1} < M < 2^{k}$ (the same as in Algorithm 1 – 3). Note that $\tilde{Y}$ in Equation 1.15 is a left-shifted version of