04.11.2012 Views

1 Montgomery Modular Multiplication in Hardware


FEI KEMT

by the radix b changes to a check of the LSB. In Step 4 the division is replaced by a simple right shift operation.

The formulation that describes the radix-2 algorithm was used as the starting point for the derivation of a scalable design computing the MMM, presented in [108,109]. We discuss the features of this scalable architecture later. Before that, we take a closer look at the operations of the algorithm and consider modifications that make them better suited to efficient execution on the chosen FPGA hardware platform.

The decision whether to add the modulus $M$ to the temporary sum $S_{i+1}$ is based on the value of the variable $q_i$, which is simple to implement. The test checks the LSB of the partial sum $S_{i+1} = S_i + x_i Y$ and stores it as the variable $q_i$ once the addition of $x_i Y$ is finished (see step 3 of Algorithm 1 – 3). The stored value decides on the addition of $M$ in the following iteration of the loop.
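The LSB test and the right shift described above can be sketched in software as follows. This is a minimal sketch of the textbook radix-2 Montgomery multiplication, not a transcription of Algorithm 1 – 3; the function name `mont_mul_radix2` and its parameter names are hypothetical, and the modulus is assumed to be odd with `m < 2**k` and operands `0 <= x, y < m`.

```python
def mont_mul_radix2(x, y, m, k):
    """Return x * y * 2**(-k) mod m.

    Assumptions (not taken from the source): m is odd, m < 2**k,
    and 0 <= x, y < m.
    """
    s = 0
    for i in range(k):
        x_i = (x >> i) & 1   # i-th bit of the multiplier X
        s += x_i * y         # partial sum S_i + x_i * Y
        q_i = s & 1          # LSB check replaces the radix-b quotient digit
        s += q_i * m         # adding M makes the sum even
        s >>= 1              # exact division by 2 as a right shift
    if s >= m:               # final conditional subtraction
        s -= m
    return s
```

Because the sum stays below 2m after every iteration under these assumptions, a single conditional subtraction at the end is enough to bring the result into the range [0, m).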

However, the second condition (see step 6 of Algorithm 1 – 3) causes a problem for pipelined execution of the computations. After the loop of additions, multiplications and shifts, the mentioned comparison and a subsequent conditional subtraction are required. Without the final reduction step, the outcome of the inner loop of the multiplication can be an improper input for the subsequent multiplication operation. This may happen when the final value of $S$ is bigger than $M$ ($S > M$). We intend to use the MMM in a series of multiplications, where the transformation into the Montgomery domain pays off compared with an expensive reduction, as was shown in Algorithm 1 – 1. Therefore we analyse the possibilities for omitting the final conditional step by changing Algorithm 1 – 3, which makes the use of pipelined multipliers possible.
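One well-known way to drop the final comparison is to run the loop for more iterations, so that R exceeds 4M; inputs below 2M then yield an output below 2M, which can be fed directly into the next multiplication. The sketch below illustrates that general bound only. It is not the modified algorithm of this text, and the name `mont_mul_no_final_sub`, the choice R = 2^(k+2) and the parameter names are assumptions made for illustration.

```python
def mont_mul_no_final_sub(x, y, m, k):
    """Return s with s == x * y * 2**(-(k + 2)) (mod m) and s < 2*m.

    Assumptions (not taken from the source): m is odd, m < 2**k,
    and 0 <= x, y < 2*m. No comparison with m is performed.
    """
    n = k + 2                    # R = 2**n > 4*m
    s = 0
    for i in range(n):
        s += ((x >> i) & 1) * y  # add x_i * Y
        s += (s & 1) * m         # add M when the LSB is set
        s >>= 1                  # divide by 2
    return s                     # bounded by 2*m, reusable as an input
```

The bound follows from s = (x·y + q·m)/R with q < R: since x·y < 4m² < R·m, the result is below 2m. A single reduction to the range [0, m) is then needed only once, after the whole series of multiplications.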

Algorithm Modifications The MMM algorithm (Algorithm 1 – 2) introduced earlier is further extended. Two variants of the algorithm are discussed and implemented. Both support a scalable, multiple-word oriented implementation, but they handle carry processing in different ways.

In the modified Algorithm 1 – 4 we use the following input operands:

$$X = \sum_{i=0}^{k} x_i 2^{i} = (0, 0, x_k, x_{k-1}, \ldots, x_1, x_0) < 2M, \tag{1.14}$$

$$\tilde{Y} = \sum_{i=0}^{k} y_i 2^{i+1} = (y_k, \ldots, y_1, y_0, 0) < 4M, \tag{1.15}$$

where $R = 2^{k+3}$, $Y < 2M$, and $M$ with $2^{k-1} < M < 2^{k}$ is a $k$-bit number (the same as in Algorithm 1 – 3). Note that $\tilde{Y}$ in Equation 1.15 is a left-shifted version of
