04.11.2012 Views

1 Montgomery Modular Multiplication in Hardware


FEI KEMT

Algorithm 4 – 2 Modular addition
Require: Two integers x, y < 2n
Ensure: Sum z = x + y mod 2n
1: z ⇐ x + y
2: T ⇐ z − 2n
3: if T ≥ 0 then
4:     z ⇐ T
5: end if
6: return z
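As a minimal software sketch of Algorithm 4 – 2, the single conditional subtraction could be written in Python as shown here. It assumes that 2n denotes twice a modulus n and that both operands already lie in [0, 2n). The function name `mod_add_2n` is hypothetical and not taken from the source.

```python
def mod_add_2n(x: int, y: int, n: int) -> int:
    """Return (x + y) mod 2n for operands 0 <= x, y < 2n."""
    if not (0 <= x < 2 * n and 0 <= y < 2 * n):
        raise ValueError("operands must lie in [0, 2n)")
    z = x + y        # step 1: plain sum, at most 4n - 2
    t = z - 2 * n    # step 2: trial subtraction of 2n
    if t >= 0:       # step 3: the sum reached or exceeded 2n
        z = t        # step 4: keep the reduced value
    return z
```

Because x + y is below 4n, one trial subtraction is enough to bring the result back into [0, 2n).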

Algorithm 4 – 3 Modular subtraction
Require: Two integers x, y < 2n
Ensure: Difference z = x − y mod 2n
1: T = z ⇐ x − y
2: if z < 0 then
3:     z ⇐ T + 2n
4: end if
5: return z
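Algorithm 4 – 3 can be sketched in the same way. The sketch again assumes that 2n is twice a modulus n and that the operands lie in [0, 2n). The name `mod_sub_2n` is hypothetical.

```python
def mod_sub_2n(x: int, y: int, n: int) -> int:
    """Return (x - y) mod 2n for operands 0 <= x, y < 2n."""
    if not (0 <= x < 2 * n and 0 <= y < 2 * n):
        raise ValueError("operands must lie in [0, 2n)")
    z = x - y            # step 1: plain difference, may be negative
    if z < 0:            # step 2: a borrow occurred
        z = z + 2 * n    # step 3: one correcting addition of 2n
    return z
```

The difference is never below −(2n − 1), so a single addition of 2n always yields a value in [0, 2n).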

4.2.4 Parallelization of the Algorithm

ECM can be perfectly parallelized by running different curves in parallel, since the computations of the individual units are completely independent. For the control of more than one ECM unit, it is essential that both phases, phase 1 and phase 2, are controlled in exactly the same way, independent of the composite to be factored. Only the curve parameter and possibly the modulus differ between the units and, hence, so do the coordinates of the initial point. Thus, all units have to be initialized differently, which is done by simply writing the values into the corresponding memory locations sequentially.
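The sequential initialization described above might be modeled as follows. This is only an illustrative sketch: the `EcmUnit` class, the memory keys, and the numeric values are hypothetical placeholders, not the actual memory layout or valid ECM parameters.

```python
from dataclasses import dataclass, field


@dataclass
class EcmUnit:
    """Hypothetical model of one ECM unit: just its local memory."""
    memory: dict = field(default_factory=dict)


def initialize_units(units, settings):
    """Write each unit's own values into its memory, one unit after another."""
    if len(units) != len(settings):
        raise ValueError("need exactly one setting per unit")
    for unit, (modulus, curve_param, point) in zip(units, settings):
        unit.memory["modulus"] = modulus          # may be shared or differ per unit
        unit.memory["curve_param"] = curve_param  # differs per unit (different curve)
        unit.memory["point"] = point              # initial point depends on the curve


# Placeholder values for illustration only, not valid ECM parameters.
units = [EcmUnit() for _ in range(3)]
settings = [(91, 2, (2, 1)), (91, 3, (3, 1)), (91, 5, (5, 1))]
initialize_units(units, settings)
```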

During the execution of both phases, exactly the same commands can be sent to all units in parallel. Since the runtime of multiplication/squaring is constant (it does not depend on the input values) and that of addition/subtraction differs by at most 2(e + 1) clock cycles, all units can execute the same command in approximately the same time.
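The idea of sending one command to all units can be illustrated with a small software model. It is a sketch under assumptions: each unit is represented by a hypothetical register dictionary holding its own modulus `n`, the command format `(op, dst, a, b)` is invented for illustration, and the loop stands in for what the hardware does simultaneously.

```python
def broadcast(units, command):
    """Apply one command to every unit; each unit works on its own data."""
    op, dst, a, b = command
    for regs in units:
        two_n = 2 * regs["n"]  # each unit reduces modulo twice its own modulus
        if op == "add":
            regs[dst] = (regs[a] + regs[b]) % two_n
        elif op == "sub":
            regs[dst] = (regs[a] - regs[b]) % two_n
        else:
            raise ValueError(f"unknown command: {op}")


# Two units with different moduli and different register contents.
units = [
    {"n": 7, "x": 5, "y": 9, "z": 0},
    {"n": 11, "x": 20, "y": 3, "z": 0},
]
broadcast(units, ("add", "z", "x", "y"))
```

The controller issues a single instruction stream. Only the register contents and the modulus differ from unit to unit.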

After phase 2, the results are read from the units one after another. The time required for this data I/O is negligible for one ECM unit since the computation time of both phases dominates. For several units in parallel, the computation time does not
