1 Montgomery Modular Multiplication in Hardware
FEI KEMT
Algorithm 4-2 Modular addition
Require: Two integers x, y < 2n
Ensure: Sum z = x + y mod 2n
1: z ⇐ x + y
2: T ⇐ z − 2n
3: if T ≥ 0 then
4: z ⇐ T
5: end if
6: return z
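As a minimal sketch, Algorithm 4-2 can be modeled in Python with unbounded integers, where `n` is the modulus and operands are kept in the range [0, 2n). Because x + y < 4n, a single conditional subtraction of 2n is enough. The function name `mod_add_2n` is hypothetical and not taken from the thesis.

```python
def mod_add_2n(x: int, y: int, n: int) -> int:
    """Sketch of Algorithm 4-2: return (x + y) mod 2n for x, y in [0, 2n)."""
    if not (0 <= x < 2 * n and 0 <= y < 2 * n):
        raise ValueError("operands must lie in [0, 2n)")
    z = x + y          # step 1: z < 4n
    t = z - 2 * n      # step 2: candidate with 2n subtracted
    if t >= 0:         # steps 3-5: keep the candidate only if it is non-negative
        z = t
    return z           # z now lies in [0, 2n)


# Example with n = 23: 40 + 30 = 70 and 70 - 46 = 24.
print(mod_add_2n(40, 30, 23))  # 24
```

In hardware, the test T ≥ 0 is usually taken from the sign (borrow) bit of the subtractor rather than from a separate comparator.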
Algorithm 4-3 Modular subtraction
Require: Two integers x, y < 2n
Ensure: Difference z = x − y mod 2n
1: T = z ⇐ x − y
2: if z < 0 then
3: z ⇐ T + 2n
4: end if
5: return z
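Algorithm 4-3 follows the same pattern. The sketch below mimics a hardware subtractor under an assumption: a hypothetical datapath `width` bits wide with 2n < 2^width. The difference is formed in two's complement with one extra bit, and that top (sign) bit decides whether 2n is added back. Because x − y > −2n, one correction always suffices. The function name `mod_sub_2n` is hypothetical.

```python
def mod_sub_2n(x: int, y: int, n: int, width: int) -> int:
    """Sketch of Algorithm 4-3: return (x - y) mod 2n for x, y in [0, 2n).

    `width` is a hypothetical datapath width with 2n < 2**width; one extra
    bit serves as the sign bit of the two's-complement difference.
    """
    if not 2 * n < (1 << width):
        raise ValueError("datapath too narrow for 2n")
    if not (0 <= x < 2 * n and 0 <= y < 2 * n):
        raise ValueError("operands must lie in [0, 2n)")
    mask = (1 << (width + 1)) - 1       # width data bits plus one sign bit
    t = (x - y) & mask                  # step 1: two's-complement difference
    if (t >> width) & 1:                # step 2: sign bit set means x - y < 0
        t = (t + 2 * n) & mask          # step 3: add 2n back, drop the carry
    return t                            # result lies in [0, 2n)


# Example with n = 23 (2n = 46 < 2**6): 3 - 10 = -7 and -7 + 46 = 39.
print(mod_sub_2n(3, 10, 23, 6))  # 39
```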
4.2.4 Parallelization of the Algorithm
ECM parallelizes perfectly by running different curves in parallel, since the computations of the individual units are completely independent. For controlling more than one ECM unit, the key observation is that both phases, phase 1 and phase 2, are controlled identically, independent of the composite to be factored. Only the curve parameter, possibly the modulus, and hence the coordinates of the initial point differ between units. Thus, each unit has to be initialized individually, which is done simply by writing its values sequentially into the corresponding memory locations.
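The initialization step can be pictured with the toy model below. Everything in it is hypothetical: the `EcmUnit` class, the address names `modulus`, `curve_param`, `x0` and `z0` (the latter two assume projective point coordinates), and the numeric values are for illustration only. It shows that loading per-curve data is the only per-unit step, performed as one sequential write after another.

```python
from dataclasses import dataclass, field


@dataclass
class EcmUnit:
    """Hypothetical model of one ECM unit: only its local memory."""
    memory: dict[str, int] = field(default_factory=dict)


def initialize_units(units: list[EcmUnit], per_unit_values: list[dict[str, int]]) -> int:
    """Write every unit's values sequentially; return the number of writes."""
    if len(units) != len(per_unit_values):
        raise ValueError("need one value set per unit")
    writes = 0
    for unit, values in zip(units, per_unit_values):   # unit after unit ...
        for address, value in values.items():          # ... value after value
            unit.memory[address] = value
            writes += 1
    return writes


# Hypothetical example: same modulus, different curve parameters and start points.
units = [EcmUnit(), EcmUnit()]
count = initialize_units(units, [
    {"modulus": 1009, "curve_param": 5, "x0": 2, "z0": 1},
    {"modulus": 1009, "curve_param": 8, "x0": 3, "z0": 1},
])
print(count)  # 8
```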
During the execution of both phases, exactly the same commands can be sent to all units in parallel. The runtime of multiplication and squaring is constant (it does not depend on the input values), and the runtime of addition and subtraction differs by at most 2(e + 1) clock cycles. Hence, all units execute the same command in approximately the same time.
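The lockstep control described above can be sketched as one loop that sends each command to every unit. The command format `(op, dst, src1, src2)`, the register names, and the use of plain fully reduced arithmetic modulo each unit's own `n` are assumptions for illustration; the actual units work on Montgomery representations in [0, 2n). The point is that the command stream is shared while every unit applies it to its own data.

```python
from typing import Callable

# Plain modular arithmetic stands in for the units' datapaths (an assumption).
OPS: dict[str, Callable[[int, int, int], int]] = {
    "add": lambda a, b, n: (a + b) % n,
    "sub": lambda a, b, n: (a - b) % n,
    "mul": lambda a, b, n: (a * b) % n,
}


def broadcast(commands: list[tuple[str, str, str, str]], units: list[dict[str, int]]) -> None:
    """Send every command to all units; each unit uses its own registers and modulus."""
    for op, dst, src1, src2 in commands:   # one shared command stream ...
        for regs in units:                 # ... executed by every unit
            regs[dst] = OPS[op](regs[src1], regs[src2], regs["n"])
        # A real controller would wait here for the slowest unit; that wait is
        # short because the units' runtimes for one command barely differ.


# Two hypothetical units with different moduli and inputs run the same program.
units = [{"n": 101, "a": 7, "b": 9}, {"n": 103, "a": 20, "b": 50}]
program = [("add", "t", "a", "b"), ("mul", "t", "t", "a"), ("sub", "t", "t", "b")]
broadcast(program, units)
print([u["t"] for u in units])  # [2, 11]
```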
After phase 2, the results are read from the units one after another. For a single ECM unit, the time required for this data I/O is negligible, since the computation time of both phases dominates. For several units in parallel, the computation time does not