1 Montgomery Modular Multiplication in Hardware
FEI KEMT
1. Fully software solution implemented on a 32-bit Nios processor.
2. Mixed software-hardware design with a 16-bit Nios processor and the pipelined
coprocessor including the CSA PE.
3. Mixed software-hardware design with a 16-bit Nios processor and the pipelined
coprocessor including the CPA PE.
Further, we provide the details of each system design and comment on the results
obtained.
1. The software implementation of the MMM algorithm was written in Nios
assembly language using all known optimization techniques for the target
processor. The Separated Operand Scanning (SOS) MMM method [39] was used, as
it is the best method for the given Nios RISC architecture [66]. Table 2 – 5
shows the timings for executing the MMM in the fully software solution
running on the processor clocked at 50 MHz. The 32-bit Nios processor
occupies 2137 LEs, excluding the logic for the integer multiplier (for the
MUL instruction), which requires an additional 446 LEs.
In a software implementation it is effective to use different algorithms for
multiplication and squaring, which reduces the execution time of the squaring
operation. However, to reduce vulnerability to side-channel attacks, it is
better to align the execution times of the two operations.
Table 2 – 5 Execution times of the software implementation of MMM on the Altera Nios
development board (with APEX EP20K200 clocked at 50 MHz)

Length (e × w)   Method     Multiplication (ms)   Squaring (ms)
1024             SOS32MEM   2.40                  1.87
2048             SOS32MEM   9.47                  7.24
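To make the SOS method named above more concrete, the following is a minimal Python sketch of Separated Operand Scanning: the full product is computed first, and the Montgomery reduction is applied in a second, separate pass. It is not the Nios assembly routine measured in the table. The word size `W = 32`, the function names, and the helper routines are assumptions chosen for illustration, and the final conditional subtraction is written as a plain branch, so the sketch makes no attempt at uniform timing.

```python
W = 32                       # word size in bits (assumption: 32-bit words)
MASK = (1 << W) - 1


def to_words(x, s):
    """Split x into s little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def from_words(words):
    """Reassemble an integer from little-endian W-bit words."""
    return sum(word << (W * i) for i, word in enumerate(words))


def sos_montmul(a, b, n, s):
    """Return a * b * R^-1 mod n, with R = 2^(W*s); n odd, a and b below n."""
    aw, bw, nw = to_words(a, s), to_words(b, s), to_words(n, s)
    # n0' = -n[0]^-1 mod 2^W (three-argument pow with exponent -1: Python 3.8+)
    n0_prime = (-pow(nw[0], -1, 1 << W)) & MASK
    t = [0] * (2 * s + 1)

    # Phase 1: full 2s-word product t = a * b
    for i in range(s):
        carry = 0
        for j in range(s):
            acc = t[i + j] + aw[j] * bw[i] + carry
            t[i + j] = acc & MASK
            carry = acc >> W
        t[i + s] = carry

    # Phase 2: reduction, clearing one low-order word of t per iteration
    for i in range(s):
        m = (t[i] * n0_prime) & MASK
        carry = 0
        for j in range(s):
            acc = t[i + j] + m * nw[j] + carry
            t[i + j] = acc & MASK
            carry = acc >> W
        k = i + s
        while carry:             # propagate the carry into the upper words
            acc = t[k] + carry
            t[k] = acc & MASK
            carry = acc >> W
            k += 1

    u = from_words(t[s:])        # dividing by R means dropping the s low words
    return u - n if u >= n else u
```

In this sketch squaring is simply the call `sos_montmul(a, a, n, s)`. It therefore performs the same sequence of word operations as a multiplication and does not include the faster dedicated squaring variant discussed in the text.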
2. In the mixed hardware-software design, multiplication and squaring are
implemented completely in hardware. Both operations share the same arithmetic
unit. Because the computational complexity moves from the main processor to
the dedicated coprocessor, the 32-bit version of the Nios core is no longer
needed. Instead of the 32-bit controller one can include the 16-bit