04.11.2012 Views

1 Montgomery Modular Multiplication in Hardware

FEI KEMT

units are available, the total execution time TMMM will increase. On the other hand, the area occupied by the coprocessor can be adjusted to the area constraints of the target device. Implementing n < nmax stages also means that more operations are needed for reading from and storing to the memory. Shifting the processed data between the stages is faster than storing the intermediate results in the memory block and repeatedly reading them back to finish the computations on them. Therefore, the best performance is achieved in a design with the maximal number of stages nmax (n = nmax).

Parametrisation. The MMM coprocessor has three variable parameters (w, e, and n) that can be chosen for any implementation. The number of pipelined stages and the word width (n, w) can be chosen according to the required area of the implemented coprocessor and the required timings for the MMM computations. The security level of the public-key algorithm defines the length of the operands for the multiplier (k = we). This approach gives high flexibility to the processor and coprocessor design.
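As a rough illustration of the relation k = we, the following Python sketch computes the number of words e for a given operand length k and word width w. The function name `words_per_operand` and the rounding up for operand lengths that are not a multiple of w are assumptions made for this example; they are not taken from the coprocessor design itself.

```python
def words_per_operand(k: int, w: int) -> int:
    """Return the number of w-bit words e needed to hold a k-bit operand.

    Follows k = w * e from the text. Rounding up when k is not a
    multiple of w is an assumption of this sketch.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-k // w)


if __name__ == "__main__":
    # Example: a 1024-bit operand split into words of different widths.
    for w in (8, 16, 32):
        print(f"w = {w:2d} bits -> e = {words_per_operand(1024, w)} words")
```

For a fixed operand length k, doubling the word width w halves e, which is the quantity that determines how many words each operand occupies in memory.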

In general, there are two possible approaches to increasing the speed of the MMM computation in the proposed designs (see Equation 2.4 for the relations between the design parameters and the computation time TMMM):

1. To increase the word length w. In this way the number of iterations given by e is reduced, which yields a shorter computation time. While older FPGAs provide memory blocks with a dual-port memory feature and configurable word lengths only up to 16 bits (Altera Apex [8]), in the high-performance models the word length can be up to 32 bits for middle-sized blocks or 128 bits for large memory blocks (Altera Stratix II [20]). Since the capacity of the block is sufficient for typical RSA operands, it makes sense to use only one block per operand. With an older technology that has smaller memory blocks and a larger chosen word width (16 < w ≤ 32), two memory blocks per variable are required. Depending on the memory configuration, several variables may share one memory block. The mapping of operands to the memory is especially important for constrained SoC designs with a limited number of memory blocks.

2. To increase the number of pipelined stages n. The hardware structure of the PE for both solutions (CSA PE and CPA PE) is relatively simple and fast, and independent of the number of stages, which was a condition for a scalable design. Adding several pipelined stages can increase the overall speed,
