08.03.2014 Views

FPGA based Hardware Acceleration for Elliptic Curve Cryptography ...


2.3. SEQUENTIAL MULTIPLICATION SCHEMES

The labels attached to the rectangles determine the indices of the segments whose sums have been multiplied. E.g., the label "123" represents the product of the sum of segments 1, 2 and 3 of the first operand and the sum of segments 1, 2 and 3 of the second operand, which is the term denoted in Eqn. 2.14. The horizontal position of a rectangle represents the exponent of the associated power of the polynomial variable. E.g., the rectangle at the lower left edge labeled "3", together with its position, denotes the product of the two operands' segments with index 3, multiplied by the factor given by that position. The result is computed by summing up (XORing) all the terms according to their horizontal position. The final product is twice as many segments wide as each operand, as one would expect. The partial products can be reordered as shown in Fig. 2.4d. This order was derived from three optimization criteria.

First, most partial products are added two times to compute the final result. They can be grouped together and placed in one of three patterns, which are indicated in Fig. 2.4d. This is true for all instances of the MSK algorithm (again, this has been evaluated semi-manually by a C program for any segment count in the range […]). In the architecture detailed in Sec. 3, these patterns are computed by some additional combinational logic, which is connected to the output of the combinational multiplier.

Second, the resulting patterns are ordered descending by the exponent of their associated factor. In this way, the product can be accumulated easily in a shift register.

As the third optimization criterion, the remaining degree of freedom is exploited in the following way: the patterns are reordered once more such that, when iterating over them from top to bottom, one of two conditions holds. Either the current pattern is constructed from a single segment (e.g., the product of one segment of each operand, but not a product of sums of two segments), or the set of indices of the pattern's segments differs at only one index from that of its predecessor (as in a single-segment partial product followed by the partial product of the sums of that segment and one further segment). Since this criterion cannot always be met for all segments, some accumulation steps take one additional cycle. However, it can be shown that it is always possible to reorder the segments in such a way that either the sum of up to two single segments or at most two additional segments need to be accumulated. This fact has already been proven and has also been evaluated for all interesting segment counts.

By applying the third optimization criterion to the pattern sequence, the partial product computations can be performed as follows: accumulator registers of segment width are placed at the inputs of the combinational multiplier, each of which can either add one segment to its current value or load one new segment in a single clock cycle. With these registers, the segment-sum product terms can be computed iteratively in a pipelined fashion (see Fig. 3.2). This results in a two-stage pipelined design for the complete datapath and yields a total of […] clock cycles to perform one multiplication using the MSK scheme.
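The accumulate-then-multiply idea can be sketched in software. The following Python sketch is only an illustration under stated assumptions, not the scheme of Eqn. 2.14: it assumes that the multiplied segment sums run over contiguous index ranges (as the label "123" suggests), and it recombines them with an identity derived here for characteristic 2, namely that the cross term of segments i < j equals P(i,j) + P(i+1,j) + P(i,j-1) + P(i+1,j-1), where P(l,r) is the product of the two sums over segments l..r. The names `clmul`, `split_segments`, `segment_sum_multiply` and the parameters `k` (segment count) and `m` (segment width in bits) are hypothetical.

```python
def clmul(a, b):
    """Carry-less product of two polynomials over GF(2), stored as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def split_segments(value, k, m):
    """Split an int into k segments of m bits, least significant first."""
    mask = (1 << m) - 1
    return [(value >> (i * m)) & mask for i in range(k)]


def segment_sum_multiply(a, b, k, m):
    """Multiply two (k*m)-bit GF(2) polynomials using only products of
    sums of contiguous segments. Returns (product, multiplication count)."""
    seg_a = split_segments(a, k, m)
    seg_b = split_segments(b, k, m)

    # P[(l, r)] = (A_l ^ ... ^ A_r) * (B_l ^ ... ^ B_r).
    # For a fixed l, each step adds exactly one segment to the running
    # sums, mimicking accumulator registers in front of the multiplier.
    products = {}
    for l in range(k):
        acc_a = acc_b = 0  # "load" phase: start a new accumulation
        for r in range(l, k):
            acc_a ^= seg_a[r]  # "add one segment" phase
            acc_b ^= seg_b[r]
            products[(l, r)] = clmul(acc_a, acc_b)

    def p(l, r):
        # An empty index range contributes nothing.
        return products[(l, r)] if l <= r else 0

    result = 0
    for i in range(k):
        # Diagonal term A_i * B_i at position 2*i segments.
        result ^= p(i, i) << (2 * i * m)
        for j in range(i + 1, k):
            # Cross term A_i*B_j ^ A_j*B_i via inclusion-exclusion over GF(2).
            cross = p(i, j) ^ p(i + 1, j) ^ p(i, j - 1) ^ p(i + 1, j - 1)
            result ^= cross << ((i + j) * m)
    return result, len(products)


if __name__ == "__main__":
    a, b = 0x123456789A, 0xFEDCBA9876
    product, count = segment_sum_multiply(a, b, k=5, m=8)
    print(product == clmul(a, b), count)
```

Under these assumptions the number of segment multiplications is k(k+1)/2, and within each inner loop the index set of a product differs from its predecessor by exactly one segment, which is the property the third criterion aims for. The hardware ordering of Fig. 2.4d and its pattern logic are not modelled here.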

The MSK scheme has a slight performance disadvantage in terms of required segment-width multiplications in comparison to the classical Karatsuba algorithm (11% for the MSK scheme with four segments and 33% for the MSK scheme with eight segments), but there are considerable benefits:

First, the number of segments that the polynomials are divided into is not limited to a power of two, but can be any natural number when the MSK scheme is applied. With respect to a HW implementation, this provides more flexibility concerning the selection of system parameters. As stated before, segment counts in the range […] provide the best results; a fact that can be uniquely exploited by the MSK approach.

Second, each time an additional level of recursion unrolling is applied to the classical Karatsuba algorithm, two new patterns occur in the multiplication scheme, whose size grows exponentially by a factor of 2 (compare Fig. 2.5 to Fig. 2.4b). In contrast, for any segment count the number of different patterns will never exceed three in the case of the MSK scheme. This fact allows the efficient multiplication of polynomials of different degrees on the same datapath: if, e.g., the underlying datapath supports […] segments, the MSK scheme for any […] can be performed just by modifying the controller that runs the MSK algorithm.
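The 11% and 33% overhead figures quoted above can be reproduced with a short calculation. The sketch below assumes that a scheme multiplying sums over contiguous segment ranges needs k(k+1)/2 segment multiplications for k segments, and that fully unrolled classical Karatsuba needs 3^(log2 k) for a power-of-two k. Both formulas and the function names are assumptions made for illustration, not quoted from the text.

```python
def segment_sum_multiplications(k):
    """Assumed count: one multiplication per contiguous index range."""
    return k * (k + 1) // 2


def karatsuba_multiplications(k):
    """Classical Karatsuba, fully unrolled; k must be a power of two."""
    if k < 1 or k & (k - 1):
        raise ValueError("classical Karatsuba needs a power-of-two segment count")
    count = 1
    while k > 1:
        count *= 3  # each recursion level triples the multiplication count
        k //= 2
    return count


def overhead_percent(k):
    """Extra multiplications relative to classical Karatsuba, in percent."""
    msk = segment_sum_multiplications(k)
    kar = karatsuba_multiplications(k)
    return 100.0 * (msk - kar) / kar


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, segment_sum_multiplications(k),
              karatsuba_multiplications(k), round(overhead_percent(k)))
```

With these assumptions, four segments give 10 versus 9 multiplications and eight segments give 36 versus 27. A segment count such as 5 or 6 has no classical Karatsuba counterpart at all, which is the flexibility argument of the first benefit.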
