FPGA based Hardware Accleration for Elliptic Curve Cryptography ...

2.3. SEQUENTIAL MULTIPLICATION SCHEMES 18 

to the rectangles determine the indices of the segments whose sums have been multiplied. E.g., the label "123" represents the term $(A_1 + A_2 + A_3) \cdot (B_1 + B_2 + B_3)$, which is given its own symbol in Eqn. 2.14. The horizontal position of a rectangle represents the exponent $i$ of the associated factor $x^{im}$. E.g., the rectangle in the lower left edge labeled "3", together with its position, denotes the term $A_3 B_3 \, x^{im}$ for the corresponding $i$. The result $A \cdot B$ is computed by summing up (XORing) all the terms according to their horizontal position. This final product is twice as many segments wide as each operand, as one would expect. The partial products can be reordered as shown in Fig. 2.4d. This order was derived from three optimization criteria.

First, most partial products are added twice to compute the final result. They can be grouped together and placed in one of three patterns, which are indicated in Fig. 2.4d. This is true for all instances of the MSK algorithm (again, this has been evaluated semi-manually by a C program for a bounded range of segment counts $k$). In the architecture detailed in Sec. 3, these patterns are computed by some additional combinational logic, which is connected to the output of the combinational multiplier.

Second, the resulting patterns are ordered by descending exponent $i$ of their factor $x^{im}$. In this way, the product can be accumulated easily in a shift register.
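The shift-register accumulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: polynomials over GF(2) are stored as Python integers, `m` is the segment width in bits, and the function names and the `(exponent, value)` term representation are hypothetical and do not come from the architecture in Sec. 3.

```python
def accumulate_in_shift_register(terms, m):
    """Accumulate (exponent, value) terms given in descending exponent order.

    Instead of shifting every term to its final position, the accumulator
    itself is shifted left by m bits whenever the exponent drops by one.
    """
    acc = 0
    prev_exp = None
    for exp, value in terms:
        if prev_exp is not None:
            if exp > prev_exp:
                raise ValueError("terms must be ordered by descending exponent")
            acc <<= (prev_exp - exp) * m  # shift register advances
        acc ^= value                      # addition in GF(2) is XOR
        prev_exp = exp
    if prev_exp is not None:
        acc <<= prev_exp * m              # final alignment to exponent 0
    return acc


def accumulate_directly(terms, m):
    """Reference: XOR every term at its own position value * x^(exp*m)."""
    acc = 0
    for exp, value in terms:
        acc ^= value << (exp * m)
    return acc


# Hypothetical terms, already sorted by descending exponent.
terms = [(2, 0b101), (1, 0b111), (1, 0b1), (0, 0b11)]
print(accumulate_in_shift_register(terms, 2) == accumulate_directly(terms, 2))
```

Because the terms arrive in descending order, the accumulator only ever needs a fixed left shift between steps, which is what makes a plain shift register sufficient.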

As the third optimization criterion, the remaining degree of freedom is exploited in the following way: the patterns are reordered once more such that, when iterating over them from top to bottom, one of two conditions holds. Either the current pattern is constructed from a single-segment product (e.g. $A_0 B_0$, but not $(A_0 + A_1)(B_0 + B_1)$), or the set of indices of the pattern's segments differs at only one index from that of its predecessor (as in the partial products $A_0 B_0$ and $(A_0 + A_1)(B_0 + B_1)$). Since this criterion cannot always be met for all segments, some accumulation steps take one additional cycle. However, it can be shown that it is always possible to reorder the segments such that either the sum of up to two single segments or at most two additional segments need to be accumulated. This fact has already been proven and has also been evaluated for all interesting segment counts.
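The two conditions of the third criterion can be checked mechanically. The sketch below is a hypothetical helper, not the C program mentioned in the text. It assumes that each pattern is represented by the set of segment indices whose sum is fed to the multiplier, and it interprets "differs at only one index" as a symmetric difference of size one, since adding a segment in GF(2) toggles exactly one index.

```python
def steps_needing_extra_work(sequence):
    """Count patterns in the sequence that satisfy neither condition.

    sequence: iterable of index collections, e.g. [{0}, {0, 1}, {1}].
    A pattern is "cheap" if it is a single segment (one load) or if its
    index set differs from the predecessor's by exactly one index (one add).
    """
    expensive = 0
    previous = frozenset()
    for indices in sequence:
        current = frozenset(indices)
        is_single_segment = len(current) == 1
        differs_by_one = len(current ^ previous) == 1
        if not (is_single_segment or differs_by_one):
            expensive += 1
        previous = current
    return expensive


# Hypothetical orderings; they are not the sequence of Fig. 2.4d.
good_order = [{0}, {0, 1}, {1}, {1, 2}, {2}, {0, 2}]
bad_order = [{0}, {1, 2}]
print(steps_needing_extra_work(good_order), steps_needing_extra_work(bad_order))
```

A sequence for which the function returns zero can be processed with one load or one add per pattern; every counted pattern corresponds to a step that needs more than one accumulator update.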

By applying the third optimization criterion to the pattern sequence, the partial product computations can be performed as follows: by placing $m$-bit accumulator registers at the inputs of the combinational multiplier, each of which can either add one segment to its current value or load one new segment in a single clock cycle, the partial product terms can be computed iteratively in a pipelined fashion (see Fig. 3.2). This results in a two-stage pipelined design for the complete datapath and yields a total number of clock cycles per multiplication that is determined by the MSK instance used.

The MSK scheme has a slight performance disadvantage in terms of required $m$-bit multiplications in comparison to the classical Karatsuba algorithm (11% for $\mathrm{MSK}_4$ and 33% for $\mathrm{MSK}_8$), but there are considerable benefits:

First, the number of segments that the polynomials are divided into is not limited to a power of two, but can be any natural number when the MSK scheme is applied. With respect to a HW implementation, this provides more flexibility in the selection of system parameters. As stated before, segment counts within the range given earlier provide the best results, a fact that can be uniquely exploited by the MSK approach.
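The multiplication counts behind this comparison can be reproduced with a short calculation. The sketch assumes that a $k$-segment MSK instance needs $k(k+1)/2$ segment multiplications and that fully recursive Karatsuba needs $3^{\log_2 k}$; both formulas and all function names are assumptions made for illustration. Under these assumptions the overhead comes out at roughly 11% for four segments and 33% for eight segments, which agrees with the percentages quoted above.

```python
def karatsuba_multiplications(k):
    """Segment multiplications of fully recursive Karatsuba (k a power of two)."""
    if k < 1 or k & (k - 1):
        raise ValueError("classical Karatsuba needs a power-of-two segment count")
    return 3 ** (k.bit_length() - 1)  # 3^(log2 k)


def assumed_msk_multiplications(k):
    """Assumed count for a k-segment MSK instance: k(k+1)/2, valid for any k >= 1."""
    return k * (k + 1) // 2


def overhead_percent(k):
    """Extra multiplications of the assumed MSK count relative to Karatsuba."""
    return 100.0 * (assumed_msk_multiplications(k) / karatsuba_multiplications(k) - 1)


for k in (2, 4, 8):
    print(k, karatsuba_multiplications(k), assumed_msk_multiplications(k),
          round(overhead_percent(k), 1))

# Segment counts that are not powers of two are only defined for the MSK count.
print(assumed_msk_multiplications(5))
```

The last line illustrates the flexibility argument: a count such as five segments has no classical Karatsuba counterpart in this model, while the assumed MSK count is defined for every natural number.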

Second, each time an additional level of recursion unrolling is applied to the classical Karatsuba algorithm, two new patterns occur in the multiplication scheme, whose size grows exponentially by a factor of 2 (compare Fig. 2.5 to Fig. 2.4b). In contrast, for any value of $k$ the number of different patterns will never exceed three in the case of the MSK scheme. This fact allows the efficient multiplication of polynomials of different degrees on the same datapath: if, e.g., the underlying datapath supports segments of a given width, the MSK scheme can be performed for different segment counts $k$ just by modifying the controller which is running the MSK algorithm.
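To illustrate how one fixed segment width can serve different segment counts, the following sketch multiplies polynomials over GF(2) with a one-level, pairwise Karatsuba-style decomposition. It is a stand-in under stated assumptions and not necessarily the grouping of Eqn. 2.14: polynomials are stored as Python integers, `clmul` plays the role of the combinational multiplier, and all names are hypothetical. It relies on the identity $A_i B_j + A_j B_i = (A_i + A_j)(B_i + B_j) + A_i B_i + A_j B_j$, which holds over GF(2).

```python
def clmul(a, b):
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def split(value, k, m):
    """Split value into k segments of m bits, least significant segment first."""
    mask = (1 << m) - 1
    return [(value >> (i * m)) & mask for i in range(k)]


def segmented_multiply(a, b, k, m):
    """Multiply two polynomials of at most k*m bits; return (product, multiplications)."""
    A, B = split(a, k, m), split(b, k, m)
    single = [clmul(A[i], B[i]) for i in range(k)]   # k single-segment products
    count = k
    result = 0
    for i in range(k):
        result ^= single[i] << (2 * i * m)
    for i in range(k):
        for j in range(i + 1, k):
            # Cross terms A_i*B_j + A_j*B_i from one extra multiplication.
            cross = clmul(A[i] ^ A[j], B[i] ^ B[j]) ^ single[i] ^ single[j]
            count += 1
            result ^= cross << ((i + j) * m)
    return result, count


# Same segment width m = 8, different segment counts k.
print(segmented_multiply(0xABCDEF, 0x123456, 3, 8)[0] == clmul(0xABCDEF, 0x123456))
print(segmented_multiply(0x0123456789, 0xFEDCBA9876, 5, 8)[1])
```

Only the loop bounds depend on `k`, while every call to `clmul` operates on `m`-bit operands. This mirrors the idea that changing the segment count is a matter of control flow rather than of the multiplier itself.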
