17.01.2013 Views

SIMPLE PLL-BASED TRUE RANDOM NUMBER ... - KEMT FEI TUKE

SIMPLE PLL-BASED TRUE RANDOM NUMBER ... - KEMT FEI TUKE

SIMPLE PLL-BASED TRUE RANDOM NUMBER ... - KEMT FEI TUKE

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>SIMPLE</strong> <strong>PLL</strong>-<strong>BASED</strong> <strong>TRUE</strong> <strong>RANDOM</strong> <strong>NUMBER</strong> GENERATOR FOR<br />

EMBEDDED DIGITAL SYSTEMS<br />

Viktor Fischer * , Miloš Drutarovský ** , Martin Šimka ** and Frédéric Celle *<br />

*<br />

Laboratoire Traitement du Signal et<br />

Instrumentation, Unité Mixte de<br />

Recherche CNRS 5516,<br />

Université Jean Monnet<br />

10, rue Badouin,<br />

42 000 Saint-Etienne, France<br />

** Department of Electronics and<br />

Multimedia Communications,<br />

Technical University of Košice<br />

Park Komenského 13,<br />

04120 Košice, Slovak Republic<br />

{Milos.Drutarovsky, Martin.Simka}<br />

{fischer, celle}@univ-st-etienne.fr @tuke.sk<br />

Abstract. The paper presents a simple true random number generator (TRNG)<br />

which can be embedded in digital Application Specific Integrated Circuits<br />

(ASICs) and Field Programmable Logic Devices (FPLDs). As a source of<br />

randomness, it uses on-chip noise generated in the internal analog phaselocked<br />

loop (<strong>PLL</strong>) circuitry. In contrast with traditionally used free running<br />

oscillators, it uses a novel method of randomness extraction based on two<br />

rationally related synthesized clock signals. The generator has been developed<br />

for embedded cryptographic applications, where it significantly increases the<br />

system security, but it can be used in a wide range of applications. The quality<br />

of TRNG output is confirmed by applying special statistical tests, which pass<br />

even for high output bit-rates of several hundreds of kilobits per second.<br />

1 Introduction<br />

Producing unpredictable, i.e. irreproducible uniformly distributed, random number<br />

sequences on hardware is one of the central issues of cryptographic software and hardware<br />

implementations. Random numbers are needed in particular for key generation,<br />

authentication protocols, zero-knowledge protocols, padding, in many digital signature<br />

schemes and even in some encryption algorithms. In all these applications, security greatly<br />

depends on the quality of the source of random numbers.<br />

TRNGs can be produced using any non-deterministic process. The most typical<br />

implementations use quantum mechanics [1] which can produce up to one Mbit per second.<br />

Radioactivity can also provide true random bits [2] but at a much lower rate of several<br />

hundreds of bits per second. Many other hardware generators exist. They usually exploit the<br />

thermal noise (resistance or shoot) in electronic devices. In the well-known Intel Random<br />

Number Generator [3] a slow clock signal modulated by a thermal noise is sampled using<br />

fast clock signal from a free running oscillator. The drift between the two clocks provides<br />

the source of random bits with an average rate of 75 Kbits per second.


Our aim was to find a solution completely embedded in a digital circuit. These circuits<br />

are well suited for implementation of so called pseudo-random number generators.<br />

However, pseudo-random number generators cannot provide sufficient security, because<br />

they are predictable. Digital circuits include only limited sources of randomness, e. g.<br />

metastability, frequency of a free-running oscillator, clock jitter, etc. Usually, reliable<br />

generators based on the metastability and/or frequency instability are difficult to implement<br />

and they are not secure enough for cryptographic applications. The entropy increase is not<br />

sufficiently high in some cases, so the output of generator can be predicted [4]. In another<br />

case [5] off-chip components are required what decreases a cryptographic security of the<br />

implementation.<br />

In contrast with these methods, in [6] we have proposed a novel method of randomness<br />

extraction based on two rationally related synthesized clock signals. It was shown that it is<br />

perfectly suited for modern FPLDs with the internal analog phase-locked loop (<strong>PLL</strong>)<br />

circuitry (e.g. Apex or Stratix FPLDs from Altera). The generator provided random data<br />

with only small deviations from ideal one.<br />

In this paper we present a modified version of our generator [6] having simplified<br />

structure and higher data rate. It is based on an extended knowledge of jitter features<br />

obtained by measurements in real conditions on Altera development boards.<br />

2 The <strong>PLL</strong>-synthesized clock jitter<br />

2.1 Analog <strong>PLL</strong>s embedded in digital circuits<br />

New digital VLSI circuits use advanced clock generation and distribution circuitry based on<br />

embedded analog <strong>PLL</strong>s [7], [8], [9], [10]. A simplified block diagram of one analog based<br />

<strong>PLL</strong> block available in advanced digital circuits is depicted in Figure 1. Each <strong>PLL</strong> block<br />

can provide at least one synthesized clock signal with frequency F OUT :<br />

m KM<br />

FOUT = FIN = FIN<br />

(1)<br />

n× k KD<br />

where F IN is the frequency of the external input clock source. Reference-, feedback- and<br />

post-divider values n, m and k can vary from one to several hundreds in FPGAs [9], [10], or<br />

to several thousands in ASICs [8].<br />

:n<br />

Input Clock<br />

FIN<br />

Phase<br />

Frequency<br />

Detector<br />

:m<br />

Charge<br />

Pump<br />

&<br />

Loop<br />

VCO :k<br />

Filter Output Clock<br />

ClockShift<br />

Circuitry<br />

FOUT = FIN m<br />

n × k<br />

Figure 1: Simplified block diagram of an embedded <strong>PLL</strong> circuitry<br />

2.2 Jitter of <strong>PLL</strong> synthesized clock signals<br />

In analog <strong>PLL</strong>s, a noise causes the Voltage Controlled Oscillator (VCO) to fluctuate in<br />

frequency. Other frequency fluctuations are caused by variations of supply voltage,<br />

temperature and by a noisy environment. The internal control circuitry adjusts the VCO


ack to the specified frequency, but a certain part of the fluctuations caused by nondeterministic<br />

noise cannot be compensated for and is seen as a clock jitter.<br />

The size of the intrinsic jitter depends on the quality factor Q of the VCO, on the<br />

bandwidth of the Loop Filter and on the so-called pattern jitter introduced by the Phase<br />

Frequency Detector. It is often given in peak-to-peak value or 1-sigma (or RMS) value. 1sigma<br />

value of the jitter ( σ jit ) depends on the technology and the configuration of the <strong>PLL</strong><br />

and it can range from 3.5 ps to 10 ps for ASICs [8] up to 50 ps for FPGAs [9], [10]. Since<br />

the technology of the <strong>PLL</strong> and the quality of the VCO is usually defined, a user can change<br />

the output jitter by modification of divider values and filter bandwidth.<br />

For example the jitter in an Apex FPLD has 1-sigma value of σ jit ≈15.9<br />

ps for a<br />

F OUT = 66.6 MHz clock signal and multiplication factor 2 [11]. These results were acquired<br />

under “ideal conditions”, with only a minimal amount of FPLD resources occupied and<br />

minimal input/output activities. Our last measurements (cf. in subsection 2.3) show that the<br />

clock jitter in an Apex FPLD is significantly higher (about 140 ps ) for higher multiplication<br />

factors and when internal FPLD flip-flops are switching on different clock frequencies.<br />

2.3 Jitter size measurement<br />

Since the knowledge of the jitter size and its statistical features is crucial for correct settings<br />

of the TRNG parameters, several measurements have been made for different<br />

configurations of the <strong>PLL</strong> (different values KM<br />

and KD<br />

). We used Agilent Infiniium DCA<br />

86100B wide bandwidth oscilloscope and the Nios development board [12] with Apex<br />

EP20K200 device. The 1-sigma value of the jitter measured at the output of the <strong>PLL</strong> has<br />

achieved values σ jit ≈ 32 ps for FIN = 33. 3 MHz<br />

and KM K D = 21.<br />

For ratios<br />

KM K D = 32,43,54,65…<br />

the 1-sigma value was in the range σ jit ≈47 −64ps<br />

and the<br />

jitter has approximately Gaussian distribution as it is illustrated in Figure 2a) for<br />

configuration M D 65 = K K .<br />

However, for M D 157 48 = K K used in [6], the jitter was σ jit ≈140<br />

ps and it exhibited<br />

two peaks with a total size of 600<br />

ps (peak-to-peak). The obtained jitter distribution is<br />

depicted in Figure 2b) and it is compatible with a reference measurement made by Xilinx<br />

on Altera FPLDs [11], where a two-peak jitter distribution has been documented. Since the<br />

jitter included in the clock signal in real conditions was significantly higher than that<br />

documented by Altera (note, that Altera has used the same kind of development board, only<br />

K M and KD<br />

parameters were different), we could significantly reduce the generator<br />

complexity. As far as K M and K D parameters are chosen properly, the proposed method is<br />

insensitive to the jitter distribution (see section 3 and [6]).<br />

a) b)<br />

Figure 2: The jitter probability distribution for a) 65<br />

M D<br />

K K = b) K K = 157 48<br />

M D


3 Randomness extraction from the <strong>PLL</strong>-generated clock jitter<br />

The basic principle behind our method is to extract the randomness from the jitter of the<br />

clock signal synthesized in the embedded analog <strong>PLL</strong>. The jitter is detected by the sampling<br />

of a reference (clock) signal using a rationally related (clock) signal synthesized in the onchip<br />

analog <strong>PLL</strong>. The fundamental problem lies in the fact that the reference signal has to<br />

be sampled near the edges influenced by the jitter. The basic structure of the random<br />

bitstream generator is depicted in Figure 3.<br />

CLK<br />

CLJ<br />

<strong>PLL</strong> D<br />

CLK<br />

Q<br />

D<br />

CLK<br />

Q<br />

q(nTCLK)<br />

Decimator<br />

(NKD)<br />

x(nNTQ)<br />

Figure 3: Basic structure of random bitstream generator using <strong>PLL</strong>-synthesized low-jitter clock signal<br />

Because there is a probability that the first flip-flop could become metastable, the<br />

second flip-flop is cascaded. In case the first flip-flop produces a metastable output, it can<br />

resolve until its output is clocked by the second flip-flop. This flip-flops connection does<br />

not assure that only stable signal is clocked, but the probability that the output qn ( T CLK ) will<br />

get a valid state is much higher [13].<br />

Let CLJ be an on-chip <strong>PLL</strong>-synthesized rectangular clock waveform with the frequency<br />

FCLJ = FCLK KM/ KD<br />

(2)<br />

where CLK is a reference clock signal and parameters K M and KD<br />

defined in (1) are<br />

related to the <strong>PLL</strong> structure. Signal CLJ is sampled into the D flip-flop using a clock signal<br />

with frequency F CLK . There are KD rising edges of CLK signal and 2K M edges (rising and<br />

falling) of CLJ waveform during time period<br />

TQ = KDTCLK = KMTCLJ (3)<br />

It has been shown in [6] that if K M and KD<br />

are relative primes, the set of samples create an<br />

equidistant set of values with the distance step<br />

TCLK TCLJ<br />

d = GCD( 2 KM, KD) = GCD( 2 KM,<br />

K D)<br />

(4)<br />

2KM 2KD<br />

where GCD means Greatest Common Divisor. It has been shown that the worst-case<br />

distance between the two closest edges of CLK and CLJ during the period T Q is given as<br />

( )<br />

If KM, K D and F CLJ are chosen so, that<br />

σ > MAX ( ∆ T )<br />

MAX ∆ T = d /2<br />

(5)<br />

jit<br />

we can guarantee that during T Q the sampling edge of CLK will fall at least once into the<br />

edge zone of CLJ (the edge zone means the time interval around the edge with a width<br />

smaller than σ jit ). Therefore during the period T Q , D values of CLJ will be sampled into<br />

the first D flip-flop and at least one of them will statistically depend on the random jitter, so<br />

the output value of the second flip-flop will be nondeterministic. In [6] we used<br />

delay elements to increase the probability of overlapping of and CLJ edge zones.<br />

K<br />

qnT ( CLK )<br />

CLK<br />

min<br />

min<br />

(6)


Thanks to the measurements we got the information about the jitter value, and also about<br />

the probability of edges overlapping that both are higher as expected. Therefore the delay<br />

line is not needed anymore. The decimated output signal<br />

( Q) = ( Q) ⊕ ( Q − CLK) … ⊕ ( Q−( D−1) CLK)<br />

xnT qnT qnT T qnT K T<br />

which is generated at the output of an Exclusive-OR (XOR)-based decimator as a bit-wise<br />

addition modulo 2 ( ⊕ ) of K D samples ( ) . q sampled with the frequency F CLK , will be<br />

nondeterministic, too.<br />

It can be seen that in reference to [6] we have changed basic structure of the generator<br />

in several ways:<br />

- because of higher jitter value, which has been approved by precise jitter<br />

measurements we could replace the delay line and the bank of flip-flops by a single<br />

flip-flop,<br />

- the metastability behavior of the signal qn ( T CLK ) and thus of the generator in general<br />

was improved by addition of the second flip-flop,<br />

- we have validated that if condition (6) is fulfilled, the speed of the generator can be<br />

increased by reducing the decimation factor to NK D , where N = 1 is number of T Q<br />

periods.<br />

4 TRNG implementation<br />

We have validated our simplified structure of the random bitstream generator using Altera<br />

analog <strong>PLL</strong>s embedded in Apex E family. We have used an evaluation board with a PC<br />

Card interface. The Apex EP20K160 device has included generator, 16x128-bit FIFO, PC<br />

Card interface and a custom logic. As the best option a 2-<strong>PLL</strong> configuration (shown in<br />

Figure 4) with only one input clock signal has been chosen.<br />

Synthesized clock signals CLK and CLJ are not fed out from the FPLD (in design<br />

presented in [6] one synthesized signal has been fed out of the device and again reused in<br />

FPLD, what is definitely less secure). Therefore they cannot be manipulated separately<br />

without a circuit reconfiguration. This fact is very important for cryptographic applications,<br />

because it significantly improves overall system security.<br />

Oscil<br />

40 MHz<br />

CLK4p<br />

CLK2p (NC)<br />

CLKLK_FB2p (NC)<br />

CLKLK_OUT2p (NC)<br />

APEX EP20K160E-2X<br />

<strong>PLL</strong>4<br />

clk1<br />

inclk<br />

clk0<br />

<strong>PLL</strong>2<br />

clk1<br />

inclk<br />

clk0<br />

fCLJ<br />

(96,4 MHz)<br />

Random<br />

bitstream<br />

generator<br />

fCLK (95 MHz)<br />

S/P<br />

Conv.<br />

Serial output 454 kbits/s<br />

Custom logic<br />

&<br />

PC Card interface<br />

FIFO<br />

16x128<br />

Figure 4: Block diagram of the experimental PC card with Apex EP20K160 ETC144-2x<br />

(7)


Multiplication and division factors for individual output signals were selected as follows<br />

(the first number represents the <strong>PLL</strong> index and the second number the output index):<br />

clk 20 = 40× 53 22 = 96.36( MHz)<br />

clk 40 = 40× 19 38 = 20( MHz)<br />

clk 41= 40× 19 8 = 95( MHz)<br />

Therefore MAX ( ∆Tmin ) ≈ 12 ps (see equations (4) and (5)). This value is much smaller than<br />

measured jitter value σ jit (see section 2.3). The TRNG was implemented for N = 1 and so<br />

RTRNG ≈ 45400bit/s<br />

(about six times more than in [6]). This was done intentionally in order<br />

to check generated output bitstream under more stringent conditions.<br />

The FPLD resource requirements of the proposed TRNG blocks as well as supporting<br />

logic (FIFO and control logic) are shown in Table 1. The first four columns show resource<br />

requirements (Logic Cells, Embedded System Blocks (ESB), <strong>PLL</strong> outputs and Global<br />

Clocks used) for the generator. The other four columns give requirements of the complete<br />

circuit including FIFO buffer and data bus controller. Presented results have been obtained<br />

using Altera Quartus II v. 2.2.<br />

Table 1: TRNG resource requirements<br />

TRNG only TRNG + FIFO and logic<br />

LC ESB <strong>PLL</strong> GCLK LC ESB <strong>PLL</strong> GCLK<br />

19 0 2 2 129 2 2 3<br />

While 19 logic cells represent less than 1% of available cells, the project uses 100% of<br />

available <strong>PLL</strong>s (two <strong>PLL</strong>s). Since each <strong>PLL</strong> contains two clock outputs, two clock outputs<br />

remain available (one at each <strong>PLL</strong>) and with some restrictions, they can be used inside the<br />

application area (e. g. in our case for the PC Card interface). Internal usage of <strong>PLL</strong> clock<br />

signals represents no security problem as these signals cannot be externally manipulated.<br />

5 Statistical evaluation of TRNG<br />

There are some well-documented general statistical tests that can be used to look for<br />

deviations from an ideal TRNG [14]. A good TRNG should pass all kinds of tests.<br />

5.1 Results of the NIST tests<br />

Our NIST statistical tests were performed on 1 Gigabit of continuous TRNG output records<br />

(for N = 1)<br />

and followed the testing strategy, general recommendations and result<br />

interpretation described in [14]. We have used a set of 1024 1-Megabit sequences produced<br />

by the generator and we have evaluated the set of P -values at a significance level α = 0.01 .<br />

We have got similar results as in [6] (with the exception of FFT test, see below) so we can<br />

conclude that there are no detectable differences for ensemble of 1024 1-Megabit records.<br />

Results of these tests are not included in the paper due to space limitation.<br />

It was claimed in [6] that some 1-Gbit TRNG records did not pass NIST FFT tests.<br />

During evaluation of proposed new TRNG implementation we have found errors in NIST<br />

FFT test formulation (confirmed also in [15]). After correction of these errors in NIST<br />

package all the FFT tests pass without any problems for all tested (1-Gbit) records. To


prove even further the quality of the generator, we have applied also very strict Frequency<br />

(Monobit) tests for very long records.<br />

5.2 Frequency (Monobit) test of very long record<br />

The most common statistical test of TRNGs is the Frequency (Monobit) test [14]. Good<br />

TRNGs should provide independent binary (Bernoulli) random variables 0 and 1 with the<br />

same probability. For a sequence of independent identically distributed Bernoulli random<br />

variables x( nNK D ) we can define variable Sn = X1+ … + Xn,<br />

where Xn = 2x( nNKD)<br />

−1<br />

are<br />

antipodaly encoded values { − 1,1}<br />

. By the classic De Moivre-Laplace theorem [14], for a<br />

sufficiently large number of trials, the distribution of the normalized binomial sum<br />

sn = Sn<br />

/ n is closely approximated by a standard normal distribution N ( 0,1)<br />

and, roughly<br />

said, Sn < 3 n for almost all n (note that we used extremely long TRNG record in order to<br />

detect also very small deviations).<br />

Results for 74-Gbit record (bottom curve, for decimation factor N = 1)<br />

and 37-Gbit<br />

record (upper curve, for N = 2 ) are shown in Figure 5. Small deviation for N = 1 is<br />

visible. After decimation by factor 2, deviation is within expected values (marked as<br />

parabola with peak point in 0), but it remains visible. This is the only deviation from an<br />

ideal TRNG currently known to us. On the other hand, the bias decreasing using the XOR<br />

decimator is applicable only for independent bits. The fact that we obtained lower bias after<br />

decimation gives us indirect evidence that output bits are independent and very close to the<br />

perfectly random bits.<br />

Difference<br />

0.5<br />

0<br />

-0.5<br />

-1<br />

-1.5<br />

-2<br />

-2.5<br />

x Cummulated 106<br />

1<br />

"1" and "0" Difference<br />

0 1 2 3 4 5 6 7 8<br />

x 10 10<br />

-3<br />

Record Length<br />

Figure 5: Results of frequency tests of very long TRNG output: a) 74 Gbits ( N = 1)<br />

b) 37 Gbits ( N = 2 )<br />

6 Conclusions<br />

In this paper we have described and evaluated a simplified method of true random bitstream<br />

generation inside digital VLSI circuits. The design of the TRNG and the method of<br />

randomness extraction guarantee that the output depends on a physical undeterministic<br />

b)<br />

a)


internal random process. The randomness of the sequence of numbers has been extensively<br />

tested and only extremely small differences from an ideal TRNG have been detected.<br />

The proposed solution is very cheap, it uses a very small number of logic resources and<br />

it is faster than comparable methods. Although the functionality of the proposed solution<br />

has been demonstrated for Altera Apex family, the same principle can be used for all recent<br />

high-performance ASICs or FPLDs that include an on-chip reconfigurable analog <strong>PLL</strong> for<br />

clock synthesis.<br />

In the future we plan a deeper research of the jitter and its measurement, as its statistical<br />

features directly affect the statistical features of the output random sequence.<br />

Acknowledgments<br />

This work has been done in the frame of the project CryptArchi included in the French<br />

national program ACI Cryptologie and Slovak scientific project VEGA 1/1057/04.<br />

References<br />

[1] T. Jennewein, U. Achleitner, G. Weihs, H. Weinfurter, A. Zeilinger, “A fast and compact quantum random<br />

number generator”, Rev. Sci. Inst. 71, 1675-1680 (2000)<br />

[2] J. Walker, “Hotbits: Genuine random numbers, generated by radioactive decay”, online at<br />

www.fourmilab.ch/hotbits<br />

[3] B. Jun, P. Kocher, "The INTEL Random Number Generator", Cryptography Research, Inc., White Paper<br />

prepared for Intel Corporation, April 1999, pp.1-8, www.intel.com<br />

[4] M. Dichtl, “How to Predict the Output of a Hardware Random Number Generator”, In C.D. Walter, C. K.<br />

Koc, Ch. Paar (Eds.): CHES 2003, LNCS 2779, Springer, Berlin, 2003, pp.181-188<br />

[5] K.H. Tsoi, K.H. Leung, P.H. Leong, “Compact FPGA-based True and Pseudo Random Number<br />

Generators”, Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines<br />

(FCCM), California USA, 2003, pp.51-61<br />

[6] V. Fischer, M. Drutarovský, “True Random Number Generator Embedded in Reconfigurable Hardware”,<br />

In B.S.Kaliski Jr. et al. (Eds.): CHES 2002, LNCS 2523, Springer, Berlin, 2003, pp.415-430<br />

[7] J. G. Maneatis, “Selecting <strong>PLL</strong>s for ASIC Applications Requires Tradeoffs”, Planet Analog Magazine<br />

9/2003, www.planetanalog.com<br />

[8] “XpressArray High Density 0.18um Structured ASIC”, WEB page of the AMI Semiconductors Company,<br />

www.amis.com<br />

[9] “Using the ClockLock & ClockBoost <strong>PLL</strong> Features in Apex Devices”, Altera Application Note 115, v.2.3,<br />

May 2002, pp.1-55, www.altera.com<br />

[10] “Using <strong>PLL</strong>s in Stratix Devices”, Altera Application Note 200, v.1.0, February 2002, pp.1-70,<br />

www.altera.com<br />

[11] “Superior Jitter Management with DLLs”, Virtech Tech Topic VTT013 (v.1.2), January 21, 2003, pp.1-6,<br />

www.xilinx.com<br />

[12] “Nios Embedded Processor Development Board”, Altera Data Sheet, v.2.1, April 2002, pp.1-22,<br />

www.altera.com/nios<br />

[13] “Metastability in Altera Devices”, Altera Application Note 42, v.4.0, May 1999, pp.1-10,<br />

www.altera.com<br />

[14] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A.<br />

Heckert, J. Dray, S. Vo, “A statistical test suite for random and pseudorandom number generators for<br />

cryptographic applications“, NIST Special Publication 800-22, May 15, 2001, pp.1-153,<br />

http://csrc.nist.gov/rng/<br />

[15] S. Kim, K. Umeno, and A. Hasegawa, “Corrections of the NIST Statistical Test Suite for Randomness”,<br />

Cryptology ePrint Archive, Report 2004/018, January 26, 2004, http://eprint.iacr.org/2004/018.pdf

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!