22.04.2014 Views

a590003

a590003

a590003

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Using Homomorphic Encryption for<br />

Large Scale Statistical Analysis<br />

CURIS 2012<br />

David Wu and Jacob Haven<br />

Advised by Professor Dan Boneh<br />

Medical Records<br />

from Hospital A<br />

Medical Records<br />

from Hospital B<br />

Medical Records<br />

from Hospital C<br />

Motivation<br />

Cloud-based solutions have become increasingly popular in the past few<br />

years. An example of the cloud-based model is shown below. Here, three different<br />

hospitals provide data to the cloud. The cloud computing platform then<br />

analyzes and extracts useful information from the data.<br />

Cloud<br />

Computing<br />

Platform<br />

Statistical Analysis<br />

of Medical Data<br />

One of the main concern with cloud computing has been the privacy and confidentiality<br />

of the data. One solution is to send the data encrypted to the cloud.<br />

However, we still need to support useful computations on the encrypted data.<br />

Fully homomorphic encryption (FHE) is a way of supporting such computations<br />

on encrypted data.<br />

We note that while other mechanisms exist for secure computation, they generally<br />

require the different data providers to exchange information. Because<br />

FHE schemes are public key schemes, FHE is much better suited for the scenario<br />

where we have many sources of data.<br />

Our Approach<br />

Due to the significant overhead in homomorphic computation, implementations<br />

of homomorphic encryption schemes for statistical analysis have been<br />

limited to small datasets (≈ 100 data points) and low dimensional data (≈ 2-4<br />

dimensions).<br />

Using recent techniques in batched computation and a different message encoding<br />

scheme, we demonstrate the viability of using leveled homomorphic<br />

encryption to compute on datasets with over a million elements as well as<br />

datasets of much higher dimension.<br />

In particular, we consider two applications of homomorphic encryption: computing<br />

the mean and covariance of multivariate data and performing linear regression<br />

over encrypted datasets.<br />

Computation over Large Integers<br />

To support computation over large amounts of data, we need to be able to<br />

handle large integers (i.e., 128-bit precision). However, it is not computationally<br />

feasible to choose message spaces of this magnitude. To support computations<br />

with at least 128-bit precision, we leverage the Chinese Remainder<br />

Theorem (CRT):<br />

Data<br />

We choose primes<br />

Data<br />

Data<br />

Data<br />

Result<br />

Result<br />

Result<br />

such that<br />

CRT<br />

Result<br />

We perform the computation modulo each prime. Given the results of the<br />

computation with respect to each prime, we apply the CRT to obtain the<br />

value modulo the product of the primes (at least 128-bit precision).<br />

The computations with respect to each prime is completely independent of<br />

the computation with respect to the other primes. As such, all of the computations<br />

are naturally parallelizable.<br />

Homomorphic Encryption Scheme (Server Side)<br />

Our leveled FHE scheme supports three basic operations: addition, multiplication, and Frobenius automorphisms.<br />

Below, we show how we can use these operations to compute the inner product on encrypted data.<br />

5 3 1 8<br />

0 1 1 0<br />

Homomorphic Encryption Scheme (Client Side)<br />

Leveled fully homomorphic encryption (FHE) schemes supports addition and multiplication over ciphertexts.<br />

Such schemes are capable of evaluating boolean circuits with bounded depth (determined by the number of<br />

multiplications) over ciphertexts, and thus, can perform many computations over the ciphertext.<br />

Consider a scenario where Charlie wants to compute the inner product of Alice’s and Bob’s data. Note that covariance<br />

computation and linear regression can be expressed in terms of matrix products, which can be viewed<br />

as a series of inner products.<br />

Alice’s Data<br />

[5, 3, 1, 8]<br />

[0, 1, 1, 0]<br />

Bob’s Data<br />

5 3 1 8<br />

0 1 1 0<br />

0 3 1 0 3 4 1 0 4 4 4 4<br />

4 0 0 0<br />

Element-wise addition of batched ciphertexts.<br />

Element-wise multiplication of batched ciphertexts.<br />

3 1 0 0 1 0 3 4 1 0 0 0<br />

Automorphism operation, which can apply arbitrary permutations to elements in the plaintext slots. Used<br />

here to rotate slots by .<br />

Because the individual plaintext slots are non-interacting, we use a series of automorphisms to rotate the<br />

slots and additions to sum up the entries in a batch<br />

In the last step of the circuit, we zero out the values of the remaining slots. In the case where the number<br />

of slots is not a power of two, this ensures that no additional information about the data is leaked. The<br />

result is stored in the first slot of the final ciphertext.<br />

For security, we must add noise into ciphertexts during the encryption process. Homomorphic operations<br />

on ciphertexts increase this noise. In order to decrypt successfully, the noise must be below a chosen<br />

threshold.<br />

In fully homomorphic computation, multiplication is substantially more expensive (both in terms of runtime<br />

and amount of noise generated) than addition. We can quantify this by defining the depth of a circuit to be<br />

the number of multiplications in the circuit. Evaluating deeper circuits requires larger parameters, and correspondingly,<br />

longer runtimes. A comparison of addition and multiplication runtimes for different circuit<br />

depths is given below:<br />

Depth<br />

pk Charlie<br />

5 3 1 8<br />

0 1 1 0<br />

pk Charlie<br />

Time to Perform<br />

1 Addition (ms)<br />

Cloud<br />

Computing<br />

Platform<br />

Batching Encryption Evaluation Decryption<br />

Time to Perform<br />

1 Multiplication (s)<br />

1 1.94 0.62<br />

2 3.23 2.14<br />

5 10.81 14.11<br />

10 25.44 102.20<br />

20 141.93 771.55<br />

4 0 0 0<br />

sk Charlie<br />

Clear blocks represent plaintext blocks while striped blocks represent ciphertext blocks. The shading in the ciphertext<br />

block represents the amount of noise in the ciphertext (explained below).<br />

Batching: Pack multiple plaintext messages (data elements) into a single ciphertext block. This enables<br />

the server to evaluate a single instruction on multiple data with low overhead.<br />

Encryption: Encrypt the packed plaintext blocks with the FHE public key (Charlie’s public key).<br />

Evaluation: Send the encrypted data to the cloud server for processing.<br />

Decryption: The client (Charlie) decrypts the result using his secret key.<br />

4<br />

Experiments<br />

Below are results from timing tests illustrating performance of linear regression<br />

as well as mean and covariance computation on different datasets. Running<br />

times are relative to one prime in the CRT decomposition.<br />

Time (minutes)<br />

Time (minutes)<br />

Time (minutes)<br />

Time (minutes)<br />

700<br />

600<br />

500<br />

400<br />

300<br />

200<br />

100<br />

0<br />

350<br />

300<br />

250<br />

200<br />

150<br />

100<br />

50<br />

0<br />

1200<br />

1000<br />

800<br />

600<br />

400<br />

200<br />

0<br />

450<br />

400<br />

350<br />

300<br />

250<br />

200<br />

150<br />

100<br />

50<br />

0<br />

Timing Tests for Linear Regression as a Function of Data Dimension<br />

5.4 7.9 15.6 11.4 14.9 28.0 32.1 41.5 71.6 148.3<br />

1 2 3 4 5<br />

Dimension of Data<br />

(left bar: 4096 data points, middle bar: 65536 data points, right bar: 262144 data points)<br />

Key Generation Batching Encryption Computation Decryption<br />

Timing Tests for Linear Regression as a Function of Dataset Size<br />

10.0 9.6 10.0 10.5 14.8<br />

214.7<br />

199.0<br />

28.2<br />

84.8<br />

549.8<br />

531.9<br />

665.7<br />

305.0<br />

1024 4096 8192 16384 65536 262144 1048576 4194304<br />

Number of Data points<br />

(2-dimensional data)<br />

Key Generation Batching Encryption Computation Decryption<br />

Timing Tests for Mean and Covariance Computation<br />

as a Function of Data Dimension<br />

250.3<br />

444.3<br />

689.8<br />

1010.0<br />

119.1<br />

5.3 10.8 34.0<br />

1 2 4 8 12 16 20 24<br />

Dimension of Data<br />

(4096 data points)<br />

Key Generation Batching Encryption Computation Decryption<br />

Timing Tests for Mean and Covariance Computation<br />

as a Function of Dataset Size<br />

119.1 119.3 121.9 131.7<br />

34.2 34.6 36.6 42.2<br />

66.4<br />

190.8<br />

4096 8192 16384 262144 65536 1048576<br />

Number of Data points<br />

(left bar: 4-dimensional data, right bar: 8-dimensional data)<br />

154.1<br />

403.0<br />

Key Generation Batching Encryption Computation Decryption<br />

Conclusion<br />

We have constructed a scale-invariant leveled fully homomorphic<br />

encryption system.<br />

Using batching and CRT-based message encoding, we are able to perform<br />

large scale statistical analysis on millions of data points and data of<br />

moderate dimension.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!