a590003
a590003
a590003
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Using Homomorphic Encryption for<br />
Large Scale Statistical Analysis<br />
CURIS 2012<br />
David Wu and Jacob Haven<br />
Advised by Professor Dan Boneh<br />
Medical Records<br />
from Hospital A<br />
Medical Records<br />
from Hospital B<br />
Medical Records<br />
from Hospital C<br />
Motivation<br />
Cloud-based solutions have become increasingly popular in the past few<br />
years. An example of the cloud-based model is shown below. Here, three different<br />
hospitals provide data to the cloud. The cloud computing platform then<br />
analyzes and extracts useful information from the data.<br />
Cloud<br />
Computing<br />
Platform<br />
Statistical Analysis<br />
of Medical Data<br />
One of the main concern with cloud computing has been the privacy and confidentiality<br />
of the data. One solution is to send the data encrypted to the cloud.<br />
However, we still need to support useful computations on the encrypted data.<br />
Fully homomorphic encryption (FHE) is a way of supporting such computations<br />
on encrypted data.<br />
We note that while other mechanisms exist for secure computation, they generally<br />
require the different data providers to exchange information. Because<br />
FHE schemes are public key schemes, FHE is much better suited for the scenario<br />
where we have many sources of data.<br />
Our Approach<br />
Due to the significant overhead in homomorphic computation, implementations<br />
of homomorphic encryption schemes for statistical analysis have been<br />
limited to small datasets (≈ 100 data points) and low dimensional data (≈ 2-4<br />
dimensions).<br />
Using recent techniques in batched computation and a different message encoding<br />
scheme, we demonstrate the viability of using leveled homomorphic<br />
encryption to compute on datasets with over a million elements as well as<br />
datasets of much higher dimension.<br />
In particular, we consider two applications of homomorphic encryption: computing<br />
the mean and covariance of multivariate data and performing linear regression<br />
over encrypted datasets.<br />
Computation over Large Integers<br />
To support computation over large amounts of data, we need to be able to<br />
handle large integers (i.e., 128-bit precision). However, it is not computationally<br />
feasible to choose message spaces of this magnitude. To support computations<br />
with at least 128-bit precision, we leverage the Chinese Remainder<br />
Theorem (CRT):<br />
Data<br />
We choose primes<br />
Data<br />
Data<br />
Data<br />
Result<br />
Result<br />
Result<br />
such that<br />
CRT<br />
Result<br />
We perform the computation modulo each prime. Given the results of the<br />
computation with respect to each prime, we apply the CRT to obtain the<br />
value modulo the product of the primes (at least 128-bit precision).<br />
The computations with respect to each prime is completely independent of<br />
the computation with respect to the other primes. As such, all of the computations<br />
are naturally parallelizable.<br />
Homomorphic Encryption Scheme (Server Side)<br />
Our leveled FHE scheme supports three basic operations: addition, multiplication, and Frobenius automorphisms.<br />
Below, we show how we can use these operations to compute the inner product on encrypted data.<br />
5 3 1 8<br />
0 1 1 0<br />
Homomorphic Encryption Scheme (Client Side)<br />
Leveled fully homomorphic encryption (FHE) schemes supports addition and multiplication over ciphertexts.<br />
Such schemes are capable of evaluating boolean circuits with bounded depth (determined by the number of<br />
multiplications) over ciphertexts, and thus, can perform many computations over the ciphertext.<br />
Consider a scenario where Charlie wants to compute the inner product of Alice’s and Bob’s data. Note that covariance<br />
computation and linear regression can be expressed in terms of matrix products, which can be viewed<br />
as a series of inner products.<br />
Alice’s Data<br />
[5, 3, 1, 8]<br />
[0, 1, 1, 0]<br />
Bob’s Data<br />
5 3 1 8<br />
0 1 1 0<br />
0 3 1 0 3 4 1 0 4 4 4 4<br />
4 0 0 0<br />
Element-wise addition of batched ciphertexts.<br />
Element-wise multiplication of batched ciphertexts.<br />
3 1 0 0 1 0 3 4 1 0 0 0<br />
Automorphism operation, which can apply arbitrary permutations to elements in the plaintext slots. Used<br />
here to rotate slots by .<br />
Because the individual plaintext slots are non-interacting, we use a series of automorphisms to rotate the<br />
slots and additions to sum up the entries in a batch<br />
In the last step of the circuit, we zero out the values of the remaining slots. In the case where the number<br />
of slots is not a power of two, this ensures that no additional information about the data is leaked. The<br />
result is stored in the first slot of the final ciphertext.<br />
For security, we must add noise into ciphertexts during the encryption process. Homomorphic operations<br />
on ciphertexts increase this noise. In order to decrypt successfully, the noise must be below a chosen<br />
threshold.<br />
In fully homomorphic computation, multiplication is substantially more expensive (both in terms of runtime<br />
and amount of noise generated) than addition. We can quantify this by defining the depth of a circuit to be<br />
the number of multiplications in the circuit. Evaluating deeper circuits requires larger parameters, and correspondingly,<br />
longer runtimes. A comparison of addition and multiplication runtimes for different circuit<br />
depths is given below:<br />
Depth<br />
pk Charlie<br />
5 3 1 8<br />
0 1 1 0<br />
pk Charlie<br />
Time to Perform<br />
1 Addition (ms)<br />
Cloud<br />
Computing<br />
Platform<br />
Batching Encryption Evaluation Decryption<br />
Time to Perform<br />
1 Multiplication (s)<br />
1 1.94 0.62<br />
2 3.23 2.14<br />
5 10.81 14.11<br />
10 25.44 102.20<br />
20 141.93 771.55<br />
4 0 0 0<br />
sk Charlie<br />
Clear blocks represent plaintext blocks while striped blocks represent ciphertext blocks. The shading in the ciphertext<br />
block represents the amount of noise in the ciphertext (explained below).<br />
Batching: Pack multiple plaintext messages (data elements) into a single ciphertext block. This enables<br />
the server to evaluate a single instruction on multiple data with low overhead.<br />
Encryption: Encrypt the packed plaintext blocks with the FHE public key (Charlie’s public key).<br />
Evaluation: Send the encrypted data to the cloud server for processing.<br />
Decryption: The client (Charlie) decrypts the result using his secret key.<br />
4<br />
Experiments<br />
Below are results from timing tests illustrating performance of linear regression<br />
as well as mean and covariance computation on different datasets. Running<br />
times are relative to one prime in the CRT decomposition.<br />
Time (minutes)<br />
Time (minutes)<br />
Time (minutes)<br />
Time (minutes)<br />
700<br />
600<br />
500<br />
400<br />
300<br />
200<br />
100<br />
0<br />
350<br />
300<br />
250<br />
200<br />
150<br />
100<br />
50<br />
0<br />
1200<br />
1000<br />
800<br />
600<br />
400<br />
200<br />
0<br />
450<br />
400<br />
350<br />
300<br />
250<br />
200<br />
150<br />
100<br />
50<br />
0<br />
Timing Tests for Linear Regression as a Function of Data Dimension<br />
5.4 7.9 15.6 11.4 14.9 28.0 32.1 41.5 71.6 148.3<br />
1 2 3 4 5<br />
Dimension of Data<br />
(left bar: 4096 data points, middle bar: 65536 data points, right bar: 262144 data points)<br />
Key Generation Batching Encryption Computation Decryption<br />
Timing Tests for Linear Regression as a Function of Dataset Size<br />
10.0 9.6 10.0 10.5 14.8<br />
214.7<br />
199.0<br />
28.2<br />
84.8<br />
549.8<br />
531.9<br />
665.7<br />
305.0<br />
1024 4096 8192 16384 65536 262144 1048576 4194304<br />
Number of Data points<br />
(2-dimensional data)<br />
Key Generation Batching Encryption Computation Decryption<br />
Timing Tests for Mean and Covariance Computation<br />
as a Function of Data Dimension<br />
250.3<br />
444.3<br />
689.8<br />
1010.0<br />
119.1<br />
5.3 10.8 34.0<br />
1 2 4 8 12 16 20 24<br />
Dimension of Data<br />
(4096 data points)<br />
Key Generation Batching Encryption Computation Decryption<br />
Timing Tests for Mean and Covariance Computation<br />
as a Function of Dataset Size<br />
119.1 119.3 121.9 131.7<br />
34.2 34.6 36.6 42.2<br />
66.4<br />
190.8<br />
4096 8192 16384 262144 65536 1048576<br />
Number of Data points<br />
(left bar: 4-dimensional data, right bar: 8-dimensional data)<br />
154.1<br />
403.0<br />
Key Generation Batching Encryption Computation Decryption<br />
Conclusion<br />
We have constructed a scale-invariant leveled fully homomorphic<br />
encryption system.<br />
Using batching and CRT-based message encoding, we are able to perform<br />
large scale statistical analysis on millions of data points and data of<br />
moderate dimension.