Privacy-Preserving Multi-Dimensional Range Search over ...
Privacy-Preserving Multi-Dimensional Range Search over ...
Privacy-Preserving Multi-Dimensional Range Search over ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>Privacy</strong>-<strong>Preserving</strong> <strong>Multi</strong>-<strong>Dimensional</strong> <strong>Range</strong> <strong>Search</strong><br />
<strong>over</strong> Encrypted Cloud Data within Sublinear Time<br />
Boyang Wang †,†† , Yantian Hou † , Ming Li † , Haitao Wang † and Hui Li ††<br />
† Department of Computer Science, Utah State University, Logan, UT, USA<br />
†† State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, Shaanxi, China<br />
Abstract—Cloud computing promises users massive scale outsourced<br />
data storage services with much lower costs than traditional<br />
methods. However, privacy concerns compel sensitive data<br />
to be stored in the cloud under an encrypted form. This posts<br />
a great challenge for effectively utilizing the outsourced data,<br />
such as executing common SQL queries. A variety of searchable<br />
encryption techniques have been proposed to solve this issue;<br />
yet efficiency and scalability are still the two main obstacles<br />
for their adoptions in real-world datasets, which are multidimensional<br />
in general. In this paper, we propose the first sublinear<br />
time and privacy-preserving searchable encryption scheme<br />
for multi-dimensional range queries <strong>over</strong> encrypted databases.<br />
By designing and leveraging two novel predicate encryption<br />
primitives, our solution efficiently indexes and searches <strong>over</strong> the<br />
encrypted database using multi-dimensional data structures (e.g.,<br />
kd-trees), thus achieving sublinear time complexity. In addition,<br />
our methodology is also general as different data structures<br />
(e.g., range trees) can be exploited to yield different tradeoffs<br />
among privacy, storage <strong>over</strong>head and search time (as low as<br />
polylogarithmic). Our scheme is provably secure, and through<br />
extensive experimental evaluation on a large-scale real-world<br />
dataset, we show the efficiency and scalability of our scheme.<br />
I. INTRODUCTION<br />
With the adoption of cloud computing, data owners can reap<br />
huge economic benefits by outsourcing their data to the cloud<br />
instead of storing their data locally. Due to the serious privacy<br />
concerns in the cloud, sensitive data, such as financial or<br />
medical records, should be encrypted before being outsourced<br />
to the cloud server [1]. As a result, the cloud server or any<br />
other unauthorized party is not allowed to reveal the content<br />
of outsourced data unless it possesses proper decryption keys.<br />
Unfortunately, the encryption on outsourced data inevitably<br />
introduces new challenges to data utilization. Especially, how<br />
to enable data users to effectively search <strong>over</strong> encrypted data as<br />
they do in the plaintext domain, is one of the most significant<br />
issues needed to be solved in the cloud.<br />
To enable different search functions <strong>over</strong> encrypted data,<br />
many searchable encryption (SE) schemes have been proposed<br />
[2]–[7]. Besides protecting the data content as in traditional<br />
encryption schemes, SE schemes should not only hide the<br />
attributes (e.g. keywords) of encrypted data records, but also<br />
protect the content of search queries. However, most of the existing<br />
SE schemes are only able to handle very simple queries,<br />
such as single keyword search [3] or single-dimensional<br />
range queries [6], which are not able to well support multidimensional<br />
range queries <strong>over</strong> encrypted data. Considering<br />
real-world datasets are often multi-dimensional while common<br />
SQL queries (such as equality and comparison) can all be<br />
flexibly expressed as range queries [8], we believe designing<br />
an efficient and privacy-preserving <strong>Multi</strong>-<strong>Dimensional</strong> <strong>Range</strong><br />
<strong>Search</strong> <strong>over</strong> Encrypted data (MDRSE) will be an important<br />
step to make searchable encryption practical on large datasets<br />
in a real-world cloud setting.<br />
Shi et al. [8] first designed an SE scheme to handle multidimensional<br />
range queries <strong>over</strong> encrypted data, which can<br />
be used in many applications, such as network audit logs<br />
and financial databases [8]. Unfortunately, the search time<br />
of its straightforward application to large datasets is linearly<br />
increasing with regard to the total number of data records.<br />
Since the size of a real-world dataset is generally very large<br />
(e.g. in the order of millions or larger), this scheme is clearly<br />
not practical especially under the cloud setting.<br />
More recently, Lu [9] designed a symmetric key based<br />
single dimensional range search scheme (named LSED) on<br />
encrypted data within logarithmic search time, and mentioned<br />
a direct extension of it (denoted as LSED + ) into the multidimension<br />
by achieving sublinear search time. However, this<br />
direct extension, which decomposes multi-dimensional search<br />
and decryption abilities (search tokens and decryption keys)<br />
into the ones in each single dimension, will result in significant<br />
privacy leaks [9].<br />
Salary<br />
2,000<br />
1,000<br />
G<br />
20 30<br />
Age<br />
G The Granted <strong>Search</strong> and Decryption Ability<br />
<strong>Privacy</strong> Leak on One <strong>Range</strong> Query<br />
(a) <strong>Privacy</strong> leak on one query.<br />
Salary<br />
3,000<br />
2,500<br />
2,000<br />
1,000<br />
G<br />
G<br />
20 30 45 50<br />
Age<br />
G The Granted <strong>Search</strong> and Decryption Abilities<br />
<strong>Privacy</strong> Leak Caused by Collusion<br />
(b) <strong>Privacy</strong> leak caused by collusion.<br />
Fig. 1. Examples of privacy leakage in LSED + [9].<br />
Specifically, given the search and decryption ability of<br />
a two-dimensional range query Q = (age ∈ [20,30]) ∧<br />
(salary ∈ [1,000,2,000]) in LSED + , a data user is able to<br />
not only search but also decrypt on a single dimensional range<br />
query Qx = (age ∈ [20,30]) on X-dimension or Qy =<br />
(salary ∈ [1,000,2,000]) on Y-dimension independently,<br />
which clearly reveals more private information to this data user<br />
than he is allowed to (as described in Fig. 1(a)). The leakage of<br />
these privacy, referred to as single-dimensional query privacy<br />
(search) and single-dimensional confidentiality (decryption)<br />
respectively, will be dramatically aggravated when the number<br />
of dimensions of a range query increases. More<strong>over</strong>, failing to<br />
preserve query privacy or confidentiality in single dimension
TABLE I<br />
COMPARISON AMONG DIFFERENT SOLUTIONS.<br />
Solution Storage Cost <strong>Search</strong> Time Single-Dim. Confidentiality/Query <strong>Privacy</strong> Public Key Based<br />
Straightforward [8] O(n·DlogT) O(n·DlogT) Yes / Yes Yes<br />
LSED + [9] O(n·DlogT) O(n 1−1/D ·Dlog 2 T) No / No No<br />
Ours (KD-Trees) O(n·DlogT) O(n 1−1/D ·DlogT) Yes / Yes Yes<br />
Ours (<strong>Range</strong> Trees) O(nlog D−1 n·DlogT) O(log D−1 n·DlogT) Yes / No Yes<br />
will inevitably introduce collusion among different queries [8].<br />
As shown in Fig. 1(b), the data user is able to search and<br />
decrypt the exact information on a multi-dimensional range<br />
query Q ′ = (age ∈ [20,30]) ∧ (salary ∈ [2,500,3,000])<br />
without the permission of the data owner.<br />
On the other hand, due to the leak of single dimensional<br />
query privacy in LSED + , the cloud server may easily deduce<br />
the content of a multi-dimensional range query with some<br />
advanced background information of outsourced data. For<br />
instance, if the cloud server can decompose the search token of<br />
multi-dimensional range query Q into each single dimension<br />
while some single-dimensional statistics (e.g., the percentage<br />
of people in their twenties within the whole population is<br />
20% or people’s salary between [1,000,2,000] is 40%) are<br />
available, then it is able to easily infer the content of this<br />
range query in each single dimension respectively based on<br />
the respective percentage of matched records in the database.<br />
Therefore, how to design an efficient and privacy-preserving<br />
MDRSE remains open.<br />
In this paper, we endeavor to solve this problem by proposing<br />
a privacy-preserving <strong>Multi</strong>-<strong>Dimensional</strong> <strong>Range</strong> <strong>Search</strong><br />
<strong>over</strong> Encrypted data (MDRSE) within sublinear time in the<br />
cloud. Obviously, with multi-dimensional data structures, such<br />
as kd-trees [10], we can easily achieve sublinear search time in<br />
the plaintext domain with existing methods [11]. However, the<br />
main challenge of utilizing these multi-dimensional data<br />
structures in this paper is how to exactly follow the search<br />
algorithm of a multi-dimensional data structure in the<br />
plaintext domain while all the nodes (including all the data<br />
records) in the structure and range queries are encrypted.<br />
Meanwhile, we also need to preserve single dimensional<br />
confidentiality and single dimensional query privacy. The<br />
main contributions of our paper are:<br />
(1) We first present two new predicate encryptions, named<br />
hyper-rectangle intersection predicate encryption and hyperrectangle<br />
contained predicate encryption respectively, by using<br />
some existing cryptographic primitive [8] in a novel way.<br />
With these predicate encryptions, we can verify the geometry<br />
relationships of two hyper-rectangles, including whether two<br />
hyper-rectangles intersect, and whether one hyper-rectangle is<br />
fully contained in another one.<br />
(2) With the properties of the two proposed predicate<br />
encryptions, we can build a sublinear time MDRSE, which<br />
can exactly follow the search algorithm of a kd-tree in the<br />
plaintext domain while all the nodes in the kd-tree and range<br />
queries are encrypted. More<strong>over</strong>, the security analyses show<br />
that our scheme is selectively secure and privacy-preserving.<br />
(3) With the similar design methodology, we can also<br />
construct another MDRSE based on range trees [12] to achieve<br />
a more efficient (polylogarithmic) search time <strong>over</strong> encrypted<br />
data. As a necessary tradeoff, it requires much more storage<br />
<strong>over</strong>head than the one based on kd-trees. In addition, it<br />
inevitably losses single dimensional query privacy, which is<br />
caused by the structure of a range tree itself. However, single<br />
dimensional confidentiality is still preserved.<br />
(4) Using a large-scale real-world dataset, we evaluate<br />
the performance of our schemes in experiments with up to<br />
millions of data records. Compared to previous solutions, our<br />
schemes show much higher scalability and efficiency <strong>over</strong><br />
large encrypted datasets.<br />
The comparison among different solutions is presented in<br />
Table I, where n denotes the number of data records in a<br />
database, D is the number of dimensions and T is the size<br />
of each dimension. Our schemes are within sublinear search<br />
time with respect to n.<br />
A. Framework for MDRSE<br />
II. PROBLEM DEFINITION<br />
In this paper, we focus on realizing sublinear time and<br />
privacy-preserving <strong>Multi</strong>-<strong>Dimensional</strong> <strong>Range</strong> <strong>Search</strong> <strong>over</strong> Encrypted<br />
Data (MDRSE). To achieve efficient (faster than linear)<br />
search time, a multi-dimensional data structure Γ should<br />
be used. Without loss of generality, we assume the entire range<br />
of the i-th dimension is [1,Ti], where Ti is a power of 2. We<br />
present the definition of MDRSE as below.<br />
Definition 1: (MDRSE). An MDRSE scheme includes the<br />
following six probabilistic polynomial time algorithms.<br />
• Setup(λ, {T1,...,TD}): Given security parameter λ and<br />
upper bound Ti in each dimension, output public key PK<br />
and master key MK.<br />
• Build({M1,...,Mn}, {A1,...,An}): Given all the data<br />
records {M1,...,Mn} and their attributes {A1,...,An},<br />
output data structure Γ.<br />
• Encrypt(PK, Mi, Ai, Γ): Given public key PK, each<br />
data record Mi, its attribute vector Ai = (ai,1,...,ai,D)<br />
and data structure Γ, output an encrypted data record Ci,<br />
and return outsourced data structure Γ ∗ based on all the<br />
encrypted data record {C1,...,Cn}.<br />
• Extract(MK, Q): Given master key MK and multidimensional<br />
range search query Q, output search token<br />
TKQ and decryption key SKQ.<br />
• <strong>Search</strong>(TKQ, Ci, Γ ∗ ): Given search token TKQ, encrypted<br />
data record Ci and outsourced data structure<br />
Γ ∗ , where Ci ←Encrypt(PK, Mi, Ai, Γ), output 1 if<br />
Ai ∈ Q; otherwise, output 0.<br />
• Decrypt(DKQ, Ci): Given decryption key SKQ and<br />
encrypted data record Ci, where Ci ←Encrypt(PK, Mi,<br />
Ai, Γ), output data record Mi if Ai ∈ Q; otherwise, ⊥.<br />
In the cloud scenario which we described in Fig. 2, there<br />
are three entities, including one or multiple data owners, data
Data Owner<br />
1. Request<br />
2. <strong>Search</strong> Token & Decryption Key<br />
Data User<br />
Encrypted Data<br />
3. <strong>Search</strong> Token<br />
4. Matched Encrypted Data<br />
Cloud Server<br />
Fig. 2. The system model includes one or multiple data owners, data users,<br />
and the cloud server.<br />
users and the cloud server. Compared to the case of a single<br />
data owner, an extra trusted party is needed to manage the<br />
master key for the case of multiple owners. Setup is run by<br />
the data owner (or a trusted party in the multi-owner case). The<br />
data owner builds a multi-dimensional data structure Γ with<br />
Build. Then, the owner encrypts every data record Mi based<br />
on its attribute vector Ai = (ai,1,...,ai,n) (an attributeai,j can<br />
be regraded as a value of a field/dimension in the database,<br />
such as ”age=30” or ”salary=2,000”) with public key PK using<br />
Encrypt, and outsources ciphertexts C = (C1,...,Cn) and<br />
outsourced data structure Γ ∗ to the cloud. This outsourced data<br />
structureΓ ∗ has the same structure asΓ, but all the nodes inΓ ∗<br />
are encrypted. The data owner (or a trusted party in the multiowner<br />
case) generates search token TKQ and decryption key<br />
SKQ for a data user based on a multi-dimensional range search<br />
query Q with Extract. In <strong>Search</strong>, a data user submits<br />
search token TKQ as the encrypted version of search query Q<br />
to the cloud server, and the cloud server follows outsourced<br />
data structure Γ ∗ and returns encrypted data record Ci to<br />
this data user, if 1 ←<strong>Search</strong>(TKQ, Ci, Γ ∗ ). After retrieving<br />
all matched records from the cloud server, the data user can<br />
decrypt each one using decryption key DKQ in Decrypt.<br />
In practice, to improve the efficiency of encryption <strong>over</strong><br />
large-size data records (e.g., e-health records), we can use an<br />
extra layer of symmetric key encryption to encrypt each data<br />
record first, and encrypt each symmetric key with MDRSE.<br />
B. Threat Model<br />
In this paper, we assume the data owner herself is a secure<br />
and trusted party. The cloud server is assumed as a semitrusted<br />
party (a.k.a. honest-but-curious party [5], [13]), which<br />
means the data owner believes the cloud server is able to<br />
provide reliable data storage and search services (will return<br />
the correct search results by following the protocols), but may<br />
be curious about not only the content of data outsourced by<br />
the owner but also the content of search queries submitted by<br />
data users. We also assume that data users are curious — a<br />
data user may try to search and decrypt more data than allowed<br />
with his given capability, or even collude with the cloud server<br />
to reveal more privacy information about the owner’s data.<br />
C. Security Definition and Design Objectives<br />
Similar to previous public-key based searchable encryption<br />
schemes, our MDRSE should achieve selective security [14].<br />
The security game between a challenger and an adversary with<br />
selective security can be described as follows.<br />
Definition 2: (MDRSE-selective security). An MDRSE<br />
scheme is selectively secure if all polynomial time adversaries<br />
have at most a negligible advantageǫin the following selective<br />
security game.<br />
• Setup: The challenger runs algorithmSetup(λ, {T1,...,<br />
TD}) to generate public key PK and master key MK. It<br />
keeps master key MK private and gives public key PK to<br />
the adversary.<br />
• Phase 1: The adversary adaptively issues p search token<br />
and decryption key requests to the challenger for<br />
multi-dimensional range queries Q1,...,Qp, and obtains<br />
p search tokens and corresponding p decryption keys.<br />
• Challenge: The adversary submits two equal length data<br />
records M∗ 0 and M∗ 1 , and their corresponding attribute<br />
vectors A∗ 0 and A∗ 1, where A∗ 0 /∈ Qi and A∗ 1 /∈ Qi,<br />
∀i ∈ [1,p]. The challenger flips a coin, the random<br />
result of the coin flipping is b ∈ {0,1}. Then, the<br />
challenger encrypts data record M∗ b by running algorithm<br />
Encrypt(PK, M∗ b , A∗ b , Γ), and returns encrypted data<br />
record C∗ b to the adversary.<br />
• Phase 2: The adversary continues to issue another q<br />
search token and decryption key requests to the challenger<br />
for multi-dimensional range queries Qp+1,...,Qp+q, and<br />
obtains q search tokens and corresponding q decryption<br />
keys, where A∗ 0 /∈ Qi and A∗ 1 /∈ Qi, ∀i ∈ [p+1,p+q].<br />
• Guess: Eventually, the adversary outputs a guess b ′ of b.<br />
An adversary A’s advantage in the above game is defines as<br />
Adv MDRSE−SS<br />
A (λ) = |Pr[b = b ′ ]− 1<br />
| ≤ ǫ. (1)<br />
2<br />
In the above security game, an adversary is allowed to obtain<br />
search tokenTKQ and decryption keySKQ for any search query<br />
Q of his choice, however, it is not able to distinguish encrypted<br />
data record C∗ 0 from encrypted data record C∗ 1 , where none of<br />
search query chosen by this adversary contains attribute vector<br />
A∗ 0 or A∗ 1.<br />
Besides the above security requirement, the main design<br />
objectives of our MDRSE focus on the following three aspects:<br />
• Sublinear <strong>Search</strong> Time. The search with a multidimensional<br />
range search query on the entire encrypted<br />
cloud data should be finished within sublinear time with<br />
respect to n, the total number of the data records.<br />
• Single <strong>Dimensional</strong> Confidentiality. The decryption<br />
ability of a data user on a multi-dimensional range query<br />
should not render him the additional capability to decrypt<br />
on each single dimension independently.<br />
• Single <strong>Dimensional</strong> Query <strong>Privacy</strong>. The search ability<br />
of a data user on a multi-dimensional range query should<br />
not grant him the additional capability to search on each<br />
single dimension independently.<br />
<strong>Preserving</strong> single dimensional confidentiality will resist decryption<br />
collusion among queries. Similarly, protecting single<br />
query privacy is able to prevent search collusion among<br />
queries. In this paper, we do not aim at providing the data<br />
access pattern privacy, which is an orthogonal problem [15].
L2<br />
L8<br />
L5<br />
P4 P9<br />
P5<br />
P1<br />
P2<br />
L4<br />
P3<br />
L1<br />
P7<br />
L9<br />
L7<br />
P10<br />
L6<br />
P6<br />
P8<br />
L3<br />
P1<br />
L8<br />
P2<br />
L1<br />
L2 L3<br />
L4 L5<br />
L6 L7<br />
P3 P4 P5<br />
L9<br />
P6 P7<br />
P8 P9 P10<br />
Nodes are not visited:<br />
L5 L8 P1 P2 P4 P5<br />
Nodes are visited:<br />
L1 L2 L3 L4 L6 L7 P3 P8 P9 P10<br />
Nodes are reported directly:<br />
L9 P6 P7<br />
Points are reported (points lie in the query):<br />
P3 P6 P7<br />
(a) Points {P1,...,P10} in two- (b) The kd-tree based on Points {P1,...,P10}, and the search on it based on a query rectangle R, where<br />
dimension, and a query rectangle R point P3, P6, and P7 are reported<br />
c<strong>over</strong>s point P3, P6, and P7<br />
III. PRELIMINARIES<br />
In this section, we introduce some multi-dimensional data<br />
structures and some definitions in the multi-dimensions.<br />
A. <strong>Multi</strong>-<strong>Dimensional</strong> Data Structures<br />
A kd-tree [10] is a type of multi-dimensional data structure,<br />
which can be used in many applications, such as range<br />
searches and nearest neighbor searches [11]. The basic idea<br />
to build a kd-tree is to split a set of (1-dimensional) points<br />
into two subsets of roughly equal size, one subset contains<br />
the points smaller than or equal to the splitting value, the<br />
other subset contains the points larger than the splitting value.<br />
The splitting value is stored at the root, and the two subsets<br />
are stored recursively in the two subtrees [11]. For the case<br />
of two-dimensional points, the splitting is first performed on<br />
X-dimension, next on Y-dimension, then on X-dimension, and<br />
so on. Further details about how to construct a kd-tree can be<br />
found in [11]. An example of a kd-tree in the two-dimension<br />
is illustrated in Fig. 3. According to the construction, a kd-tree<br />
for a set of n points use O(n) storage and can be constructed<br />
in O(nlogn) time.<br />
L2<br />
L8<br />
L5<br />
P4 P9<br />
P5<br />
P1<br />
P2<br />
L4<br />
P3<br />
L1<br />
P7<br />
L9<br />
L7<br />
P10<br />
L6<br />
P6<br />
P8<br />
L3<br />
Fig. 3. An example of on a kd-tree, and a search query on it.<br />
L9<br />
P6 P7<br />
Fig. 4. Non-leaf node L3 represents two rectangles (the yellow rectangle<br />
and the red rectangle).<br />
As we can see from Fig. 4, a leaf node in a kd-tree<br />
represents a point, and a non-leaf node (a splitting value)<br />
represents two rectangles. Generally, a two-dimensional range<br />
query on a kd-tree can also be described as a rectangle [11].<br />
We observe that whether two rectangles intersect and<br />
whether a rectangle is fully contained in another one are<br />
two significant factors to make decisions at non-leaf nodes<br />
during the search algorithm on a kd-tree. For the general<br />
case in the multi-dimension, we use hyper-rectangles instead<br />
of rectangles. A search example and details of the search<br />
algorithm are presented in Fig. 3 and Algorithm 1. With<br />
L1<br />
L6<br />
P8<br />
L3<br />
P9<br />
L7<br />
P10<br />
kd-trees, a D-dimensional range query takes O(n 1−1/D +k)<br />
(sublinear) time, where k is the number of reported points.<br />
Algorithm 1: <strong>Search</strong>KdTree(v,R)<br />
Input: The root (or a node) v of a kd-tree, and a rectangle R.<br />
Output: All points at leaves below v that lie in R.<br />
1 if v is a leaf then<br />
2 Report the point stored at v if it lies in R.<br />
3 else if v is a non-leaf then<br />
4 if LeftRectangle(v) is fully contained in R then<br />
5 Report all the points in LeftRectangle(v);<br />
6 else if LeftRectangle(v) intersects R then<br />
7 <strong>Search</strong>KdTree(LeftChild(v), R);<br />
8 if RightRectangle(v) is fully contained in R then<br />
9 Report all the points in RightRectangle(v);<br />
10 else if RightRectangle(v) intersects R then<br />
11 <strong>Search</strong>KdTree(RightChild(v), R);<br />
Compared to a kd-tree, a range tree [11] is able to achieve an<br />
even faster search time with O(log D n + k). As a necessary<br />
tradeoff, it requires O(nlog D−1 n) storage, which is much<br />
more than the O(n) storage in a kd-tree. Details description<br />
about range trees can be found in [11].<br />
B. Definitions in the <strong>Multi</strong>-Dimension<br />
Here we introduce some definitions, including lattices,<br />
points and hyper-rectangles.<br />
Definition 3: (Lattice). Let ∆ = (T1,...,TD), where Ti<br />
is the upper bound in the i-th dimension and 1 ≤ i ≤ D.<br />
A lattice L∆ is defined as L∆ = [T1] × ··· × [TD], where<br />
[Ti] = {1,...,Ti}.<br />
Definition 4: (Point). A point X in L∆ is defined as X =<br />
(x1,...,xD), where xi ∈ [Ti], ∀i ∈ D.<br />
Definition 5: (Hyper-Rectangle). A hyper-rectangle HR<br />
in L∆ is defined as HR = (R1,...,RD), where Ri is a range<br />
in the i-th dimension, Ri ∈ [1,Ti], ∀i ∈ D.<br />
IV. MULTI-DIMENSIONAL RANGE SEARCH OVER<br />
ENCRYPTED DATA (MDRSE)<br />
In this section, we will first show how to design a straightforward<br />
MDRSE with some existing cryptographic primitive<br />
[8] but without the help of any data structure. Based on<br />
the properties of the primitive, this straightforward solution
can protect both single dimensional confidentiality and query<br />
privacy, but only with linear search time, which is not practical<br />
for searching on large datasets in the cloud. Then, we will describe<br />
how to build a sublinear time and privacy-preserving<br />
MDRSE with kd-trees and the same cryptographic primitive<br />
[8], which however, is leveraged in a novel way.<br />
A. Straightforward Solution<br />
By directly leveraging a recent primitive, named <strong>Multi</strong>dimensional<br />
<strong>Range</strong> Predicate Encryption (MRPE) [8], we can<br />
first design a straightforward solution of MDRSE. Specifically,<br />
the data owner can directly encrypt each data record using<br />
MRPE without any data structure, and outsource all the<br />
encrypted data records to the cloud server. Then, the cloud<br />
server is able to preform search on all the encrypted records<br />
one by one based on the search token submitted by a data user.<br />
In fact, this is the final solution in [8]. Clearly, the search time<br />
of this straightforward solution is linearly increasing with the<br />
total number of encrypted data records in the cloud server.<br />
Considering the number of data records stored in the cloud<br />
server could be quite large, this straightforward solution is<br />
not practical, especially in the cloud scenario.<br />
The MRPE mentioned above is a special type of encryption<br />
that, if a message is encrypted with a point X, then the<br />
encrypted version of this message can be correctly decrypted<br />
with a decryption key generated by a hyper-rectangle HR,<br />
if X ∈ HR. Details of MRPE are presented in Fig. 5. In<br />
the straightforward solution, each data record Mi can be seen<br />
as a message, its attribute vector Ai = (a1,i,...,aD,i) can be<br />
described as a point X, and a multi-dimensional range search<br />
query Q can be represented as a hyper-rectangle HR.<br />
This MRPE [8] is built on Anonymous Identity-Based<br />
Encryption (AIBE) [16] and binding techniques, which are<br />
able to prevent collusion attacks introduced by the use of<br />
the combination of different secret keys [8]. The ability of<br />
preventing collusion attacks with different secret keys enables<br />
the straightforward solution to preserve single dimensional<br />
confidentiality and query privacy. MRPE also has a predicateonly<br />
version, denoted as MRPE O , where only the flag string<br />
Flag is encrypted, and the final output of decryption is 1,<br />
if X ∈ HR and 0 otherwise. MRPE is selectively secure<br />
if all polynomial time adversaries have at most a negligible<br />
advantage ǫ in the selective security game. The security<br />
of MRPE is based on the Decision Bilinear Diffie-Hellman<br />
Assumption and Decision Linear Assumption [8]. Details of<br />
security analyses can be found in [8]. Actually, MRPE can be<br />
seen as a type of Functional Encryption [17], which is believed<br />
as a new vision for public key cryptography.<br />
B. Our Solution: Challenges and Overview<br />
In order to improve the search efficiency (faster than linear<br />
search time), we would like to exploit a kd-tree to organize<br />
all the data records and handle multi-dimensional range search<br />
queries by following the search algorithm of a kd-tree. Obviously,<br />
it is easy to implement with existing techniques if the<br />
data records are in plaintext. However, it is a challenging task<br />
to exactly follow the search algorithm of a kd-tree when all<br />
the nodes in the kd-tree and range queries are encrypted. More<br />
Definition 6: (MRPE). A <strong>Multi</strong>-<strong>Dimensional</strong> <strong>Range</strong> Predicate<br />
Encryption (MRPE) scheme should consist of the following four<br />
probabilistic polynomial time algorithms.<br />
Setup (λ, L∆): Given security parameter λ and lattice L∆ =<br />
[T1]×···×[TD], where [Ti] = {1,...,Ti}, output public key<br />
PK and master key MK.<br />
Encrypt (PK, X, M, Flag): Given public key PK, point X ∈<br />
L∆, message M and flag string Flag ∈ {0,1} α , where Flag<br />
is a constant string with α bits, output ciphertext C, where C<br />
is the encryption of Flag||M, || denotes the concatenation of<br />
the flag string and a message.<br />
DeriveKey (MK, HR): Given master key MK and hyperrectangle<br />
HR, where HR ∈ L∆, output secret key SKHR.<br />
Decrypt (PK, SKHR, C): Given public key PK, secret key<br />
SKHR and ciphertext C, output M if the first α bits of<br />
the decryption of C is Flag (X ∈ HR), and ⊥ otherwise<br />
(X /∈ HR). The above algorithms must satisfy the following<br />
constraints:<br />
Decrypt(PK,SKHR,C) =<br />
Flag||M, if X ∈ HR<br />
⊥, w.h.p., if X /∈ HR<br />
where C =Encrypt(PK,X,M,Flag)<br />
SKHR =DeriveKey(MK,HR)<br />
If X /∈ HR, the probability that the decryption result shows ⊥<br />
is 1−2 −α .<br />
Fig. 5. Details of MRPE.<br />
specifically, according to what we observed in Section III, we<br />
need to find methods to decide whether two hyper-rectangles<br />
intersect; and whether one hyper-rectangle is fully contained<br />
in another one based only on encrypted data.<br />
To the best of our knowledge, there is no Functional<br />
Encryption (also denoted as Predicate Encryption) from the<br />
current literature is able to be directly used to verify the<br />
two geometry relationships of two hyper-rectangles that we<br />
mentioned above while still preserving privacy in each single<br />
dimension. Interestingly, by utilizing MRPE [8] in a novel<br />
way, we can find solutions to decide these two geometry<br />
relationships of two hyper-rectangles on encrypted data. More<br />
importantly, we can also protect privacy and confidentiality in<br />
each single dimension.<br />
In fact, by doing so, we can actually build two new Predicate<br />
Encryptions based on MRPE. The first one is hyper-rectangle<br />
intersection predicate encryption, where if a message is<br />
encrypted with a hyper-rectangle HR and it can only be<br />
decrypted with a decryption key generated by another hyper-<br />
rectangle HR ′ , if HR ∩ HR ′<br />
= ∅. The second one is<br />
hyper-rectangle contained predicate encryption, where if<br />
a message is encrypted with a hyper-rectangle HR and it can<br />
only be decrypted with a decryption key generated by another<br />
hyper-rectangle HR ′ , if HR ⊆ HR ′ .<br />
C. Hyper-Rectangle Intersection Predicate Encryption<br />
We now explain how to build a hyper-rectangle intersection<br />
predicate encryption based on MRPE. For the ease of<br />
description, we start with two ranges in a single dimension.<br />
Given two ranges R and R ′ in a single dimension, we treat<br />
the first range R = [xl,xu] as a point in a two-dimension, and
transform the second range R ′ = [x ′ l ,x′ u] into a rectangle in a<br />
two-dimension, and then decide whether the two-dimensional<br />
point based on the first range lies in the two-dimensional<br />
rectangle generated from the second range. If the point is<br />
indeed in the rectangle, then the two ranges intersect with<br />
each other. Otherwise, they do not intersect. Examples of range<br />
intersection are described in Fig. 6. Based on this basic idea,<br />
we have the following lemma<br />
x1,l x1,u<br />
1<br />
x2,l x2,u x3,l x3,u<br />
x ′ l<br />
x ′ u<br />
x4,l x4,u<br />
<strong>Range</strong>s intersect with R ′ = [x ′ l ,x′ u]<br />
<strong>Range</strong>s do not intersect with R ′ = [x ′ l ,x′ u]<br />
Fig. 6. Examples of whether two ranges intersect.<br />
Lemma 1: For two ranges R = [xl,xu] and R ′ = [x ′ l ,x′ u],<br />
we have a point X = (x1,x2) = (xl,xu) based on range R<br />
and a rectangle HR ′ = (R ′ 1,R ′ 2) based on range R ′ , where<br />
R ′ 1 = [1,x ′ u] and R ′ 2 = [x ′ l ,T]. If X ∈ HR′ , then R∩R ′ =<br />
∅; otherwise, X /∈ HR ′ , then R∩R ′ = ∅.<br />
The correctness of this lemma can be explained as below<br />
R∩R ′ = ∅ ⇔<br />
xl ≤ x ′ u<br />
xu ≥ x ′ l<br />
⇔<br />
xl ∈ [1,x ′ u]<br />
xu ∈ [x ′ l ,T] ⇔ X ∈ HR′ .<br />
By extending Lemma 1, we can decide whether the intersection<br />
of two hyper-rectangles is null.<br />
Lemma 2: For two hyper-rectangles HR = (R1,...,RD)<br />
and HR ′ = (R ′ 1,...,R ′ D ), where Ri = [xi,l,xi,u] and R ′ i =<br />
[x ′ i,l ,x′ i,u ], for i ∈ [D], we have a point X = (x1,...,x2D) =<br />
(x1,l,x1,u,...,xD,l,xD,u) and a hyper-rectangle HR ′′ =<br />
(R ′′<br />
1,...,R ′′<br />
2D ), where R′′ 2i−1 = [1,x′ i,u ] and R′′ 2i = [x′ i,l ,T],<br />
for i ∈ [D]. If X ∈ HR ′′ , then HR∩HR ′ = ∅; otherwise,<br />
X /∈ HR ′′ , then HR∩HR ′ = ∅.<br />
Then, we can encrypt a message with this 2D-dimensional<br />
point (transformed from HR) using MRPE, and this message<br />
can only be decrypted with a decryption key generated by a<br />
2D-dimensional hyper-rectangle (transformed from HR ′ ), if<br />
HR ∩ HR ′ = ∅. Clearly, this hyper-rectangle intersection<br />
predicate encryption is selectively secure if MRPE is selectively<br />
secure.<br />
D. Hyper-Rectangle Contained Predicate Encryption<br />
Similar as the method in the last subsection, we can decide<br />
whether a hyper-rectangle is fully contained in another one.<br />
For the ease of understanding, we still start with two ranges<br />
in the single dimension. Given two ranges R and R ′ , we<br />
transform the first range as a two-dimensional point, and<br />
transform the second one as a two-dimensional rectangle. If<br />
the point is in the rectangle, then range R is fully contained<br />
in range R ′ . Compared to the method of deciding whether<br />
two ranges intersect, the second range R ′ in this method is<br />
transformed in a different way, which is illustrated in the<br />
following lemma and Fig. 7.<br />
T<br />
x1,l<br />
x ′ l<br />
x1,u<br />
x2,lx2,u x3,lx3,u<br />
1 T<br />
x ′ u<br />
x4,l x4,u<br />
<strong>Range</strong>s are fully contained in R ′ = [x ′ l ,x′ u]<br />
<strong>Range</strong>s are not fully contained in R ′ = [x ′ l ,x′ u]<br />
Fig. 7. Examples of whether a range is fully contained in another one.<br />
Lemma 3: For two ranges R = [xl,xu] and R ′ = [x ′ l ,x′ u],<br />
we have a point X = (x1,x2) = (xl,xu) based on range R<br />
and a rectangle HR ′ = (R ′ 1,R ′ 2) based on range R ′ , where<br />
R ′ 1 = [x ′ l ,x′ u] and R ′ 2 = [x ′ l ,x′ u]. If X ∈ HR ′ , then R ⊆ R ′ ;<br />
otherwise, X /∈ HR ′ , then R R ′ .<br />
The correctness of this lemma can be proved as below<br />
R ⊆ R ′ ⇔<br />
xl ∈ [x ′ l ,x′ u]<br />
xu ∈ [x ′ l ,x′ u] ⇔ X ∈ HR′ .<br />
Similarly, we can also extend the above lemma into the multidimensional<br />
version.<br />
Lemma 4: For two hyper-rectangles HR = (R1,...,RD)<br />
and HR ′ = (R ′ 1,...,R ′ D ), where Ri = [xi,l,xi,u] and R ′ i =<br />
[x ′ i,l ,x′ i,u ], for i ∈ [D], we have a point X = (x1,...,x2D) =<br />
(x1,l,x1,u,...,xD,l,xD,u) and a hyper-rectangle HR ′′ =<br />
(R ′′<br />
1,...,R ′′<br />
2D ), where R′′ 2i−1 = [x′ i,l ,x′ i,u<br />
] and R′′<br />
2i =<br />
[x ′ i,l ,x′ i,u ], for i ∈ [D]. If X ∈ HR′′ , then HR ⊆ HR ′ ;<br />
otherwise, X /∈ HR ′′ , then HR HR ′ .<br />
Then, we can encrypt a message with this 2D-dimensional<br />
point (transformed from HR) using MRPE, and this message<br />
can only be decrypted with a decryption key generated by<br />
a 2D-dimensional hyper-rectangle (transformed from HR ′ ),<br />
if HR ⊆ HR ′ . Similarly, this hyper-rectangle contained<br />
predicate encryption is selectively secure as long as MRPE<br />
is selectively secure.<br />
E. Construction of Our Scheme<br />
With the two novel predicate encryptions above, we can<br />
build a sublinear time and privacy-preserving MDRSE by<br />
following the framework in Section II. Details of our scheme<br />
based on kd-trees are illustrated in Fig. 8.<br />
Specifically, the data owner first generates the public key<br />
and master key in Setup, and builds a kd-tree Γ based<br />
all the data records and attribute vectors in Build. Then,<br />
with Encrypt, the data owner encrypts every node in Γ,<br />
and outputs outsourced kd-tree Γ ∗ , which has exactly the<br />
same structure as Γ. A non-leaf node is encrypted with<br />
the combination of hyper-rectangle contained predicate encryption<br />
(e.g., {C3,L,C3,R}) and hyper-rectangle intersection<br />
predicate encryption (e.g., {C4,L,C4,R}), and returned as<br />
C = {C3,L,C3,R,C4,L,C4,R}. Note that only predicate-only<br />
version of the two encryptions are required for a non-leaf node,<br />
because a non-leaf node does not store any data records. A<br />
leaf node is encrypted with the combination of symmetric key<br />
encryption, MRPE and its predicate-only version, and returned<br />
as Ci = {Ci,0,Ci,1,Ci,2}.
A data user is given the search token and decryption key<br />
from the data owner based on the master key and search<br />
query inExtract, and submits the search token to the cloud<br />
server to operate <strong>Search</strong>. At a non-leaf node, if hL = 1<br />
(the left hyper-rectangle is fully contained in the query hyperrectangle),<br />
then the cloud server reports all the nodes in the left<br />
subtree; else if gL = 1 (the two hyper-rectangle intersect), the<br />
cloud server continues to search in the left subtree; otherwise,<br />
the cloud server stops to search the left subtree; Meanwhile,<br />
the cloud server also computes hR and gR, and make decisions<br />
for the right subtree with the same rule as described in Fig. 8.<br />
At a leaf node, if f = 1 (a point lies in the query hyperrectangle),<br />
the cloud server returns the encrypted data record<br />
on this leaf node to the data user.<br />
Compared to the search algorithm (described in Section III)<br />
of a kd-tree in the plaintext domain, our search algorithm <strong>over</strong><br />
encrypted data exactly follows the same search rules not only<br />
at non-leaf nodes but also at leaf nodes, which enables our<br />
scheme to achieve sublinear search time. Finally, the data user<br />
is able to reveal a data record Mi with a two-layer decryption<br />
if its attribute vector Ai ∈ Q in Decrypt. According to the<br />
correctness of decryption, even a data user colludes with the<br />
cloud server, and obtains more encrypted data records than he<br />
is expected, he cannot reveal those data record, whose Ai /∈ Q.<br />
F. Security and <strong>Privacy</strong> Analysis<br />
We now analyze the security and privacy of our scheme.<br />
Theorem 1: (Our Scheme-selective security). Our proposed<br />
scheme is selectively secure, as long as MRPE is<br />
selectively secure.<br />
Proof: The security of our scheme includes two aspects:<br />
the security on non-leaf nodes and security on leaf nodes.<br />
In our scheme, each non-leaf node can be represented<br />
as two hyper-rectangles, and the two hyperrectangle<br />
are encrypted with MRPEO separately. Without<br />
loss of generality, for two different encrypted values<br />
C∗ 1 = {C∗ 1,3,L ,C∗ 1,3,R ,C∗ 1,4,L ,C∗ 1,4,R } and C∗ 2 =<br />
{C∗ 2,3,L ,C∗ 2,3,R ,C∗ 2,4,L ,C∗ 2,4,R } on two different non-leaf<br />
nodes returned from the selective security game defined in<br />
Section II, if the advantage of an adversaryAon distinguishing<br />
two different ciphertexts C∗ 1,3,L and C∗ 2,3,L (or C∗ 1,3,R and<br />
C∗ 2,3,R ) in the MRPEO selectively secure game is ǫ, then<br />
the advantage of this adversary A on distinguishing the two<br />
different encrypted values C∗ 1 and C∗ 2 on two different nonleaf<br />
nodes is4ǫ−6ǫ2 +4ǫ3−ǫ4 .The advantage of this adversary<br />
A on non-leaf nodes in our scheme’s selective security game<br />
is defined as<br />
Adv OurScheme−SS<br />
A−Non−Leaf (λ) = |Pr[b = b′ ]− 1<br />
2 |<br />
≤ 4ǫ−6ǫ 2 +4ǫ 3 −ǫ 4 .<br />
which is negligible.<br />
Each leaf node is encrypted with one round MRPE and<br />
one round MRPE O . Without loss of generality, for two different<br />
encrypted values C ∗ 1 = {C ∗ 1,0,C ∗ 1,1,,C ∗ 1,2} and C ∗ 2 =<br />
{C ∗ 2,0,C ∗ 2,1,C ∗ 2,2} on two different leaf nodes (data records)<br />
returned from the selective security game defined in Section<br />
II, if the advantage of an adversary A on distinguishing<br />
two different ciphertexts C ∗ 1,1 and C ∗ 2,1 in the MRPE selectively<br />
secure game is ǫ, and the advantage on distinguishing<br />
two different ciphertexts C ∗ 1,2 and C ∗ 2,2 in the corresponding<br />
MRPE O selectively secure game is ǫ, then the advantage of<br />
this adversary A on distinguishing two different encrypted<br />
values C ∗ 1 and C ∗ 2 on two different leaf nodes (data records)<br />
is 2ǫ−ǫ 2 .<br />
The advantage of this adversary A on leaf nodes in our<br />
scheme’s selective security game is defined as<br />
Adv OurScheme−SS<br />
A−Leaf (λ) = |Pr[b = b ′ ]− 1<br />
2 | ≤ 2ǫ−ǫ2 . (2)<br />
which is negligible.<br />
Note that since the length of ciphertexts on a non-leaf<br />
node and a leaf node are different, an adversary is able<br />
to distinguish a non-leaf node from a leaf node based on<br />
ciphertexts. However, distinguishing whether a node is a leaf<br />
node or non-leaf node, the adversary will not gain more<br />
more advantage in the selective security game. Therefore, our<br />
proposed scheme is selectively secure, as long as MRPE is<br />
selectively secure.<br />
Theorem 2: (Single <strong>Dimensional</strong> Confidentiality). If our<br />
scheme is selectively secure, then it preserves single dimensional<br />
confidentiality.<br />
Proof: We first assume an adversary is able to decrypt<br />
data records on a single-dimensional range search query independently<br />
based on a decryption key on a multi-dimensional<br />
range search query, which means our scheme would fail to<br />
achieve multi-dimensional confidentiality. Then we will show<br />
that this assumption would conflict the fact that our scheme<br />
is selectively secure.<br />
More specifically, given a multi-dimensional range search<br />
query Q = [x1,l,x1,u] ∧ [x2,l,x2,u] ∧ ···∧[xD,l,xD,u], the<br />
data owner generates a decryption key SK SKQ on this query Q<br />
for the adversary. This search query Q can be represented as a<br />
hyper-rectangle HRQ = {R1,R2,...,RD}, Ri = [xi,l,xi,u],<br />
for 1 ≤ i ≤ D.<br />
We first assume the adversary is able to decrypt with SK SKQ<br />
on a single-dimensional range search query Q ′ = [x1,l,x1,u],<br />
which can be represented as HRQ ′ = {R1,R ′ 2,...,R ′ D },<br />
R1 = [x1,l,x1,u] and R ′ i = [1,Ti], for 2 ≤ i ≤ D.<br />
Then, for two ciphertexts C ∗ 1 and C ∗ 2 , where the two points<br />
X ∗ 1,X ∗ 2 ∈ HRQ ′ but X∗ 1,X ∗ 2 /∈ HRQ, this adversary is<br />
able to decrypt C ∗ 1 and C ∗ 2 , which means it can distinguish<br />
the corresponding messages M ∗ 1 and M ∗ 2 . It also indicates<br />
that, for the two points X ∗ 1,X ∗ 2 /∈ HRQ, this adversary is<br />
able to distinguish M∗ 1 and M∗ 2 based on the decryption key<br />
SK SKQ, which means MRPE is not selectively secure. Clearly, it<br />
would conflict the fact that our scheme is selectively secure.<br />
Therefore, as long as our scheme is selectively secure, then our<br />
scheme is able to provide multi-dimensional confidentiality.<br />
Theorem 3: (Single <strong>Dimensional</strong> Query <strong>Privacy</strong>). If all<br />
the data records in our scheme are indexed with a kd-tree<br />
and our scheme is selectively secure, then it achieves single<br />
dimensional query privacy.
Setup(λ, {T1,...,TD}): Given security parameter λ and upper<br />
bound Ti in each dimension, build a lattice L∆ = [T1]×···×[TD]<br />
and a lattice L2∆ = [T1]×[T1]×···×[TD]×[TD]; compute<br />
(PK1,MK1) ←MRPE.Setup(λ,L∆),<br />
(PK2,MK2) ←MRPE O .Setup(λ,L∆),<br />
(PK3,MK3) ←MRPE O .Setup(λ,L2∆),<br />
(PK4,MK4) ←MRPE O .Setup(λ,L2∆);<br />
and output public key PK = {PK1,PK2,PK3,PK4} and master key<br />
MK = {MK1,MK2,MK3,MK4}.<br />
Build({M1,...,Mn}, {A1,...,An}): Given all the data records<br />
{M1,...,Mn} and their attributes {A1,...,An}, output kd-tree Γ.<br />
Encrypt(PK PK PK, {M1,...,Mn}, Γ): Given public key PK =<br />
{PK1,PK2,PK3,PK4}, all the data records {M1,...,Mn} and kdtree<br />
Γ, encrypt each node in kd-tree Γ.<br />
• If it is a non-leaf node, let HRL and HRR be the two hyperrectangles<br />
of this node; transform HRL = (R1,..,RD),<br />
where Ri = [xi,l,xi,u], into a point XL = (x1,...,x2D) =<br />
(x1,l,x1,u,...,xD,l,xD,u), transform HRR into a point XR<br />
in the same way; compute<br />
C3,L ←MRPE O .Encrypt(PK3,XL,Flag),<br />
C3,R ←MRPE O .Encrypt(PK3,XR,Flag),<br />
C4,L ←MRPE O .Encrypt(PK4,XL,Flag),<br />
C4,R ←MRPE O .Encrypt(PK4,XR,Flag),<br />
and output C = {C3,L,C3,R,C4,L,C4,R}.<br />
• If it is a leaf node, it is a data record Mi, i ∈ [1,n], and<br />
its attribute vector is Ai = (ai,1,...,ai,D); set a point X<br />
as X = (x1,...,xD) = (ai,1,...,ai,D), generate a random<br />
symmetric key Ki, then encrypt as<br />
Ci,0 ←Enc(Ki,Mi),<br />
Ci,1 ←MRPE.Encrypt(PK1,X,Ki,Flag),<br />
Ci,2 ←MRPE O .Encrypt(PK2,X,Flag),<br />
and output Ci = {Ci,0,Ci,1,Ci,2}.<br />
Output outsourced kd-tree Γ ∗ after encrypting all the nodes.<br />
Extract(MK MK MK,Q): Given master key MK = {MK1,MK2,MK3,MK4}<br />
and a multi-dimensional range search query Q = [x1,l,x1,u] ∧<br />
Proof: <strong>Preserving</strong> single dimensional query privacy in<br />
an MDRSE includes two aspects. First, the leveraged multidimensional<br />
data structure itself will not reveal additional<br />
query privacy in single-dimensions. Second, a data user (or<br />
the cloud server) is not able to decompose the search token of<br />
a multi-dimensional range query into the ones in each single<br />
dimension.<br />
For the first aspect, because all the data records in our<br />
scheme are indexed based on the structure of a kd-tree, where<br />
the search decision at each node depends on the ranges in all<br />
the dimensions, therefore, an adversary is not able to reveal<br />
single dimensional query privacy from the data structure of a<br />
kd-tree.<br />
Meanwhile, we can also prove that if an adversary is able<br />
to decompose the search token of a multi-dimensional range<br />
query into each single dimension, then it will conflict the fact<br />
that our scheme is selectively secure.<br />
More specifically, given a multi-dimensional range search<br />
query Q = [x1,l,x1,u] ∧ [x2,l,x2,u] ∧ ···∧[xD,l,xD,u], the<br />
Fig. 8. Details of our scheme based on kd-trees.<br />
··· ∧ [xD,l,xD,u], transform Q into a hyper-rectangle HR =<br />
(R1,...,RD), where Ri = [xi,l,xi,u] and 1 ≤ i ≤ D; generate<br />
a hyper-rectangle HR ′ = (R ′ 1,...,R ′ 2D), where R ′ 2i−1 =<br />
[xi,l,xi,u] and R ′ 2i = [xi,l,xi,u], for 1 ≤ i ≤ D, and a hyperrectangle<br />
HR ′′ = (R ′′<br />
1,...,R ′′<br />
2D), where R ′′<br />
2i−1 = [1,xi,u] and<br />
R ′′<br />
2i = [xi,l,Ti], for 1 ≤ i ≤ D; compute<br />
SK1 ←MRPE.DeriveKey(MK1,HR),<br />
TK2 ←MRPE O .DeriveKey(MK2,HR),<br />
TK3 ←MRPE O .DeriveKey(MK3,HR ′ ),<br />
TK4 ←MRPE O .DeriveKey(MK4,HR ′′ );<br />
and return search token TK TKQ = {TK2,TK3,TK4} and decryption<br />
key SK SKQ = SK1.<br />
<strong>Search</strong>(TK TK TKQ, Γ ∗ ): Given search token TK TKQ = {TK2,TK3,TK4}<br />
and outsourced kd-tree Γ ∗ , search from the root node of Γ ∗ and<br />
perform as follows:<br />
• If it is a non-leaf node C = {C3,L,C3,R,C4,L,C4,R}, compute<br />
hL ← MRPE O .Decrypt(PK3,TK3,C3,L), if hL = 1,<br />
report all the nodes in the left subtree; otherwise compute<br />
gL ← MRPE O .Decrypt(PK4,TK4,C4,L), if gL = 1, continue<br />
to search the left subtree; otherwise stop to search the<br />
left subtree;<br />
compute hR ←MRPE O .Decrypt(PK3,TK3,C3,R), if hR =<br />
1, report all the nodes in the right subtree; otherwise compute<br />
gR ← MRPE O .Decrypt(PK4,TK4,C4,R), if gR = 1,<br />
continue to search the right subtree; otherwise stop to search<br />
the right subtree;<br />
• If it is a leaf node Ci = {Ci,0,Ci,1,Ci,2}, compute<br />
f ← MRPE O .Decrypt(PK2,TK2,Ci,2), if f = 1, return<br />
Ci; otherwise, do not return Ci.<br />
Decrypt(SK SK SKQ,Ci): Given decryption keySK SK SKQ and encrypted data<br />
record Ci = {Ci,0,Ci,1,Ci,2}, compute<br />
Ki ←MRPE.Decrypt(PK1,SKHR,Ci,1),<br />
Mi ← Dec(Ki,Ci,0) = Dec(Ki,Enc(Ki,Mi)),<br />
and output data record Mi, if attribute set Ai ∈ Q; and ⊥<br />
otherwise.<br />
data owner generates a search token TK TKQ on this query Q for<br />
the adversary. This search query Q can be represented as a<br />
hyper-rectangle HRQ = {R1,R2,...,RD}, Ri = [xi,l,xi,u],<br />
for 1 ≤ i ≤ D.<br />
We first assume the adversary is able to search with TK TKQ<br />
on a single-dimensional range search query Q ′ = [x1,l,x1,u],<br />
which can be represented as HRQ ′ = {R1,R ′ 2,...,R ′ D },<br />
R1 = [x1,l,x1,u] and R ′ i = [1,Ti], for 2 ≤ i ≤ D.<br />
Then, for two ciphertexts C ∗ 1 and C ∗ 2 , where the two points<br />
X ∗ 1,X ∗ 2 ∈ HRQ ′ but X∗ 1,X ∗ 2 /∈ HRQ, this adversary is<br />
able to decrypt C ∗ 1 and C ∗ 2 , which means it can distinguish<br />
the corresponding messages M ∗ 1 and M ∗ 2 . It also indicates<br />
that, for the two points X ∗ 1,X ∗ 2 /∈ HRQ, this adversary is<br />
able to distinguish M∗ 1 and M∗ 2 based on the search token<br />
TK TKQ, which means MRPE is not selectively secure. Clearly, it<br />
would conflict the fact that our scheme is selectively secure.<br />
Therefore, as long as our scheme is selectively secure, an<br />
adversary is not able to decompose the search token of a multidimensional<br />
range query into each single dimension.
V. MULTI-DIMENSIONAL RANGE SEARCH OVER<br />
ENCRYPTED DATA (MDRSE) ON RANGE TREES<br />
A. Overview<br />
Generally, different multi-dimensional data structures in the<br />
plaintext domain can be used to achieve different tradeoffs<br />
between search time and storage <strong>over</strong>head. Ideally, we would<br />
like to have the capability to make the same tradeoff <strong>over</strong><br />
encrypted data. In fact, we can also use the similar methodology<br />
in the last section to design another MDRSE with an<br />
even faster (polylogarithmic) search time based on range trees<br />
[12]. According to the tradeoff, which we shown in Section<br />
III, the scheme based on a range tree requires much more<br />
storage <strong>over</strong>head than the one based on a kd-tree.<br />
Specifically, we can exactly follow the search algorithm of<br />
a range tree in plaintext when all the nodes in a range tree are<br />
encrypted with the utilization of hyper-rectangle intersection<br />
predicate encryption and another novel predicate encryption,<br />
named hyper-rectangle containing predicate encryption.<br />
With the hyper-rectangle containing predicate encryption, if<br />
a message is encrypted with a hyper-rectangle HR, then it<br />
can be only decrypted with a decryption key generated by a<br />
hyper-rectangle HR ′ , if HR ⊇ HR ′ . Note that this hyperrectangle<br />
containing predicate encryption is different from the<br />
hyper-rectangle contained predicate encryption designed in the<br />
previous section.<br />
However, because the search algorithm of a range tree<br />
itself can be operated on each single dimension independently,<br />
this scheme based on range trees is not able to achieve<br />
single dimensional query privacy. For example, given a multidimensional<br />
range query Q = [x1,x2] ∧ [y1,y2], the search<br />
results of it after X-dimension is exactly the same as the search<br />
result of the single dimensional range query Qx = [x1,x2], no<br />
matter all the nodes in the range tree are encrypted or not. Note<br />
that single dimensional confidentiality in the range tree-based<br />
scheme is still preserved.<br />
Actually, we also believe our design methodology can be<br />
leveraged into other multi-dimensional data structures, such as<br />
R-trees [18] and Quadtrees [19], whose search algorithm also<br />
depends on the geometry relationships of hyper-rectangles.<br />
B. Introduction to <strong>Range</strong> Trees<br />
Before describing our MDRSE based on range trees, we first<br />
briefly introduce how to search on a range tree. Without loss of<br />
generality, we present the following examples and algorithms<br />
with the two-dimension, which can be easily extended into the<br />
multi-dimensional case. An example of a range tree is listed<br />
in Fig. 9. In a 2-dimensional range tree, all the nodes are<br />
first organized in x-dimension as a balanced binary tree. Then<br />
for each non-leaf node, it also has a second level tree (still<br />
a balanced binary tree), where the nodes under this non-leaf<br />
node are organized based on y-dimension.<br />
The essential idea to search on a range tree is to search<br />
each single dimension one by one. Specifically, given range<br />
query Q = [x1,x2]∧[y1,y2], we first concentrate on finding<br />
the points whose x-dimension lies in [x1,x2], and then we<br />
report those points whose y-dimension also lies in [y1,y2]. As<br />
a result, for the search on each single dimension (x-dimension<br />
or y-dimension), it is a 1-dimensional range query, which is<br />
described as follows.<br />
Given range query Qx = [x1,x2], we search with [x1,x2] in<br />
the x-dimension until we get to a node vsplit where the search<br />
paths split. From the left child of vsplit, we continue the search<br />
with x1, and at every node v where the search path of x goes<br />
left, we report all points in the right subtree of v. Similarly,<br />
we continue the search with x2 at the right child of vsplit, and<br />
at every node v where the search path of x ′ goes right, we<br />
report all points in the left subtree of v. Finally, we check the<br />
leaves µ and µ ′ where the two paths end to see if they contain<br />
a point in the range. The search process is also illustrated in<br />
Fig.9(c).<br />
Based on the range search algorithm on 1-dimension, the<br />
search algorithm of a 2-dimensional range tree is presented in<br />
Algorithm 2. Further details about range trees can be found<br />
in [11].<br />
Algorithm 2: 2D<strong>Range</strong>Query(v, Q)<br />
Input: A 2D range tree Γ and a query Q = [x1,x2]∧[y1,y2].<br />
Output: All points in Γ that lie in Q.<br />
1 vsplit ← FindSplitNode(Γ,[x1,x2]);<br />
2 if vsplit is a leaf then<br />
3 Report it if the point stored at vsplit lies in Q.<br />
4 else<br />
5 v ← LeftChild(vsplit);<br />
6 while v is not a leaf do<br />
7 if x1 ≤ xv then<br />
8 1D<strong>Range</strong>Query(Γassoc(RightChild(v)),[y1,y2]);<br />
9 v ← LeftChild(v);<br />
10 else<br />
11 v ← RightChild(v);<br />
12 Report the leaf (where the path ends) if the point stored at<br />
vsplit lies in Q;<br />
13 Similarly, follow the path from RightChild(vsplit) to x2,<br />
call 1D<strong>Range</strong>Query with the range [y1,y2] on the<br />
associate structures of subtrees Γassoc(LeftChild(v)), which<br />
is on the left of the path, and report a point at the leaf<br />
(where the path ends) if it lies in Q.<br />
C. Challenges<br />
As we can see from Algorithm 2, by doing comparison (i.e.<br />
if x1 ≤ xv) in 1-dimensional range query, it is more easier<br />
and simpler to make decisions at nodes in a range tree than<br />
verifying geometry relationships of two hyper-rectangles at<br />
nodes in a kd-tree. However, during the design of an MDRSE<br />
based on range trees, if we directly follow the one-by-one<br />
search on each single dimension of a range tree in plaintext<br />
domain, we have to decompose search tokens and decryption<br />
tokens <strong>over</strong> encrypted data into each single dimension, which<br />
will no doubt reveal more information to data users than<br />
they are authorized.<br />
In order to leverage the similar methodology in the last<br />
section to solve the above problem, we first transform the<br />
decision conditions at nodes in a range tree into geometry<br />
relationships of two rectangles in the two-dimension (two
X-dimension<br />
(first level tree)<br />
Y-dimension<br />
(second level tree)<br />
(a) 2D range tree, where each non-leaf node in xdimension<br />
has a second level tree on y-dimension.<br />
X-dimension<br />
(first level tree)<br />
Two Rectangles<br />
HRL = ([1,3],[1,Ty])<br />
HRR = ([3,6],[1,Ty])<br />
1<br />
3<br />
1,5 3,8 4,2<br />
hyper-rectangles in the multi-dimension). This transformation<br />
makes the decision conditions more complicated than the<br />
original ones, but will offer us an opportunity to design an<br />
MDRSE based on range trees.<br />
Specifically, similar as a non-leaf node in a kd-tree, a nonleaf<br />
node in a range tree can also be represented with two<br />
rectangles on each level tree (shown in Fig. 9(b)) and the range<br />
query can be always denoted as a rectangle HRQ. Then, we<br />
can find a method to decide the split node in a range tree by<br />
verifying whether a rectangle fully c<strong>over</strong>s another one. Details<br />
are illustrated in the following algorithm. To distinguish the<br />
original one, we denote it as FindSplitNode ∗ . Note that the<br />
input of the original one is the range treeΓand a range[x1,x2]<br />
in a single dimension, while the input of ours is the range tree<br />
Γ and the entire range query Q in the multi-dimension.<br />
Algorithm 3: FindSplitNode ∗ (Γ, Q)<br />
Input: A 2D range tree (or its subtree) Γ, and a query<br />
Q = [x1,x2]∧[y1,y2], which can be transformed as a<br />
rectangle HRQ = [Rx,Ry], where Rx = [x1,x2] and<br />
Ry = [y1,y2];<br />
Output: The split node vsplit in a single dimension.<br />
1 v is the root (or a node) of Γ;<br />
2 if v is a leaf then<br />
3 v is the split node.<br />
4 else if v is a non-leaf then<br />
5 if LeftRectangle(v) fully c<strong>over</strong>s HRQ then<br />
6 FindSplitNode ∗ (LeftSubtree(v),Q);<br />
7 else if RightRectangle(v) fully c<strong>over</strong>s HRQ then<br />
8 FindSplitNode ∗ (RightSubtree(v),Q);<br />
9 else<br />
10 v is the split node;<br />
With Algorithm FindSplitNode ∗ , we then transform the<br />
original search algorithm of a range tree by verifying geometry<br />
relationships of two rectangles at nodes. Similar, to<br />
distinguish the original one, we denote ours as 2D<strong>Range</strong>-<br />
Query ∗ . The search rule of 1D<strong>Range</strong>Query ∗ is the same<br />
as 2D<strong>Range</strong>Query ∗ , except when LeftRectangle(v) intersects<br />
HRQ, 1D<strong>Range</strong>Query ∗ reports all the nodes on the right<br />
subtree and continues to search on the left subtree. Clearly, we<br />
can easily extend 2D<strong>Range</strong>Query ∗ into its the D-dimensional<br />
version DD<strong>Range</strong>Query ∗ .<br />
4<br />
6,9<br />
Y-dimension<br />
(second level tree)<br />
5<br />
8<br />
2<br />
6,9<br />
3,8<br />
1,5<br />
4,2<br />
Two Rectangles<br />
HRL = ([1,6],[2,5])<br />
HRR = ([1,6],[5,9])<br />
(b) A first level tree and its second level tree. Each<br />
non-leaf node in x-dimension (or y-dimension) can be<br />
represented as two rectangles.<br />
Fig. 9. An example of a range tree.<br />
3<br />
10<br />
23<br />
19 30<br />
37<br />
49<br />
70 89 93<br />
3 10 19 23 30 37 59 62 70 80 93<br />
49<br />
59<br />
62<br />
Split Node<br />
(c) Find split node in a single dimension with<br />
query Q = [61,90].<br />
Algorithm 4: 2D<strong>Range</strong>Query ∗ (v,Q)<br />
Input: A 2D range tree Γ, and a query Q = [x1,x2]∧[y1,y2],<br />
which can be transformed as a rectangle<br />
HRQ = [Rx,Ry], where Rx = [x1,x2] and<br />
Ry = [y1,y2];<br />
Output: All points in Γ that lie in Q.<br />
1 vsplit ← FindSplitNode ∗ (Γ,Q);<br />
2 if vsplit is a leaf then<br />
3 Report it if the point X stored at vsplit lies in HRQ.<br />
4 else<br />
5 v ← LeftChild(vsplit);<br />
6 while v is not a leaf do<br />
7 if LeftRectangle(v) intersects HRQ then<br />
8 1D<strong>Range</strong>Query ∗ (Γassoc(RightChild(v)),Q);<br />
9 v ← LeftChild(v);<br />
10 else<br />
11 v ← RightChild(v);<br />
12 Report the leaf (where the path ends) if the point X stored<br />
at vsplit lies in HRQ;<br />
13 Similarly, follow the path from RightChild(vsplit) to some<br />
leaf, call 1D<strong>Range</strong>Query ∗ with the query Q on the<br />
associate structures of subtrees Γassoc(LeftChild(v)), which<br />
is on the left of the path, and report a point at the leaf<br />
(where the path ends) if it lies in HRQ.<br />
D. Hyper-Rectangle Containing Predicate Encryption<br />
As we argued at the beginning of this section, except hyperrectangle<br />
intersection predicate encryption, which is able to<br />
decide whether two rectangles intersect, we also need another<br />
predicate encryption, named hyper-rectangle containing predicate<br />
encryption, to decide whether a hyper-rectangle HR fully<br />
contains another hyper-rectangle HR ′ based on encrypted<br />
data.<br />
Once again, for the ease of description, we start to explain<br />
hyper-rectangle containing predicate encryption with two<br />
ranges in a single dimension. Given two ranges R and R ′ in<br />
a single dimension, we transform the first range R = [xl,xu]<br />
into a point in a two-dimension, and convert the second range<br />
R ′ = [x ′ l ,x′ u] into a rectangle in a two-dimension. Then,<br />
we can decide whether range R fully c<strong>over</strong>s range R ′ by<br />
verifying whether this two dimensional point is in the twodimensional<br />
rectangle. As shown in Fig. 10 and Lemma 5,<br />
the transformation of range R into a point is the same as<br />
hyper-rectangle intersection predication encryption and hyperrectangle<br />
contained predicate encryption, but the transforma-<br />
80<br />
89<br />
97
tion of range R ′ into the two-dimensional rectangle is neither<br />
the same as the way in hyper-rectangle intersection predicate<br />
encryption nor the approach in hyper-rectangle contained<br />
predicate encryption.<br />
x1,l<br />
x2,l<br />
x3,l<br />
x ′ l<br />
x1,u<br />
x ′ u<br />
x3,u<br />
x2,u<br />
1 T<br />
<strong>Range</strong>s fully c<strong>over</strong> R ′ = [x ′ l ,x′ u]<br />
<strong>Range</strong>s fully c<strong>over</strong> R ′ = [x ′ l ,x′ u]<br />
x4,l x4,u<br />
Fig. 10. Examples of whether one range fully c<strong>over</strong>s another one.<br />
Lemma 5: For two ranges R = [xl,xu] and R ′ = [x ′ l ,x′ u],<br />
we have a point X = (x1,x2) = (xl,xu) based on range R<br />
and a rectangle HR ′ = (R ′ 1,R ′ 2) based on range R ′ , where<br />
R ′ 1 = [1,x ′ l ] and R′ 2 = [x ′ u,T]. If X ∈ HR ′ , then R ⊇ R ′ ;<br />
otherwise, X /∈ HR ′ , then R R ′ .<br />
The correctness of this lemma can be explained as below<br />
R ⊇ R ′ ⇔<br />
xl ∈ [1,x ′ l ]<br />
xu ∈ [x ′ u,T] ⇔ X ∈ HR′ .<br />
By extending Lemma 5, we can decide whether a hyperrectangle<br />
fully c<strong>over</strong>s another hyper-rectangle in the multidimension.<br />
Lemma 6: For two hyper-rectangles HR = (R1,...,RD)<br />
and HR ′ = (R ′ 1,...,R ′ D ), where Ri = [xi,l,xi,u] and R ′ i =<br />
[x ′ i,l ,x′ i,u ], for i ∈ [D], we have a point X = (x1,...,x2D) =<br />
(x1,l,x1,u,...,xD,l,xD,u) and a hyper-rectangle HR ′′ =<br />
(R ′′<br />
1,...,R ′′<br />
2D ), where R′′ 2i−1 = [1,x′ i,l ] and R′′ 2i = [x′ i,u ,T],<br />
for i ∈ [D]. If X ∈ HR ′′ , then HR ⊇ HR ′ ; otherwise,<br />
X /∈ HR ′′ , then HR HR ′ .<br />
Then, by transforming two D-dimensional hyper-rectangles<br />
HR, HR ′ into a 2D-dimensional point and a 2D-dimensional<br />
hyper-rectangle respectively based on the above lemma, we<br />
can encrypt a message with this 2D-dimensional point (transformed<br />
from HR) using MRPE, and this message can<br />
only be decrypted with a decryption key generated by a<br />
2D-dimensional hyper-rectangle (transformed from HR ′ ), if<br />
HR ⊇ HR ′ . Clearly, this hyper-rectangle containing predicate<br />
encryption is selectively secure if MRPE is selectively<br />
secure.<br />
E. Construction based on <strong>Range</strong> Trees<br />
Using these two predicate encryptions together, we are able<br />
to build an MDRSE within polylogarithmic search time based<br />
on range trees <strong>over</strong> encrypted data. As we can see from the<br />
following box, during the encryption, each non-leaf node is encrypted<br />
with hyper-rectangle containing predicate encryption<br />
and hyper-rectangle intersection predicate encryption together;<br />
and each leaf node is still encrypted with the combination<br />
of symmetric key encryption, MRPE and its predicate-only<br />
version, which is the same as the case in kd-trees.<br />
During the search, our scheme exactly follows<br />
FindSplitNode ∗ and DD<strong>Range</strong>Query ∗ by computing hi,L,<br />
hi,R, gi,L, gi,R, and f respectively when it is necessary. For<br />
instance, when we need to decide whether LeftRectangle(v)<br />
fully c<strong>over</strong>s HRQ in FindSplitNode ∗ , we compute the<br />
corresponding hi,L of this node v, if hi,L = 1 (indicates<br />
LeftRectangle(v) fully c<strong>over</strong>s HRQ), then we continue to<br />
find split node in the left subtree. By doing these, we can<br />
exactly follow the search algorithm of a range tree when all<br />
the nodes are encrypted.<br />
F. Security and <strong>Privacy</strong> Analysis<br />
With the same approach in the previous section, we can<br />
prove that our scheme (range-tree) is also selectively secure<br />
and single dimensional confidentiality.<br />
Theorem 4: (Our Scheme (range trees)-selective security.)<br />
Our proposed scheme (range trees) is selectively secure<br />
against polynomial-time adversaries, as long as MRPE is<br />
selectively secure against polynomial-time adversaries.<br />
Theorem 5: (Single <strong>Dimensional</strong> Confidentiality). If our<br />
proposed scheme (range trees) is selectively secure, then it<br />
preserves single dimensional confidentiality.<br />
Although the search tokens are not able to be decomposed<br />
into single dimensions in our scheme (range tree), however, it<br />
is not able to achieve single dimensional query privacy. It is<br />
because the data structure of a range tree itself will inevitably<br />
reveal single dimensional query privacy. Now let us explain it<br />
with the following example.<br />
We first assume a data user obtains a search token TK TKQ<br />
based on a two-dimensional range query Q = [x1,x2] ∧<br />
[y1,y2]. Then this data user can submit his search tokenTK TK TKQ to<br />
the cloud, and the cloud server will search on the outsourced<br />
range tree Γ∗ with TK TKQ. If the cloud server colludes with<br />
this data user, stops the search algorithm right after finishing<br />
search on X-dimension, and directly returns the search results<br />
on X-dimension to the data user, then these results are exactly<br />
the same as the search results of a single dimensional range<br />
query Qx = [x1,x2], which means the data user can search on<br />
a range tree for an X-dimensional range query with a search<br />
token of a multi-dimensional query. To search independently<br />
on Y-dimension, the cloud server can directly jump to the<br />
y-dimension tree (second level tree) without searching on Xdimension.<br />
Then, by doing these, the search results after Ydimension<br />
ofQis exactly the same asQy = [y1,y2], no matter<br />
the nodes are encrypted or not.<br />
Well, the good news is to compromise the single dimensional<br />
query privacy, this data user needs to collaborate with<br />
the cloud server. In addition, this data user is not able to<br />
decrypt those results on single dimensions because single<br />
dimensional confidentiality is still protected. The reason of this<br />
privacy leak is caused by the fact that the search algorithm of<br />
a range tree is operated on each single dimension one after<br />
another. When performing search on x-dimension, the ranges<br />
of other dimensions will not affect the search decisions at<br />
nodes in x-dimension. Therefore, as long as two queries have<br />
the same range on x-dimension, the search results after xdimension<br />
(first level tree) is exactly the same, no matter
Setup(λ, {T1,...,TD}): Given security parameter λ and upper<br />
bound Ti in each dimension, build a lattice L∆ = [T1]×···×[TD]<br />
and a lattice L2∆ = [T1]×[T1]×···×[TD]×[TD]; compute<br />
(PK1,MK1) ←MRPE.Setup(λ,L∆),<br />
(PK2,MK2) ←MRPE O .Setup(λ,L∆),<br />
(PK3,MK3) ←MRPE O .Setup(λ,L2∆),<br />
(PK4,MK4) ←MRPE O .Setup(λ,L2∆);<br />
and output public key PK = {PK1,PK2,PK3,PK4} and master key<br />
MK = {MK1,MK2,MK3,MK4}.<br />
Build({M1,...,Mn}, {A1,...,An}): Given all the data records<br />
{M1,...,Mn} and their attributes {A1,...,An}, output range tree<br />
Γ.<br />
Encrypt(PK PK PK, {M1,...,Mn}, Γ): Given public key PK =<br />
{PK1,PK2,PK3,PK4}, all the data records {M1,...,Mn} and kdtree<br />
Γ, encrypt each node in range tree Γ.<br />
• If it is a non-leaf node, let HRi,L and HRi,R be the two<br />
hyper-rectangles represented on the i-level tree, where i ∈<br />
[1,D]; transform HRi,L = (Ri,1,..,Ri,D), where Ri,j =<br />
[xi,j,l,xi,j,u], into a point Xi,L = (xi,1,...,xi,2D) =<br />
(xi,1,l,xi,1,u,...,xi,D,l,xi,D,u), transform HRi,R into a<br />
point Xi,R in the same way; compute<br />
Ci,3,L ←MRPE O .Encrypt(PK3,Xi,L,Flag),<br />
Ci,3,R ←MRPE O .Encrypt(PK3,Xi,R,Flag),<br />
Ci,4,L ←MRPE O .Encrypt(PK4,Xi,L,Flag),<br />
Ci,4,R ←MRPE O .Encrypt(PK4,Xi,R,Flag),<br />
and output C = {C1,...,CD}, where Ci =<br />
{Ci,3,L,Ci,3,R,Ci,4,L,Ci,4,R} for the i-level tree of this<br />
non-leaf node.<br />
• If it is a leaf node, it is a data record Mi, i ∈ [1,n], and<br />
its attribute vector is Ai = (ai,1,...,ai,D); set a point X<br />
as X = (x1,...,xD) = (ai,1,...,ai,D), generate a random<br />
symmetric key Ki, then encrypt as<br />
Ci,0 ←Enc(Ki,Mi),<br />
Ci,1 ←MRPE.Encrypt(PK1,X,Ki,Flag),<br />
Ci,2 ←MRPE O .Encrypt(PK2,X,Flag),<br />
whether the nodes in the tree are encrypted or not.<br />
VI. PERFORMANCE<br />
In this section, we analyze the performance of our scheme,<br />
evaluate the performance in experiments based on a large-scale<br />
real-world dataset.<br />
A. Complexity Analysis<br />
In our scheme, we utilize a kd-tree to achieve sublinear<br />
search time <strong>over</strong> encrypted data. The search complexity of a<br />
kd-tree is O(n 1−1/D ), where n is the number of data records<br />
and D is the number of dimensions. We encrypt nodes in a<br />
kd-tree with two novel predicate encryptions based on MRPE<br />
[8], where the decryption cost at each node can be optimized<br />
to O(DlogT). Therefore, the total search time of our scheme<br />
is O(n 1−1/D · DlogT). According to the structure of a kdtree,<br />
the storage cost of a kd-tree for a number of n records<br />
is O(n). Considering the size of an encrypted value under<br />
MRPE is O(DlogT) [8], the total storage cost of our scheme<br />
is O(n·DlogT) as shown in Table I.<br />
Fig. 11. Details of our scheme based on range trees.<br />
and output Ci = {Ci,0,Ci,1,Ci,2}.<br />
Output outsourced range tree Γ ∗ after encrypting all the nodes.<br />
Extract(MK MK MK,Q): Given master key MK = {MK1,MK2,MK3,MK4}<br />
and a multi-dimensional range search query Q = [x1,l,x1,u] ∧<br />
··· ∧ [xD,l,xD,u], transform Q into a hyper-rectangle HR =<br />
(R1,...,RD), where Ri = [xi,l,xi,u] and 1 ≤ i ≤ D; generate a<br />
hyper-rectangle HR ′ = (R ′ 1,...,R ′ 2D), where R ′ 2i−1 = [1,xi,l]<br />
and R ′ 2i = [xi,u,Ti], for 1 ≤ i ≤ D, and another hyperrectangle<br />
HR ′′ = (R ′′<br />
1,...,R ′′<br />
2D), where R ′′<br />
2i−1 = [xi,l,xi,u] and<br />
R ′′<br />
2i = [xi,l,xi,u], for 1 ≤ i ≤ D, then compute<br />
SK1 ←MRPE.DeriveKey(MK1,HR),<br />
TK2 ←MRPE O .DeriveKey(MK2,HR),<br />
TK3 ←MRPE O .DeriveKey(MK3,HR ′ ),<br />
TK4 ←MRPE O .DeriveKey(MK4,HR ′′ );<br />
and return search token TK TKQ = {TK2,TK3,TK4} and decryption<br />
key SK SKQ = SK1.<br />
<strong>Search</strong>(TK TK TKQ, Γ ∗ ): Given search token TK TKQ = {TK2,TK3,TK4}<br />
and outsourced range tree Γ ∗ , search from the root node<br />
of Γ ∗ by exactly following FindSplitNode ∗ (Γ ∗ ,TK TK TKQ) and<br />
DD<strong>Range</strong>Query ∗ (Γ ∗ ,TK TK TKQ), and compute hi,L,hi,R,gi,L,<br />
gi,R,f when it is necessary, where<br />
hi,L ←MRPE O .Decrypt(PK3,TK3,Ci,3,L),<br />
hi,R ←MRPE O .Decrypt(PK3,TK3,Ci,3,R),<br />
gi,L ←MRPE O .Decrypt(PK4,TK4,Ci,4,L),<br />
gi,R ←MRPE O .Decrypt(PK4,TK4,Ci,3,R),<br />
f ←MRPE O .Decrypt(PK2,TK2,Ci,2);<br />
and report points that lies in Q.<br />
Decrypt(SK SK SKQ,Ci): Given decryption keySK SK SKQ and encrypted data<br />
record Ci = {Ci,0,Ci,1,Ci,2}, compute<br />
Ki ←MRPE.Decrypt(PK1,SKHR,Ci,1),<br />
Mi ← Dec(Ki,Ci,0) = Dec(Ki,Enc(Ki,Mi)),<br />
and output data record Mi, if attribute set Ai ∈ Q; and ⊥<br />
otherwise.<br />
B. Experimental Results<br />
Now we evaluate the performance of our proposed scheme,<br />
and compare it with the straightforward solution in experiments.<br />
All the experiments are tested under Ubuntu with<br />
Intel Core i5 3.30 GHz Processor and 4 GB Memory. We<br />
leverage Pairing-Based Library (PBC) 1 to simulate the cost of<br />
cryptographic operations. We use a large-scale real world data<br />
set with millions of records (U.S. census 1990 2 ) to evaluate the<br />
search performance of our scheme and the straightforward solution<br />
[8]. In the experiments, we test the search performance<br />
<strong>over</strong> 200 times with the input of random multi-dimensional<br />
range search queries.<br />
As we can see from Fig. 12(a) and 12(b), the average<br />
search time in the straightforward solution increases linearly<br />
with the number of records when the number of dimensions<br />
is fixed. While the average search time of our scheme also<br />
increases with the number of records, however, it increases<br />
1 crypto.stanford.edu/pbc/<br />
2 http://archive.ics.uci.edu/ml/datasets.html
Ave. <strong>Search</strong> Time (s)<br />
9000<br />
8000<br />
7000<br />
6000<br />
5000<br />
4000<br />
3000<br />
2000<br />
1000<br />
Our Scheme<br />
Straightforward<br />
0<br />
10 20 30 40 50<br />
n: the number of records (thousand)<br />
(a) Average search time (second)<br />
when D = 4.<br />
Ave. <strong>Search</strong> Time (s)<br />
3000<br />
2500<br />
2000<br />
1500<br />
1000<br />
500<br />
0<br />
2 3 4 5<br />
D: the number of dimensions<br />
Ave. <strong>Search</strong> Time (s)<br />
1600<br />
1400<br />
1200<br />
1000<br />
800<br />
600<br />
400<br />
200<br />
0<br />
Our Scheme<br />
Straightforward<br />
10 20 30 40 50<br />
n: the number of records (thousand)<br />
(b) Average search time (second)<br />
when D = 2.<br />
Fig. 12. Impact of n on average search time.<br />
20<br />
Our Scheme<br />
Straightforward<br />
15<br />
Our Scheme<br />
Straightforward<br />
(a) Average search time (second)<br />
when n = 10,000.<br />
Ave. <strong>Search</strong> Time (ks)<br />
10<br />
5<br />
0<br />
2 3 4 5<br />
D: the number of dimensions<br />
(b) Average search time (kilosecond)<br />
when n = 100,000.<br />
Fig. 13. Impact of D on average search time.<br />
much slower than the straightforward solution. Therefore, our<br />
scheme can finish search much faster than the straightforward<br />
solution under the same parameters. For instance, when D = 4<br />
and n = 50,000, the straightforward solution requires 7,377<br />
seconds in average to finish each multi-dimensional range<br />
search query; with our solution, the same search query can<br />
be done within 1,537 seconds in average. We can also find in<br />
Fig. 13(a) and 13(b) that our scheme has a better performance<br />
than the straightforward solution.<br />
In Table II, we compare the performance of our scheme<br />
and the straightforward solution under different scales of data<br />
records. Compared to the straightforward solution, our scheme<br />
is able to save a huge of time on multi-dimensional range<br />
search queries. Especially, when the total of the data records<br />
is one million and the number of dimensions is D = 4, our<br />
scheme is able to finish search within 5.33×10 3 seconds in<br />
average while the average search time of the straightforward<br />
solution is 152.03×10 3 seconds.<br />
TABLE II<br />
AVERAGE SEARCH TIME (KILOSECOND) WHEN D = 4.<br />
n Our Scheme Straightforward<br />
10,000 0.47 1.59<br />
100,000 3.25 15.44<br />
1,000,000 5.33 152.03<br />
As we mentioned in the previous section, we can improve<br />
search efficiency by leveraging range trees. For instance, with<br />
a number of 4,096 records, the search time <strong>over</strong> encrypted<br />
data with range trees is only around 10.96 seconds while the<br />
one with kd-trees is about 161.03 seconds.<br />
TABLE III<br />
AVERAGE SEARCH TIME (SECOND) WHEN D = 4.<br />
n With kd-trees With range-trees<br />
1,024 82.10 8.89<br />
2,048 142.33 9.46<br />
4,096 161.03 10.96<br />
Note that since the main cryptographic operation during<br />
the decryption of MRPE is the pairing operation, we can use<br />
some optimized method [20] on the computation of pairing<br />
to improve the current search performance. More specifically,<br />
by using an efficient method [20] for computing pairing<br />
operations, we can reduce the time of computing pairing from<br />
6.33 ms to 1.69 ms, which can significantly improve the<br />
current search performance of our scheme. For instance, with<br />
the optimization on the computation of pairing operations,<br />
when the number of data records is one million and the<br />
number of dimension of search queries is D = 4, then the<br />
average search time <strong>over</strong> encrypted data can be reduced to<br />
1,423 seconds. In addition, we can also leverage some special<br />
hardware, such as the Elliptic Semiconductor CLP-17 [21],<br />
to further reduce the search time on multi-dimensional range<br />
search <strong>over</strong> encrypted cloud data.<br />
VII. RELATED WORK<br />
<strong>Search</strong>able Encryption. <strong>Search</strong>able Encryption schemes can<br />
be divided into two categories, symmetric key based and<br />
public key based. The first practical symmetric key based<br />
keyword search solution for encrypted text documents was<br />
proposed by Song et al. [2], which has linear complexity with<br />
regard to the document length. Since then, the quest for higher<br />
efficiency and better functionality has never stopped. Various<br />
schemes has improved the efficiency of keyword search by<br />
creating an encrypted searchable index such like an inverted<br />
index [4], [5], [7], [22], [23]. However, the above works can<br />
only support single keyword search functionalities. Recently<br />
Cao et al. [24] proposed a multi-keyword ranked search<br />
scheme <strong>over</strong> encrypted cloud data. Kamara et al. focused on<br />
enabling privacy-preserving dynamic updates for SSE [13].<br />
However, the search complexity is still linear with the number<br />
of documents. More<strong>over</strong>, the above works mostly focus on<br />
document search and does not apply to searches on databases<br />
(may contain numerical values) considered in this paper.<br />
On the other hand, the first public key based keyword<br />
searchable encryption scheme was proposed by Boneh et al.<br />
[3]. To support more complex queries, conjunctive keyword<br />
search solutions <strong>over</strong> encrypted data have been proposed [25],<br />
[26]. They are based on bilinear pairings, while the search<br />
needs to touch every data record. Bellare et al. [27] proposed<br />
an efficient deterministic searchable public key encryption<br />
scheme. Logarithmic search complexity can be achieved;<br />
however, it only allows equality test. In addition, due to its<br />
deterministic property, plaintext frequency will be exposed due<br />
to duplicate attributes.<br />
Predicate Encryption. Predicate Encryption is a promising<br />
approach to achieve more flexible search functionalities [28],<br />
[29]. In general, given a ciphertext encrypting under an<br />
attribute vector A, a predicate function f(), the encryption<br />
enables a user who holds the corresponding token to f() to<br />
test whether f(A) = 1 without decrypting the data.<br />
Boneh et al. proposed a public key based hidden vector<br />
encryption scheme [6], which supports conjunctive subset,<br />
equality and range queries. However, the complexity of range<br />
search per data record is O(TD) (linear with the range size<br />
T ), which is far from efficient when the range is large. Later<br />
Katz et al. [30] proposed another public key based predicate<br />
encryption scheme that supports inner products. Theoretically,
equality, subset, conjunctive, disjunctive predicates can all be<br />
expressed in the form of inner-products. However, it incurs an<br />
exponential blowup for handling arbitrary boolean expressions<br />
(containing both CNF/DNF [29]).<br />
Shi et al. [8] proposed an MRPE by exploiting an efficient<br />
tree-based range representation, the per data record search<br />
complexity is reduced to O(DlogT). Later, Shen et al. [31]<br />
found the predicate encryption in [8] may inherently reveal<br />
the query predicate within a token, and proposed a symmetrickey<br />
inner product encryption scheme to solve it. However, it<br />
does not support efficient multi-dimensional range search and<br />
the symmetric-key based approach is not suitable for multiowner<br />
settings. To enable multi-dimensional range query with<br />
predicate privacy, Li et al. [32] proposed a public key based<br />
encrypted search scheme, by introducing an additional trusted<br />
proxy that keeps a secret proxy key. However, it lacks support<br />
for efficient arbitrary range queries. Finally, none of the above<br />
schemes achieve sublinear time search complexity with regard<br />
to the dataset size.<br />
VIII. CONCLUSION<br />
In this paper, we address the problem of efficient and<br />
privacy-preserving search <strong>over</strong> encrypted databases outsourced<br />
to the cloud. We propose a suite of MDRSE for handling multidimensional<br />
range queries <strong>over</strong> encrypted data within sublinear<br />
search time. Through a novel combination of predicate encryption<br />
primitives with efficient multi-dimensional data structures<br />
(such as kd-trees and range trees), efficient MDRSE can be<br />
realized. Our approach is general, in that it can be easily<br />
integrated with different multi-dimensional data structures to<br />
achieve different tradeoffs. We carry out extensive experiments<br />
on a large-scale real-world dataset and show the scalability and<br />
efficiency of our schemes.<br />
REFERENCES<br />
[1] M. Armbrust, A. Fox, R. Griffith, A. D.Joseph, R. H.Katz, A. Konwinski,<br />
G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A View<br />
of Cloud Computing,” Communications of the ACM, vol. 53, no. 4, pp.<br />
50–58, Apirl 2010.<br />
[2] D. Song, D. Wagner, and A. Perrig, “Practical Techniques for <strong>Search</strong>es<br />
on Encrypted Data,” in Proc. of IEEE S&P’00, 2000.<br />
[3] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, “Public Key<br />
Encryption with Keyword <strong>Search</strong>,” in Proc. of EUROCRYP’04, 2004,<br />
pp. 506–522.<br />
[4] E.-J. Goh, “Secure Indexes,” Cryptology ePrint Archive, Report<br />
2003/216, 2003, http://eprint.iacr.org/.<br />
[5] R. Curtmola, J. A. Garay, S. Kamara, and R. Ostrovsky, “<strong>Search</strong>able<br />
Symmetric Encryption: Improved Definitions and Efficient Constructions,”<br />
in Proc. of ACM CCS’06, 2006.<br />
[6] D. Boneh and B. Waters, “Conjunctive, Subset, and <strong>Range</strong> Queries on<br />
Encrypted Data,” in Proc. of TCC’07, 2007, pp. 535–554.<br />
[7] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure Ranked Keyword<br />
<strong>Search</strong> <strong>over</strong> Encrypted Cloud Data,” in Proc. of ICDCS’10, 2010.<br />
[8] E. Shi, J. Bethencourt, T.-H. H. Chan, D. Song, and A. Perrig, “<strong>Multi</strong>-<br />
<strong>Dimensional</strong> <strong>Range</strong> Query <strong>over</strong> Encrypted Data,” in Proc. of IEEE<br />
S&P’07, 2007, pp. 350–364.<br />
[9] Y. Lu, “<strong>Privacy</strong>-<strong>Preserving</strong> Logarithmic-time <strong>Search</strong> on Encrypted Data<br />
in Cloud,” in Proc. of NDSS’12, Feb. 2012.<br />
[10] J. L. Bentley, “<strong>Multi</strong>dimensional Binary <strong>Search</strong> Trees Used for Associative<br />
<strong>Search</strong>ing,” Communications of the ACM, vol. 18, no. 9, pp.<br />
509–517, 1975.<br />
[11] Mark de Berg and Otfried Cheong and Marc van Kreveld and Mark<br />
Overmars, Computational Geometry: Algorithms and Applications.<br />
Springer-Verlag, 2008.<br />
[12] J. L. Bentley, “Decomposable <strong>Search</strong>ing Problems,” Information Processing<br />
Letters, vol. 8, no. 5, pp. 201–244, 1979.<br />
[13] S. Kamara, C. Papamanthou, and T. Roeder, “Dynamic <strong>Search</strong>able<br />
Symmetric Encryption,” in Proc. of ACM CCS’12, 2012, pp. 965–976.<br />
[14] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, “Public<br />
Key Encryption with Keyword <strong>Search</strong>,” in Proc. of EUROCRYPT’04.<br />
Springer-Verlag, 2004, pp. 506–522.<br />
[15] M. Islam, M. Kuzu, and M. Kantarcioglu, “Access Pattern Disclosure on<br />
<strong>Search</strong>able Encryption: Ramification, Attack and Mitigation,” in Proc.<br />
of NDSS’12, 2012.<br />
[16] X. Boyen and B. Waters, “Anonymous Hierarchical Identity-Based<br />
Encryption (Without Random Oracles),” in Proc. of CRYPTO’06.<br />
Springer-Verlag, 2006, pp. 290–307.<br />
[17] Dan Boneh and Amit Sahai and Brent Waters, “Functional Encryption:<br />
A New Vision for Public Key Cryptography,” Communications of the<br />
ACM, vol. 55, no. 11, pp. 56–64, 2012.<br />
[18] A. Guttman, “R-Trees A Dynamic Index Structure for Spatial <strong>Search</strong>ing,”<br />
in Proc. of ACM SIGMOD’84, 1984.<br />
[19] R. Finkel and J. L. Bentley, “Quad Trees: A Data Structure for Retrieval<br />
on Composite Keys,” ACTA Informatica, vol. 4, no. 1, pp. 1–9, 1974.<br />
[20] D. F. Aranha, K. Karabina, P. Longa, C. H. Gebotys, and J. Lopez,<br />
“Faster Explicit Formulas for Computing Pairings <strong>over</strong> Ordinary<br />
Curves,” in Proc. of EUROCRYPT’11, 2011, pp. 48–68.<br />
[21] “The Elliptic Semiconductor (CLP-17): high performance<br />
elliptic curve cryptography point multiplier core,”<br />
http://www.ellipticsemi.com/pdf/CLP-17 90401.pdf.<br />
[22] Y.-C. Chang and M. Mitzenmacher, “<strong>Privacy</strong> <strong>Preserving</strong> Keyword<br />
<strong>Search</strong>es on Remote Encrypted Data,” in Proc. of ACNS’05, 2005, pp.<br />
442–455.<br />
[23] Z. Yang, S. Zhong, and R. N. Wright, “<strong>Privacy</strong>-<strong>Preserving</strong> Queries on<br />
Encrypted Data,” in Proc. of ESORICS’06, 2006, pp. 479–495.<br />
[24] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “<strong>Privacy</strong>-<strong>Preserving</strong><br />
<strong>Multi</strong>-keyword Ranked <strong>Search</strong> <strong>over</strong> Encrypted Cloud Data,” in Proc. of<br />
IEEE INFOCOM’11, 2011.<br />
[25] P. Golle, J. Staddon, and B. Waters, “Secure Conjunctive Keyword<br />
<strong>Search</strong> <strong>over</strong> Encrypted Data,” in Proc. of ACNS’04. Springer-Verlag,<br />
2004, pp. 31–45.<br />
[26] L. Ballard, S. Kamara, and F. Monrose, “Achieving Efficient Conjunctive<br />
Keyword <strong>Search</strong>es <strong>over</strong> Encrypted Data,” Information and Communications<br />
Security, pp. 414–426, 2005.<br />
[27] M. Bellare, A. Boldyreva, and A. O’Neill, “Deterministic and Efficiently<br />
<strong>Search</strong>able Encryption,” in Proc. of CRYPTO’07, 2007, pp. 535–552.<br />
[28] T. Okamoto and K. Takashima, “Hierarchical Predicate Encryption for<br />
Inner-Products,” in Proc. of ASIACRYPT’09, 2009, pp. 214–231.<br />
[29] A. Lewko, T. Okamoto, A. Sahai, K. Takashima, and B. Waters,<br />
“Fully Secure Functional Encryption: Attribute-Based Encryption and<br />
(Hierarchical) Inner Product Encryption,” in Proc. of EUROCRYPT’10,<br />
2010, pp. 62–91.<br />
[30] J. Katz, A. Sahai, and B. Waters, “Predicate Encryption Supporting<br />
Disjunctions, Polynomial Equations, and Inner Products,” in Proc. of<br />
EUROCRYPT’08, 2008, pp. 146–162.<br />
[31] E. Shen, E. Shi, and B. Waters, “Predicate <strong>Privacy</strong> in Encryption<br />
Systems,” in Proc. of TCC’09, 2009, pp. 457–473.<br />
[32] M. Li, S. Yu, N. Cao, and W. Lou, “Authorized Private Keyword <strong>Search</strong><br />
<strong>over</strong> Encrypted Data in Cloud Computing,” in Proc. of IEEE ICDCS’11,<br />
2011, pp. 383–392.