

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases ⋆

Jinlong Wang, Congfu Xu ⋆⋆, Hongwei Dan, and Yunhe Pan

Institute of Artificial Intelligence, Zhejiang University

Hangzhou, 310027, China

zjupaper@yahoo.com xucongfu@cs.zju.edu.cn

danhow2008@hotmail.com panyh@sun.zju.edu.cn

Abstract. The issue of maintaining privacy in frequent itemset mining has attracted considerable attention. In most of this work, only distorted data are available, which complicates the data mining process. In particular, in a dynamically updated distorted database, it is nontrivial to mine frequent itemsets incrementally because of the high counting overhead of recomputing support counts for itemsets. This paper investigates this problem and develops an efficient algorithm, SA-IFIM, for incrementally mining frequent itemsets in update distorted databases. In this algorithm, additional information is stored during the earlier mining process to support efficient incremental computation. In particular, by introducing the supporting aggregate and representing it with a bit vector, the transaction database is transformed into a machine-oriented model that allows fast support computation. Performance studies show the efficiency of our algorithm.

1 Introduction

Recently, privacy has become one of the prime concerns in data mining. To avoid compromising privacy, most work applies distortion or randomization techniques to the original data set, and only the disguised data are shared for data mining [1–3].

Mining frequent itemsets from distorted databases with reconstruction methods incurs expensive overheads compared with directly mining the original data sets [2]. In [3, 4], basic formulas from set theory are used to eliminate these counting overheads. In reality, however, many databases are dynamic: old transactions are deleted and new ones are added over time. Such changes to the data set may invalidate some existing frequent itemsets and introduce new ones, so incremental algorithms [5, 6] were proposed to address the problem. However, it is not efficient to apply these incremental algorithms directly to an update distorted database, because of the high counting overhead of recomputing itemset supports. Although [7] proposed an algorithm for incremental updating, its efficiency is still insufficient in practice.

⋆ Supported by the Natural Science Foundation of China (No. 60402010), Zhejiang Provincial Natural Science Foundation of China (Y105250) and the Science-Technology Program of Zhejiang Province of China (No. 2004C31098).

⋆⋆ Congfu Xu is the corresponding author.

This paper investigates the problem of incremental frequent itemset mining in update distorted databases. We first develop an efficient incremental updating computation method that quickly reconstructs an itemset's support by using additional information stored during the earlier mining process. Then, a new concept, the supporting aggregate (SA), is introduced and represented with a bit vector. In this way, the transaction database is transformed into a machine-oriented model that allows fast support computation. Finally, an efficient algorithm, SA-IFIM (Supporting Aggregate based Incremental Frequent Itemset Mining in update distorted databases), is presented to describe the process. Performance studies show the efficiency of our algorithm.

The remainder of this paper is organized as follows. Section 2 presents the SA-IFIM algorithm step by step. The performance studies are reported in Section 3. Finally, Section 4 concludes this paper.

2 The SA-IFIM Algorithm

In this section, the SA-IFIM algorithm is introduced step by step. Before mining, the data sets are each distorted using the method of EMASK [3]. In the following, we first describe the preliminaries of incremental frequent itemset mining, then investigate the essence of the updating technique and use additional information recorded during the earlier mining, together with set theory, for quick updating computation. Next, we introduce the supporting aggregate and represent it with a bit vector to transform the database into a machine-oriented model that speeds up computation. Finally, the SA-IFIM algorithm is summarized.

2.1 Preliminaries

In this subsection, some preliminaries on incremental frequent itemset mining are presented, summarizing the formal description in [5, 6].

Let $D$ be a set of transactions and $I = \{i_1, i_2, \ldots, i_m\}$ a set of distinct literals (items). For a dynamic database, old transactions $\Delta^-$ are deleted from the database $D$ and new transactions $\Delta^+$ are added. Naturally, $\Delta^- \subseteq D$. Denote the updated database by $D'$, so that $D' = (D - \Delta^-) \cup \Delta^+$, and the unchanged transactions by $D^- = D - \Delta^-$. Let $F_p$ denote the frequent itemsets in the original database $D$, and $F_p^k$ the frequent $k$-itemsets. The problem of incremental mining is to find the frequent itemsets (denoted by $F_p'$) in $D'$, given $\Delta^-$, $D^-$, $\Delta^+$, and the mining result $F_p$, with respect to the same user-specified minimum support $s$. Furthermore, the incremental approach needs to take advantage of previously obtained information to avoid rerunning the mining algorithm on the whole database when the database is updated. For clarity, we express $s$ as a relative support value, but $\delta_c^+$, $\delta_c^-$, $\sigma_c$, and $\sigma_c'$ as absolute support counts of itemset $c$ in $\Delta^+$, $\Delta^-$, $D$, and $D'$, respectively. Let $\delta_c$ be the change in the support count of itemset $c$. Then $\delta_c = \delta_c^+ - \delta_c^-$ and $\sigma_c' = \sigma_c + \delta_c^+ - \delta_c^-$.
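The update rule for the support count can be checked on a small example. The following is a minimal sketch on undistorted toy data; the helper names `support_count` and `updated_support_count` and the transactions themselves are illustrative assumptions, not part of the paper.

```python
def support_count(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def updated_support_count(sigma_c, delta_plus_c, delta_minus_c):
    """sigma'_c = sigma_c + delta^+_c - delta^-_c (all absolute counts)."""
    return sigma_c + delta_plus_c - delta_minus_c


# Toy data (hypothetical): each transaction is a frozenset of items.
D = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("a")]
delta_minus = [frozenset("ab")]                   # deleted, a subset of D
delta_plus = [frozenset("abd"), frozenset("cd")]  # newly added

D_unchanged = list(D)
for t in delta_minus:
    D_unchanged.remove(t)           # D^- = D - delta^-
D_prime = D_unchanged + delta_plus  # D' = D^- together with delta^+

c = frozenset("ab")
incremental = updated_support_count(
    support_count(c, D),
    support_count(c, delta_plus),
    support_count(c, delta_minus),
)
direct = support_count(c, D_prime)  # recount from scratch, for comparison
```

Here `incremental` and `direct` agree, which is the point of the rule: the count in the updated database follows from the old count and the counts in the two (usually much smaller) update sets.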



2.2 Efficient incremental computation

In a dynamically updated environment, the key questions in mining are how to handle the itemsets that are frequent in $D$ (recorded in $F_p$), and how to find the itemsets that are non-frequent in $D$ (not in $F_p$) but frequent in $D'$. In the following, for simplicity, $|\cdot|$ denotes the number of tuples in a transaction database.

1. For the frequent itemsets in $F_p$, determine which remain frequent and which become non-frequent in the updated database $D'$.

Lemma 1. If $c \in F_p$ (that is, $\sigma_c \ge |D| \times s$) and $\delta_c \ge (|\Delta^+| - |\Delta^-|) \times s$, then $c \in F_p'$.

Proof. $\sigma_c' = \sigma_c + \delta_c^+ - \delta_c^- \ge |D| \times s + |\Delta^+| \times s - |\Delta^-| \times s = (|D| + |\Delta^+| - |\Delta^-|) \times s = |D'| \times s$. $\square$

Property 1. When $c \in F_p$ and $\delta_c < (|\Delta^+| - |\Delta^-|) \times s$, then $c \in F_p'$ if and only if $\sigma_c' \ge |D'| \times s$.

2. For itemsets that are non-frequent in $D$, mine the frequent itemsets in the changed database $\Delta^+ - \Delta^-$ and recompute their support counts by scanning $D^-$.

Lemma 2. If $c \notin F_p$ and $\delta_c < (|\Delta^+| - |\Delta^-|) \times s$, then $c \notin F_p'$.

Proof. Analogous to Lemma 1: since $\sigma_c < |D| \times s$, we have $\sigma_c' = \sigma_c + \delta_c < |D| \times s + (|\Delta^+| - |\Delta^-|) \times s = |D'| \times s$. $\square$

Property 2. When $c \notin F_p$ and $\delta_c \ge (|\Delta^+| - |\Delta^-|) \times s$, then $c \in F_p'$ if and only if $\sigma_c' \ge |D'| \times s$.
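The two lemmas and two properties split the itemsets into four cases according to membership in the old result and the size of the support change. The sketch below shows one way to express that case analysis; the function name `classify` and its return labels are hypothetical and do not come from the paper.

```python
def classify(in_fp, delta_c, n_plus, n_minus, s):
    """Decide which rule applies to an itemset c.

    in_fp    -- True if c was frequent in the original database D
    delta_c  -- change of the support count, delta^+_c - delta^-_c
    n_plus   -- number of added transactions, |delta^+|
    n_minus  -- number of deleted transactions, |delta^-|
    s        -- relative minimum support
    """
    threshold = (n_plus - n_minus) * s
    if in_fp and delta_c >= threshold:
        return "lemma1"      # c is certainly frequent in D'
    if in_fp:
        return "property1"   # must test sigma'_c >= |D'| * s
    if delta_c < threshold:
        return "lemma2"      # c is certainly non-frequent in D'
    return "property2"       # must test sigma'_c >= |D'| * s


# Hypothetical numbers: 100 added, 50 deleted, s = 5%, so the threshold is 2.5.
cases = [
    classify(True, 3, 100, 50, 0.05),
    classify(True, 1, 100, 50, 0.05),
    classify(False, 1, 100, 50, 0.05),
    classify(False, 4, 100, 50, 0.05),
]
```

Only the two "property" outcomes require an actual support test; the two "lemma" outcomes are decided from the update sets alone.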

Under the symbol-specific distortion framework of [3], each '1' and each '0' in the original database is flipped with probability $(1-p)$ and $(1-q)$, respectively. In incremental frequent itemset mining, the goal is to mine frequent itemsets from the distorted databases using the information obtained during the earlier mining process. To test the condition of Property 2 for an itemset not in $F_p$, we need to reconstruct the itemset's support in the unchanged database $D^-$ by scanning its distorted version $D^{-*}$. Not only the distorted support of the itemset itself, but also other counts related to it must be tracked. This makes the support count computation in Property 2 both difficult and critically important for incremental mining, and it is nontrivial to apply traditional incremental algorithms to it directly. To address the problem, an efficient incremental updating operation is first developed that computes with the supports in the distorted database; another method that further improves the efficiency of support computation is presented in Section 2.3.
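To make the flipping model concrete, the following sketch distorts a row of bits and estimates the original count of a single item from its distorted count. It is not the reconstruction procedure of EMASK or SA-IFIM: it covers only the 1-itemset case, using the estimator that follows directly from the flipping probabilities, and the function names are assumptions made for illustration.

```python
import random


def distort_row(bits, p, q, rng):
    """Keep each 1 with probability p and each 0 with probability q;
    otherwise flip it (so 1s flip with 1 - p and 0s with 1 - q)."""
    out = []
    for b in bits:
        keep = rng.random() < (p if b == 1 else q)
        out.append(b if keep else 1 - b)
    return out


def reconstruct_item_count(observed_ones, n, p, q):
    """Estimate the original number of 1s in a column of n distorted bits.

    In expectation: observed = p * true + (1 - q) * (n - true),
    which is solved here for `true`. Requires p + q != 1.
    """
    return (observed_ones - (1 - q) * n) / (p + q - 1)


# With p = 0.5 and q = 0.97, an item present in 200 of 1000 transactions
# is expected to show 0.5 * 200 + 0.03 * 800 = 124 ones after distortion.
estimate = reconstruct_item_count(124, 1000, 0.5, 0.97)

# A fixed seed (an arbitrary choice) makes the random distortion repeatable.
sample = distort_row([1, 0, 1, 0, 0], 0.5, 0.97, random.Random(0))
```

For itemsets with more than one item, the distorted counts of several related bit patterns are needed, which is why the text stresses that counts other than the itemset's own distorted support must be tracked.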

In distorted databases, computing the supports of frequent itemsets is tedious. Motivated by [3], a similar support computation method is used in incremental mining. With this method, computing an itemset's support requires the support counts of all its subsets in the distorted database. However, saving the support counts of all itemsets would be impractical: it would greatly increase the cost and degrade indexing efficiency. Thus, in incremental mining, when the frequent itemsets and their support counts are recorded, the corresponding counts in each distorted database are registered at the same time. In this way, for a $k$-itemset not in $F_p$, since all its subsets are frequent in the database, we can use the existing support counts in each distorted database to compute and reconstruct its support in the updated database quickly. Thus, the efficiency is improved.

2.3 Supporting aggregate and database transformation

To improve efficiency, we introduce the concept of the supporting aggregate and use a bit vector to represent it. By means of elementary supporting aggregates based on bit vectors, the database is transformed into a machine-oriented data model, which improves the efficiency of itemset support computation.

In the following, for a transaction database $D$, let $U$ denote a set of objects (the universe) that serve as unique identifiers for the transactions. For simplicity, we refer to the elements of $U$ and the transactions interchangeably. For an itemset $A \subseteq I$, a transaction $u \in U$ is said to contain $A$ if $A \subseteq u$.

Definition 1. Supporting aggregate (SA). For an attribute itemset $A \subseteq I$, denote $S(A) = \{u \in U \mid A \subseteq u\}$ as its supporting aggregate, that is, the aggregate composed of the transactions that include the attribute itemset $A$. Generally, $S(A) \subseteq U$. The supporting aggregate of a single attribute item is called an elementary supporting aggregate (ESA).

Using ESAs, the original transaction database is vertically inverted and transformed into an attribute-transaction list. From the ESAs, the SA of an itemset can be obtained quickly by set intersection, and the itemset's support can then be computed efficiently. To further improve processing speed, each SA (ESA) is represented as a binary vector of $|U|$ dimensions, denoted BV-SA (BV-ESA), where $|U|$ is the number of transactions in $U$. If an itemset's SA contains the $i$th transaction, the $i$th dimension of its binary vector is set to 1; otherwise, it is set to 0. With this representation, the support count of each attribute item can be computed efficiently.
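The bit vector representation can be illustrated with Python integers used as bit vectors, where bit $i$ stands for the $i$th transaction. This is a minimal sketch under that assumption; the function names `build_bv_esa`, `bv_sa` and `bv_support_count` are hypothetical, and the authors' C++ implementation may be organized differently.

```python
def build_bv_esa(transactions):
    """Vertically invert the database: map each item to an integer whose
    bit i is 1 exactly when transaction i contains the item (its BV-ESA)."""
    bv_esa = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            bv_esa[item] = bv_esa.get(item, 0) | (1 << i)
    return bv_esa


def bv_sa(itemset, bv_esa):
    """BV-SA of a non-empty itemset: bitwise AND of its items' BV-ESAs,
    which corresponds to intersecting the supporting aggregates."""
    items = list(itemset)
    vector = bv_esa.get(items[0], 0)
    for item in items[1:]:
        vector &= bv_esa.get(item, 0)
    return vector


def bv_support_count(itemset, bv_esa):
    """Support count = number of 1 bits in the itemset's BV-SA."""
    return bin(bv_sa(itemset, bv_esa)).count("1")


# Toy database (hypothetical) with four transactions, indexed 0..3.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a"}]
bv_esa = build_bv_esa(transactions)
count_ab = bv_support_count({"a", "b"}, bv_esa)
count_abc = bv_support_count({"a", "b", "c"}, bv_esa)
```

In this example item `a` occurs in transactions 0, 1 and 3, so its BV-ESA is `0b1011`; ANDing it with the vector of `b` (`0b0111`) leaves `0b0011`, that is, a support count of 2 for `{a, b}`.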

With the vertical database representation, where each row represents an attribute's BV-ESA, non-frequent attribute items can be removed successively by the downward closure property [8], which efficiently reduces the size of the data set. On the other hand, the whole set of BV-ESAs sometimes cannot be loaded into memory because of memory constraints. Our approach addresses this scalability problem by horizontally partitioning the transaction data set into subsets, each composed of part of the objects (transactions), and then loading them partition by partition. The partitions are disjoint from each other, which makes the method suitable for parallel and distributed processing. Furthermore, in practice, an optimized memory swap strategy can be adopted to reduce the I/O cost.
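Because the partitions are disjoint, an itemset's support count over the whole data set is the sum of its counts in the individual partitions. The sketch below illustrates this with in-memory lists standing in for partitions loaded from disk; the function names and the partition size are illustrative assumptions.

```python
def partition(transactions, size):
    """Split the transaction list horizontally into disjoint chunks."""
    return [transactions[i:i + size] for i in range(0, len(transactions), size)]


def partitioned_support_count(itemset, transactions, size):
    """Sum the support counts obtained partition by partition.

    Each partition gets its own short bit vectors, so only `size`
    transactions have to be represented at any one time.
    """
    total = 0
    for part in partition(transactions, size):
        vector = (1 << len(part)) - 1          # all transactions of this part
        for item in itemset:
            item_vector = 0
            for i, transaction in enumerate(part):
                if item in transaction:
                    item_vector |= 1 << i      # BV-ESA restricted to the part
            vector &= item_vector              # intersect within the part
        total += bin(vector).count("1")
    return total


# Toy database (hypothetical), processed two transactions at a time.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a"}]
count_ab = partitioned_support_count({"a", "b"}, transactions, 2)
```

The result does not depend on the partition size, and since each partition is handled independently, the per-partition counts could also be computed in parallel and summed afterwards.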



2.4 The process of SA-IFIM algorithm

In this subsection, the SA-IFIM algorithm is summarized as Algorithm 1. When the distorted data sets $D^{-*}$, $\Delta^{-*}$ and $\Delta^{+*}$ are first scanned, they are transformed into the corresponding vertical bit vector representations $BV(D^{-*})$, $BV(\Delta^{-*})$ and $BV(\Delta^{+*})$ partition by partition, and saved to hard disk. From these representations, the frequent $k$-itemsets $F_p^k$ can be obtained level by level. Following the candidate generation-and-test approach, the candidate frequent $k$-itemsets ($C_k$) are generated from the frequent $(k-1)$-itemsets ($F_p^{k-1}$).
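Candidate generation from the frequent $(k-1)$-itemsets can be sketched with the join-and-prune step of Apriori [8]. The text only states that a generation-and-test approach is used, so treating it as the classic Apriori step is an assumption, as is the function name `apriori_gen`.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    prev_frequent -- iterable of frozensets, each of size k - 1
    Join: unite two (k-1)-itemsets that differ in exactly one item.
    Prune: keep a candidate only if all of its (k-1)-subsets are frequent
    (downward closure).
    """
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
fp2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
c3 = apriori_gen(fp2, 3)
```

In this example only `{a, b, c}` survives: `{a, b, d}` and `{b, c, d}` are pruned because `{a, d}` and `{c, d}` are not frequent. Each surviving candidate would then be tested by computing its support.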

Algorithm 1: Algorithm SA-IFIM

Input: $D^{-*}$, $\Delta^{+*}$, $\Delta^{-*}$, $F_p$ (frequent itemsets and their support counts in $D$), $F_p^*$ (the frequent itemsets of $F_p$ and their corresponding support counts in $D^*$), minimum support $s$, and distortion parameters $p$, $q$ as in EMASK [3].

Output: $F_p'$ (frequent itemsets and their support counts in $D'$)

Method: As shown in Fig. 1. In the algorithm, we use temporary files to store the support counts in the distorted database for efficiency.

Fig. 1. SA-IFIM algorithm diagram.



3 Performance Evaluation

In this section, we report comprehensive experiments comparing SA-IFIM with EMASK, using the EMASK implementation provided by its authors [9]. For a better performance evaluation, we also implemented the algorithm IFIM (similar to IPPFIM [7]). All programs were coded in C++ using Cygwin with gcc 2.9.5. The experiments were run on a 3 GHz P4 processor with 1 GB of memory. SA-IFIM and IFIM yield the same itemsets as EMASK given the same data set and the same minimum support.

Our experiments were performed on synthetic data sets produced by the IBM synthetic market-basket data generator [8]. In the following, we use the notation $D$ (number of transactions), $T$ (average size of the transactions), $I$ (average size of the maximal potentially large itemsets), and $N$ (number of items), and set $N = 1000$. Our method does not require $|\Delta^+|$ and $|\Delta^-|$ to be the same. Without loss of generality, we let $|d| = |\Delta^+| = |\Delta^-|$ for simplicity. For clarity, TxIyDmdn denotes an original database together with an update database, where the parameters $T = x$ and $I = y$ are the same for both, and the two differ only in size: the original transaction database has $|D| = m$ transactions and the update transaction database has $|d| = n$.
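As a reading aid for the TxIyDmdn naming scheme, the sketch below parses a concrete data set name into its four parameters. The parser is purely illustrative; it assumes that a trailing `K` means a factor of 1000, and it does not handle symbolic names such as T25I4D100Kdm.

```python
import re

# T<x>I<y>D<m>d<n>, where m and n may carry a "K" suffix (assumed: thousands).
_NAME = re.compile(r"^T(\d+)I(\d+)D(\d+)(K?)d(\d+)(K?)$")


def parse_dataset_name(name):
    """Return the parameters encoded in a name such as 'T25I4D100Kd10K'."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a TxIyDmdn name: {name!r}")
    t, i, d, d_k, upd, upd_k = match.groups()

    def scale(digits, suffix):
        return int(digits) * (1000 if suffix else 1)

    return {
        "T": int(t),            # average transaction size
        "I": int(i),            # average size of maximal potentially large itemsets
        "D": scale(d, d_k),     # |D|, size of the original database
        "d": scale(upd, upd_k), # |d|, size of the update database
    }


sparse = parse_dataset_name("T25I4D100Kd10K")
dense = parse_dataset_name("T40I10D100Kd10K")
```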

In the following, we use the distorted benchmark data sets as the input databases to the algorithms. The distortion parameters are the same as in EMASK [3], with $p = 0.5$ and $q = 0.97$. In the experiments, for a fair comparison of the algorithms and to reflect scalability requirements, SA-IFIM is run with only 5K transactions loaded into main memory at a time.

3.1 Different support analysis

In Fig. 2, the relative performance of SA-IFIM, IFIM and EMASK is compared on two different data sets, T25I4D100Kd10K (sparse) and T40I10D100Kd10K (dense), with respect to various minimum supports. As shown in Fig. 2, SA-IFIM leads to a prominent performance improvement. Specifically, on the sparse data set (T25I4D100Kd10K), IFIM is close to EMASK, and SA-IFIM is orders of magnitude faster than both; on the dense data set (T40I10D100Kd10K), IFIM is faster than EMASK, but SA-IFIM also outperforms IFIM, and the margin grows as the minimum support decreases.

3.2 Effect of the update size

Experiments were run on two data sets, T25I4D100Kdm and T40I10D100Kdm, and the results are shown in Fig. 3. As expected, when the same number of transactions are deleted and added, the time for rerunning EMASK remains constant, whereas that of IFIM increases sharply and quickly surpasses that of EMASK. In Fig. 3, the execution time of SA-IFIM is much less than that of EMASK. SA-IFIM still significantly outperforms EMASK even when the update size is very large.



Fig. 2. Extensive analysis for different support. (a) T25I4D100Kd10K; (b) T40I10D100Kd10K.

Fig. 3. Different updating tuples analysis. (a) T25I4D100Kdm (s = 0.6%); (b) T40I10D100Kdm (s = 1.25%).

3.3 Scale up performance

Finally, to assess the scalability of SA-IFIM, two experiments, T25I4Dmd(m/10) at $s = 0.6\%$ and T40I10Dmd(m/10) at $s = 1.25\%$, were conducted to examine the scale-up performance as the size of the mined data set grows. The scale-up results for the two data sets are shown in Fig. 4, which illustrates the impact of $|D|$ and $|d|$ on SA-IFIM and EMASK. In the experiments, the size of the update database is 10% of that of the original database, and the size $m$ of the transaction database was increased from 100K to 1000K. As shown in Fig. 4, EMASK is very sensitive to the number of updated tuples but SA-IFIM is not, and the execution time of SA-IFIM increases linearly with the database size. This shows that the algorithm scales well and can be applied to very large databases.



Fig. 4. Scale up performance analysis. (a) T25I4Dmd(m/10) (s = 0.6%); (b) T40I10Dmd(m/10) (s = 1.25%).

4 Conclusions

In this paper, we explore the issue of frequent itemset mining in a dynamically updated distorted database environment. We first develop an efficient incremental updating computation method to quickly reconstruct an itemset's support. Through the introduction of the supporting aggregate represented with a bit vector, the databases are transformed into representations that are more accessible to and more easily processed by computer, so that support counts can be computed efficiently. Experiments show that SA-IFIM significantly outperforms rerunning EMASK on the whole updated database, and also has an advantage over the incremental algorithm based only on EMASK.

References

1. Agrawal, R., and Srikant, R.: Privacy-preserving data mining. In: Proceedings of SIGMOD. (2000) 439-450

2. Rizvi, S., and Haritsa, J.: Maintaining data privacy in association rule mining. In: Proceedings of VLDB. (2002) 682-693

3. Agrawal, S., Krishnan, V., and Haritsa, J.: On addressing efficiency concerns in privacy-preserving mining. In: Proceedings of DASFAA. (2004) 113-124

4. Xu, C., Wang, J., Dan, H., and Pan, Y.: An improved EMASK algorithm for privacy-preserving frequent pattern mining. In: Proceedings of CIS. (2005) 752-757

5. Cheung, D., Han, J., Ng, V., and Wong, C.: Maintenance of discovered association rules in large databases: An incremental updating technique. In: Proceedings of ICDE. (1996) 104-114

6. Cheung, D., Lee, S., and Kao, B.: A general incremental technique for updating discovered association rules. In: Proceedings of DASFAA. (1997) 106-114

7. Wang, J., Xu, C., and Pan, Y.: An Incremental Algorithm for Mining Privacy-Preserving Frequent Itemsets. In: Proceedings of ICMLC. (2006)

8. Agrawal, R., and Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of VLDB. (1994) 487-499

9. http://dsl.serc.iisc.ernet.in/projects/software/software.html.
