
JOURNAL OF COMPUTERS, VOL. 8, NO. 6, JUNE 2013

Figure 4. Running time comparison on sparse datasets: (a) on the dataset T20I6D300K; (b) on the dataset kosarak. (x-axis: minimum expected support threshold (%); y-axis: running time (s); algorithms compared: AT-Mine, MBP, UF-Growth, and CUFP-Mine (memory overflow).)

Figure 4 shows the running time of the algorithms on two sparse datasets; CUFP-Mine runs out of memory on both. As shown in Figure 4, our algorithm outperforms UF-Growth, MBP and CUFP-Mine under different minimum expected support thresholds. This is because CUFP-Mine generates too many supersets, UF-Growth generates too many tree nodes, and MBP generates many candidates, as shown in Tables 4-5. The time performance of MBP depends on the length of the candidate itemsets, the length of the transaction itemsets, and the size of the dataset: the higher these values are, the lower the time performance of MBP will be. Thus the time performance of MBP degrades sharply as the threshold decreases. Figure 4 indicates that AT-Mine achieves better time performance; moreover, its time performance is more stable on the sparse datasets.
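To make the x-axis of these experiments concrete: the minimum expected support threshold filters itemsets by their expected support in an uncertain database. Below is a minimal sketch (not the paper's code; the function name and data layout are hypothetical) of how expected support is commonly computed under the independence assumption, as a sum over transactions of the product of the member items' existence probabilities.

```python
# Hypothetical sketch: expected support in an uncertain transaction
# database under the item-independence assumption. Each transaction
# maps an item to its existence probability.
from math import prod

def expected_support(itemset, uncertain_db):
    """Sum, over transactions containing all items of `itemset`,
    of the product of those items' probabilities."""
    total = 0.0
    for transaction in uncertain_db:
        if all(item in transaction for item in itemset):
            total += prod(transaction[item] for item in itemset)
    return total

db = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.7},
    {"a": 0.6, "b": 0.4, "c": 0.2},
]
# {a, b} appears in the 1st and 3rd transactions:
# 0.9 * 0.8 + 0.6 * 0.4 = 0.96
print(expected_support({"a", "b"}, db))
```

An itemset is then reported as frequent when this value reaches the threshold percentage times the number of transactions; lowering the threshold (moving right along the x-axis of Figure 4) admits more candidates, which is why all curves rise.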

B. Evaluation on Dense Datasets

In this section, we test the performance of our proposed algorithm on the dense datasets connect and mushroom.

TABLE VI. DETAILED ANALYSIS ON THE DATASET CONNECT

η (%)    tree nodes (#)             candidates (#)
         AT-Mine     UF-Growth      MBP
15.0     36,823      32,204,274     5,981
14.0     89,118      33,739,243     6,962
13.0     98,842      35,332,360     7,786
12.0     116,290     48,046,639     8,565
11.0     130,423     106,626,725    12,754
10.0     153,913     163,809,762    19,162

Tables 6-7 show the total number of tree nodes generated by AT-Mine and UF-Growth, and the number of candidate itemsets generated by MBP, on the dense datasets. As shown in Tables 6-7, UF-Growth creates too many tree nodes. For example, on the dataset connect, UF-Growth generates 163,809,762 nodes while AT-Mine generates only 153,913 when the minimum expected support threshold is 10%. This is because UF-Growth merges only nodes that have both the same item name and the same probability, and connect is a very dense dataset with long transactions. Thus we can infer that our algorithm achieves better performance than UF-Growth in terms of memory usage. MBP maintains not only the candidates but also the dataset, whereas our algorithm maintains only tree nodes in compact trees.
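The node-count gap follows directly from the merge rule quoted above. A minimal sketch (an illustration, not the authors' implementation; function and variable names are hypothetical) shows how keying tree children by the pair (item, probability), as UF-Growth does, yields far more nodes than keying by item alone, as a compact tree can:

```python
# Hypothetical sketch: insert transactions into a prefix tree whose
# children are keyed by `key(item, probability)`, and count the nodes
# created. UF-Growth-style sharing requires the probability to match
# too, so near-equal probabilities still spawn separate branches.
def count_nodes(transactions, key):
    root = {}
    count = 0
    for t in transactions:
        node = root
        for item, p in t:  # items assumed sorted in a fixed global order
            k = key(item, p)
            if k not in node:
                node[k] = {}
                count += 1
            node = node[k]
    return count

txns = [
    [("a", 0.9), ("b", 0.8)],
    [("a", 0.5), ("b", 0.8)],
    [("a", 0.9), ("b", 0.7)],
]
uf_nodes = count_nodes(txns, key=lambda i, p: (i, p))  # item + probability
compact = count_nodes(txns, key=lambda i, p: i)        # item only
print(uf_nodes, compact)
```

On a dense, long-transaction dataset such as connect, transactions share long item prefixes but rarely share exact probabilities, so the probability-sensitive keying multiplies the node count, consistent with the figures in Table VI.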

Figure 5 shows the running time of the algorithms on the dense datasets connect and mushroom; CUFP-Mine runs out of memory on both. As shown in Figure 5, our algorithm outperforms UF-Growth, MBP and CUFP-Mine under different minimum expected support thresholds. This is because CUFP-Mine generates too many supersets, UF-Growth generates too many tree nodes, and MBP generates many candidates, as shown in Tables 6-7. Figure 5 shows that the time performance of AT-Mine clearly surpasses that of the other algorithms on these two dense datasets; moreover, its time performance is also more stable.


TABLE VII. DETAILED ANALYSIS ON THE DATASET MUSHROOM

η (%)    tree nodes (#)           candidates (#)
         AT-Mine     UF-Growth    MBP
7.0      12,041      1,011,721    1,917
6.0      14,420      1,344,369    2,501
5.0      16,243      1,947,609    3,460
4.0      18,685      2,760,249    5,024
3.0      25,884      4,125,745    8,222
2.0      37,395      8,076,099    16,764

Figure 5(a). Running time on the dataset connect. (x-axis: minimum expected support threshold (%); y-axis: running time (s); algorithms compared: AT-Mine, MBP, UF-Growth, and CUFP-Mine (memory overflow).)

© 2013 ACADEMY PUBLISHER
