
JOURNAL OF COMPUTERS, VOL. 8, NO. 6, JUNE 2013

Figure 4. Running time comparison on sparse datasets: (a) on the dataset T20I6D300K; (b) on the dataset kosarak. (x-axis: minimum expected support threshold (%); y-axis: running time (s); algorithms compared: AT-Mine, MBP, UF-Growth, and CUFP-Mine (memory overflow).)

Figure 4 shows the running time of the algorithms on two sparse datasets; CUFP-Mine runs out of memory on both. As shown in Figure 4, our algorithm outperforms UF-Growth, MBP and CUFP-Mine under different minimum expected support thresholds. This is because CUFP-Mine generates too many supersets, UF-Growth generates too many tree nodes, and MBP generates many candidates, as shown in Tables 4-5. The time performance of MBP depends on the length of the candidate itemsets, the length of the transaction itemsets, and the size of the dataset: the higher these values are, the lower the time performance of MBP will be. Thus the time performance of MBP degrades sharply as the threshold decreases. Figure 4 indicates that AT-Mine achieves better time performance; moreover, its time performance is more stable on the sparse datasets.
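To make the x-axis of these experiments concrete: the minimum expected support threshold filters itemsets by their expected support in an uncertain database. Below is a minimal sketch (not the paper's code; the function name and data layout are hypothetical) of how expected support is commonly computed under the independence assumption, as a sum over transactions of the product of the member items' existence probabilities.

```python
# Hypothetical sketch: expected support in an uncertain transaction
# database under the item-independence assumption. Each transaction
# maps an item to its existence probability.
from math import prod

def expected_support(itemset, uncertain_db):
    """Sum, over transactions containing all items of `itemset`,
    of the product of those items' probabilities."""
    total = 0.0
    for transaction in uncertain_db:
        if all(item in transaction for item in itemset):
            total += prod(transaction[item] for item in itemset)
    return total

db = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.7},
    {"a": 0.6, "b": 0.4, "c": 0.2},
]
# {a, b} appears in the 1st and 3rd transactions:
# 0.9 * 0.8 + 0.6 * 0.4 = 0.96
print(expected_support({"a", "b"}, db))
```

An itemset is then reported as frequent when this value reaches the threshold percentage times the number of transactions; lowering the threshold (moving right along the x-axis of Figure 4) admits more candidates, which is why all curves rise.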

B. Evaluation on Dense Datasets

In this section, we test the performance of our proposed algorithm on the dense datasets connect and mushroom.

TABLE VI. DETAILED ANALYSIS ON THE DATASET CONNECT

η (%)    tree nodes (#)             candidates (#)
         AT-Mine     UF-Growth      MBP
15.0     36,823      32,204,274     5,981
14.0     89,118      33,739,243     6,962
13.0     98,842      35,332,360     7,786
12.0     116,290     48,046,639     8,565
11.0     130,423     106,626,725    12,754
10.0     153,913     163,809,762    19,162

Tables 6-7 show the total number of tree nodes generated by AT-Mine and UF-Growth, and the number of candidate itemsets generated by MBP, on the dense datasets. As shown in Tables 6-7, UF-Growth creates too many tree nodes. For example, on the dataset connect, UF-Growth generates 163,809,762 nodes while AT-Mine generates only 153,913 when the minimum expected support threshold is 10%. This is because UF-Growth merges only nodes that have both the same item name and the same probability, and connect is a very dense dataset with long transactions. Thus we can infer that our algorithm achieves better performance than UF-Growth in terms of memory usage. MBP maintains not only the candidates but also the dataset, whereas our algorithm maintains only tree nodes in compact trees.
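The node-count gap follows directly from the merge rule quoted above. A minimal sketch (an illustration, not the authors' implementation; function and variable names are hypothetical) shows how keying tree children by the pair (item, probability), as UF-Growth does, yields far more nodes than keying by item alone, as a compact tree can:

```python
# Hypothetical sketch: insert transactions into a prefix tree whose
# children are keyed by `key(item, probability)`, and count the nodes
# created. UF-Growth-style sharing requires the probability to match
# too, so near-equal probabilities still spawn separate branches.
def count_nodes(transactions, key):
    root = {}
    count = 0
    for t in transactions:
        node = root
        for item, p in t:  # items assumed sorted in a fixed global order
            k = key(item, p)
            if k not in node:
                node[k] = {}
                count += 1
            node = node[k]
    return count

txns = [
    [("a", 0.9), ("b", 0.8)],
    [("a", 0.5), ("b", 0.8)],
    [("a", 0.9), ("b", 0.7)],
]
uf_nodes = count_nodes(txns, key=lambda i, p: (i, p))  # item + probability
compact = count_nodes(txns, key=lambda i, p: i)        # item only
print(uf_nodes, compact)
```

On a dense, long-transaction dataset such as connect, transactions share long item prefixes but rarely share exact probabilities, so the probability-sensitive keying multiplies the node count, consistent with the figures in Table VI.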

Figure 5 shows the running time of the algorithms on the dense datasets connect and mushroom; CUFP-Mine runs out of memory on both. As shown in Figure 5, our algorithm outperforms UF-Growth, MBP and CUFP-Mine under different minimum expected support thresholds. This is because CUFP-Mine generates too many supersets, UF-Growth generates too many tree nodes, and MBP generates many candidates, as shown in Tables 6-7. Figure 5 shows that the time performance of AT-Mine clearly surpasses that of the other algorithms on these two dense datasets; moreover, its time performance is also more stable.


TABLE VII. DETAILED ANALYSIS ON THE DATASET MUSHROOM

η (%)    tree nodes (#)           candidates (#)
         AT-Mine     UF-Growth    MBP
7.0      12,041      1,011,721    1,917
6.0      14,420      1,344,369    2,501
5.0      16,243      1,947,609    3,460
4.0      18,685      2,760,249    5,024
3.0      25,884      4,125,745    8,222
2.0      37,395      8,076,099    16,764

Figure 5(a). Running time on the dataset connect. (x-axis: minimum expected support threshold (%); y-axis: running time (s); algorithms compared: AT-Mine, MBP, UF-Growth, and CUFP-Mine (memory overflow).)

© 2013 ACADEMY PUBLISHER
