18.01.2013 Views

S - UWSpace - University of Waterloo

to be fine-tuned to improve performance efficiency and accuracy. For
example, in an association rule mining application, a significant frequency
change in some of the itemsets may indicate a distribution change. In
a stream classification application, a dramatic change of the size of one
class may be a signal of the impending arrival of a new distribution. Since
general approaches are not directly connected to the mining task, they
cannot take advantage of such feedback information. Hence, for many
streaming applications that only perform one specific mining task,
task-specific techniques with outstanding performance may be more useful
than general approaches.
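
The association rule example above can be made concrete with a minimal sketch. It assumes two adjacent windows of transactions and flags every itemset whose relative frequency (support) differs between them by more than a fixed threshold. The function names, the window contents, and the threshold of 0.3 are illustrative assumptions, not part of the thesis.

```python
from collections import Counter
from itertools import combinations


def itemset_supports(transactions, size=2):
    """Return the relative frequency (support) of every itemset of `size` items."""
    counts = Counter()
    for t in transactions:
        # Sort and deduplicate so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    n = len(transactions)
    return {k: v / n for k, v in counts.items()} if n else {}


def changed_itemsets(reference, current, size=2, threshold=0.3):
    """List itemsets whose support differs by more than `threshold` between windows."""
    ref = itemset_supports(reference, size)
    cur = itemset_supports(current, size)
    return sorted(
        k for k in ref.keys() | cur.keys()
        if abs(ref.get(k, 0.0) - cur.get(k, 0.0)) > threshold
    )


# Hypothetical windows: pair (a, b) is common at first, then (a, c) takes over.
reference_window = [["a", "b"], ["a", "b", "c"], ["a", "b"], ["b", "c"]]
current_window = [["a", "c"], ["a", "c"], ["b", "c"], ["a", "c"]]
flagged = changed_itemsets(reference_window, current_window)
```

Here the supports of (a, b) and (a, c) each shift by at least 0.5 between the windows, so both are flagged, while (b, c) shifts by only 0.25 and is not. A task-specific technique could use such a signal as feedback, which a general approach that is detached from the mining task cannot see.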

A large number of task-specific techniques have been developed for each
of the stream mining tasks. However, many of these techniques have
problems in meeting all of the common requirements for mining streaming
data: the ability to process large volumes of data in real time, low memory
usage, and the ability to cope with time-changing data. This thesis looks
into one of the most important mining tasks, frequency counting.
An algorithm is proposed that meets all three common requirements and
can outperform existing techniques for mining frequent itemsets in
time-changing data streams.
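
To illustrate what bounded-memory frequency counting can look like, the sketch below uses the classic Misra-Gries summary. This is not the algorithm proposed in the thesis: it counts single items rather than itemsets, and it addresses only the first two requirements (one pass over the data and low memory), not time-changing data. The function name, the parameter `k`, and the sample stream are assumptions made for illustration.

```python
def misra_gries(stream, k):
    """One-pass frequency summary that keeps at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to survive in the summary; each stored count underestimates
    the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of 10 elements in which "a" occurs 6 times.
summary = misra_gries("aabacadaae", k=3)
```

With `k=3` the summary never holds more than two counters, yet the item "a", which occurs in more than a third of the stream, is retained. Handling time-changing data would additionally require some way of forgetting old elements, which this sketch does not attempt.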

1.3.2 Multi-dimensional streams<br />

Most of the stream mining techniques for dynamic data streams only work
over single-dimensional data, i.e., they assume there is only one attribute
of interest in the stream. However, the data collected in a data stream
from real-world applications usually contain several attributes. In practice,
many stream processing applications need to take more than one
attribute into consideration. For example, in modern quality control,
several quality characteristics are usually monitored simultaneously. In
e-commerce, where each data element in the stream is an order placed
by customers, a positive linear correlation between items in the order
may indicate similar purchase patterns. There has been little attention
paid to the problem of extending change detection and mining to
multi-dimensional streams.
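
As a small illustration of the e-commerce example, the sketch below maintains the Pearson correlation between two attributes of a stream in a single pass and constant memory, using running means and co-moments. The class name and the sample orders are hypothetical. It assumes each stream element carries the quantities of two items in one order, and it does not handle distribution changes.

```python
import math


class RunningCorrelation:
    """Incrementally tracks the Pearson correlation of (x, y) pairs."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x
        self.m2_y = 0.0  # sum of squared deviations of y
        self.co = 0.0    # sum of products of deviations (co-moment)

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Each update pairs an old-mean deviation with a new-mean deviation.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.co += dx * (y - self.mean_y)

    def correlation(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.co / denom if denom > 0 else float("nan")


# Hypothetical orders: (quantity of item A, quantity of item B).
tracker = RunningCorrelation()
for qty_a, qty_b in [(1, 2), (2, 3), (3, 5), (4, 6)]:
    tracker.update(qty_a, qty_b)
r = tracker.correlation()
```

A value of `r` close to 1 corresponds to the positive linear correlation mentioned above. Monitoring such a statistic requires looking at two attributes jointly, which single-dimensional techniques do not do.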
