18.01.2013 Views

S - UWSpace - University of Waterloo


over dynamic streams must find a “balance” between efficiency and accuracy. In some cases, some accuracy may be sacrificed to achieve higher efficiency, and vice versa. A more detailed survey of the related work on change detection will be given in Section 3.2.

2.3 Survey on data stream mining

Data stream mining is the process of extracting information and patterns from streaming data. It can be considered as an extension of traditional data mining and knowledge discovery from relational tables to the new type of continuous, unbounded, rapid, and time-changing data. Therefore, most of the issues identified in relational data mining must also be addressed for stream mining, with the additional difficulties introduced by new stream data as discussed in Chapter 1.

2.3.1 Data refining

Data refining approaches refine the data elements in the stream for the purpose of mining. These approaches do not extract complex information from streams, but a data stream after these processes will be cleaner, more compact, and better structured. Mining the stream after these processes may greatly improve the performance of the mining applications. Some information generated by the processes, such as data synopses and distribution change alarms, can help the mining applications adjust their parameters and strategies over time. Unlike relational data mining, where the entire data set is processed once before the mining procedure starts, stream data processing procedures continue during the entire life span of the streams.

Sampling<br />

Data stream sampling is the process of choosing a suitable representative subset from the stream of interest. The major purpose of stream sampling is to reduce the potentially infinite size of the stream to a bounded set of
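As an illustration of reducing a stream of unknown length to a bounded sample, the sketch below uses reservoir sampling, a standard technique that the text above does not itself name. The function name `reservoir_sample` and the parameters `k` and `rng` are assumptions introduced for this example only.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of at most k elements from a stream.

    Illustrative sketch (hypothetical helper, not taken from the source):
    the stream is read once, and memory use is bounded by k regardless of
    how many elements arrive.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new element replaces an existing
            # one only if the slot falls inside the reservoir, which
            # happens with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Simulate a long stream with a generator so it is never held in memory.
    readings = (x * x for x in range(1_000_000))
    print(reservoir_sample(readings, k=5, rng=random.Random(42)))
```

The sample size `k` fixes the memory footprint in advance, which is one way to meet the bounded-size goal described above; whether a uniform sample is representative enough depends on the mining task.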

