18.01.2013 Views

S - UWSpace - University of Waterloo


1.1.4 Distribution changes in data streams

In traditional DBMSs, it is reasonable to assume that the data set is static, i.e., that the data elements are samples from a static distribution. However, this assumption does not hold for many real-world data stream applications. Fast data streams are typically generated by continuous activity over long periods of time. The underlying phenomena naturally change over time, so the distribution of the data values in the stream may also change significantly. This is referred to as data evolution, dynamic stream, time-changing data, or concept-drifting data [3, 69, 78, 157].

A distribution change in a stream can be either a slow, gradual, long-term process (referred to as distribution drift in the rest of this thesis) or a significant, sudden change (referred to as distribution shift). Both types of change are commonly observed in stream-based applications.
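To make the difference between drift and shift concrete, the sketch below generates two synthetic streams. It assumes, purely for illustration, Gaussian noise around a changing mean: in the drifting stream the mean moves linearly over the whole stream, while in the shifting stream it jumps once at a chosen position. The function names `drifting_stream` and `shifting_stream` and all parameters are hypothetical and are not taken from the thesis.

```python
import random
from typing import Iterator


def drifting_stream(n: int, start_mean: float = 0.0, end_mean: float = 5.0,
                    sd: float = 1.0, seed: int = 0) -> Iterator[float]:
    """Hypothetical drift generator: the mean moves linearly from start_mean to end_mean."""
    rng = random.Random(seed)
    steps = max(n - 1, 1)  # avoid division by zero when n == 1
    for i in range(n):
        mean = start_mean + (end_mean - start_mean) * i / steps
        yield rng.gauss(mean, sd)


def shifting_stream(n: int, change_point: int, before_mean: float = 0.0,
                    after_mean: float = 5.0, sd: float = 1.0,
                    seed: int = 0) -> Iterator[float]:
    """Hypothetical shift generator: the mean jumps from before_mean to after_mean at change_point."""
    rng = random.Random(seed)
    for i in range(n):
        mean = before_mean if i < change_point else after_mean
        yield rng.gauss(mean, sd)


if __name__ == "__main__":
    drift = list(drifting_stream(1000))
    shift = list(shifting_stream(1000, change_point=500))
    # Both streams start near 0.0 and end near 5.0, but only the shifted
    # stream changes abruptly (between positions 499 and 500).
    print(sum(drift[:100]) / 100, sum(drift[-100:]) / 100)
    print(sum(shift[490:500]) / 10, sum(shift[500:510]) / 10)
```

Setting `sd=0.0` removes the noise and exposes the underlying mean sequence, which makes the two shapes of change easy to compare side by side.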

Example 1 (distribution drift). Scientists have been using temperature and precipitation sensors to monitor annual hydrologic processes [33]. A study of climate trends in California shows that, due to increasing concentrations of atmospheric carbon dioxide, the mean annual runoff decreases by 37% over a 100-year period [10].
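Spreading the figure in Example 1 over its period shows why this kind of change counts as drift. Assuming, only for illustration, that the 37% decrease accumulated at a constant rate (either linearly or compounded yearly), the change per year is small:

```latex
\text{linear: } \frac{37\%}{100\ \text{years}} = 0.37\%\ \text{of the initial mean per year},
\qquad
\text{compounded: } 1 - (1 - 0.37)^{1/100} \approx 0.46\%\ \text{per year}.
```

A change of well under 1% per year can easily be masked by ordinary year-to-year variability, even though the cumulative change over the century is large.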

Example 2 (distribution shift). To gain a greater share of consumer expenditures, a retail brand introduces a special “loyalty program.” Monitoring the daily transactions of multiple outlets reveals a 14% increase in gross sales immediately after the program is introduced, indicating that the program has a positive impact [128].

Distribution changes in data streams have a significant impact on most DSMSs. A stream processing model built earlier may no longer be efficient or accurate after the data evolve, because some characteristics observed earlier in the data no longer hold. Hence, when a distribution change occurs in the stream, it is important to notify the user of the change. The DSMS also needs to be adjusted to reflect the change, and new results should be generated for the data under the new distribution.
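One very simple way to decide when to notify the user is to compare two adjacent sliding windows and report a change when their means differ by more than a threshold. The sketch below is a deliberately simple stand-in chosen for illustration, not the method developed in this thesis; the function name `detect_changes` and the `window` and `threshold` parameters are hypothetical. After each report it discards both windows so that it re-learns the new distribution before it can fire again.

```python
from collections import deque
from statistics import fmean
from typing import Iterable, List


def detect_changes(stream: Iterable[float], window: int = 50,
                   threshold: float = 1.0) -> List[int]:
    """Return the stream positions at which a change is reported.

    A change is reported when the mean of the most recent `window` values
    differs from the mean of the `window` values just before them by more
    than `threshold`. After each report both windows are emptied.
    """
    reference: deque = deque(maxlen=window)  # the older of the two windows
    current: deque = deque(maxlen=window)    # the most recent values
    alarms: List[int] = []
    for i, x in enumerate(stream):
        if len(current) == window:
            # The oldest value of the current window moves to the reference window
            # (appending x below then evicts it from `current`).
            reference.append(current[0])
        current.append(x)
        if len(reference) == window:
            if abs(fmean(current) - fmean(reference)) > threshold:
                alarms.append(i)  # a DSMS would notify the user at this point
                reference.clear()
                current.clear()
    return alarms


if __name__ == "__main__":
    # A sudden shift from 0.0 to 5.0 at position 100 is reported shortly after it occurs.
    print(detect_changes([0.0] * 100 + [5.0] * 100, window=10))  # [102]
```

This design favours sudden shifts: under a linear drift with slope s per item, adjacent windows differ in mean by roughly s times the window size, so a slow drift may never cross the threshold. Established detectors such as CUSUM or Page-Hinkley address this by accumulating evidence over time instead of comparing only two windows.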

There is a considerable amount of work that focuses on distribution
