Real-time feature extraction from video stream data for stream ...

3.4. Learning on Streams

All of the machine learning algorithms mentioned above are designed to work on a fixed amount of data. This data is assumed to be stored in files or databases, and all of it can be accessed at any time. This enables, for example, a decision tree learner as described in section 3.2.2 to evaluate which attribute splits the data best. For many years, this traditional batch processing of data has been sufficient. But these traditional approaches reach their limits as the amount of data grows rapidly, more and more data is generated continuously, and consumers want to react to changes in the data in real time. Hence, learning on streams has become an emerging field of machine learning in recent years. Examples of such environments include sensor networks, web log analysis, and the monitoring of computer network traffic for security purposes [Gaber et al., 2005].

The limitations imposed by operating on streaming data lead to the following requirements for streaming algorithms [Bockermann and Blom, 2012b]:

C1 the algorithm must continuously process single items or small batches of data,

C2 the algorithm uses only a single pass over the data,

C3 the algorithm may consume only limited resources (memory, time), and

C4 the algorithm provides anytime services, which means that models and statistics have to be deliverable at any time.
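To make requirements C1 to C4 more concrete, the following Python sketch computes the running mean and variance of a numeric stream using Welford's online algorithm. The stream could be, for instance, a hypothetical per-frame brightness value. This example and the class name `RunningStats` are chosen here purely for illustration and are not taken from [Bockermann and Blom, 2012b]. Each item is processed once on arrival and never stored, memory stays constant, and the current statistics can be read after every update.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and variance via Welford's online algorithm (illustrative).

    C1: update() consumes one item at a time.
    C2: every item is seen exactly once and is not stored.
    C3: memory is constant (three numbers); each update costs O(1).
    C4: mean and variance can be queried after any update.
    """

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of the items seen so far (0.0 for fewer than two)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)  # the statistics are valid after every single update
print(stats.n, round(stats.mean, 6), round(stats.variance, 6))  # 8 5.0 4.571429
```

A batch implementation would first collect all values and then compute the variance. The online update gives the same result up to floating-point rounding without ever holding more than three numbers.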

In recent years, many approaches have been developed that address most machine learning tasks on data streams. These include the training of classifiers on streams [Domingos and Hulten, 2000], stream clustering algorithms [O’Callaghan et al., 2002] [Aggarwal et al., 2003] (e.g. Birch [Zhang et al., 1996] or D-Stream [Chen and Tu, 2007]), quantile computation [Arasu and Manku, 2004], and approximate or lossy counting algorithms on streams (e.g. Lossy Counting [Manku and Motwani, 2002] or the Count(Min) Sketch [Charikar et al., 2004]), to name just a few. For further literature and explanations of the algorithms mentioned above, I recommend the final report of Projektgruppe 542, which took place at TU Dortmund University in 2010 [Balke et al., 2010].
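As an illustration of one of the counting algorithms mentioned above, the following Python sketch implements Lossy Counting following the description by Manku and Motwani [2002]. The class and method names are my own. The stream is divided into buckets of width ⌈1/ε⌉, and at every bucket boundary entries that cannot be frequent are pruned. After n items, each stored count underestimates the true count by at most ε·n. Memory therefore stays bounded instead of growing with the number of distinct items.

```python
import math
from collections.abc import Hashable


class LossyCounter:
    """Lossy Counting (after Manku and Motwani, 2002); names are illustrative."""

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0  # number of items processed so far
        # item -> [f, delta]: counted occurrences and maximum possible undercount
        self.entries: dict[Hashable, list[int]] = {}

    def add(self, item: Hashable) -> None:
        """Process a single item in one pass."""
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        # At each bucket boundary, drop entries with f + delta <= current bucket id.
        if self.n % self.width == 0:
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support: float) -> dict[Hashable, int]:
        """Anytime query: items whose count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: f for k, (f, _) in self.entries.items() if f >= threshold}


counter = LossyCounter(epsilon=0.01)
for token in ["a", "b", "a", "c", "a"] * 200:
    counter.add(token)
print(counter.frequent(support=0.3))  # {'a': 600}
```

The query can be issued at any point in the stream, which satisfies requirement C4. Pruning at bucket boundaries is what keeps the memory consumption limited (C3).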

The video data I am working with in this thesis is provided as a stream as well. The video stream is infinite, and the amount of data makes it impossible to store everything. Hence we do not have random access to the data, and algorithms can only make a single pass over it. It is therefore natural to treat IP-TV video streams as streaming data. Nevertheless, only very few of the video segmentation and tagging approaches described in chapter 2 make use of streaming algorithms. The reason is that there is usually no need for online learning. For tasks like cut detection or anchorshot detection, it is sufficient to build models offline and only apply them to the incoming video stream. Hence we are working in a setting of batch learning but stream application, and we do not focus on the aspects of stream mining in much detail.
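The following Python sketch illustrates the split between batch learning and stream application for cut detection. It assumes that every frame has already been reduced to a normalised grey-level histogram. It also uses a deliberately simple, hypothetical threshold rule. The function names and the midpoint rule are illustrative assumptions, not the models used in this thesis. The offline step may access the whole labelled training set at will. The online step makes a single pass over the frames and keeps only the previous histogram in memory.

```python
from collections.abc import Iterable, Iterator, Sequence

Histogram = Sequence[float]  # assumed: normalised grey-level histogram of one frame


def histogram_distance(a: Histogram, b: Histogram) -> float:
    """L1 distance between two normalised histograms (range 0 to 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def fit_threshold(distances: Sequence[float], is_cut: Sequence[bool]) -> float:
    """Offline (batch) step with random access to labelled training data.

    Hypothetical rule: midpoint between the mean distance at cuts and at non-cuts.
    """
    cuts = [d for d, c in zip(distances, is_cut) if c]
    others = [d for d, c in zip(distances, is_cut) if not c]
    return (sum(cuts) / len(cuts) + sum(others) / len(others)) / 2


def detect_cuts(frames: Iterable[Histogram], threshold: float) -> Iterator[int]:
    """Online step: single pass, only the previous frame is kept in memory."""
    previous = None
    for index, hist in enumerate(frames):
        if previous is not None and histogram_distance(previous, hist) > threshold:
            yield index  # index of the first frame of a new shot
        previous = hist


threshold = fit_threshold([0.05, 0.1, 1.6, 0.08, 1.4], [False, False, True, False, True])
dark, bright = [0.9, 0.1], [0.1, 0.9]
stream = iter([dark, dark, bright, bright, dark])  # stands in for an endless stream
print(list(detect_cuts(stream, threshold)))  # [2, 4]
```

Because `detect_cuts` is a generator, it can consume an unbounded iterator of frames and report cuts as they occur. The model itself (here a single threshold) is never updated during application.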

