10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

3.5 Markov Models 593.4.4 Graph based MethodsGraph based semi-supervised approaches commonly conduct their learning processes in agraph, where each node represents one data (or example) in the labeled <strong>and</strong> unlabeled datasets.The edge betw<strong>ee</strong>n two nodes indicates the similarity of the nodes. It has b<strong>ee</strong>n until recentlythat the semi-supervised researchers have begun to shift toward developing strategies thatutilize the graph structure obtained by capturing similarities betw<strong>ee</strong>n the labeled <strong>and</strong> unlabeleddata. In this section, we will make a brief introduction on this issue. Refer to [278] for amore comprehensive survey. S<strong>tud</strong>ies on other related issues of graph-based semi-supervisedlearning can be also found in [59, 277], to name a few.Blum et al. may propose the first graph based approach for semi-supervised learning.The intuitive idea is that the graph is minimum cut <strong>and</strong> then the nodes in each remainingconnecting component should have the same label. The same author extended the work [36]by introducing a novel strategy called majority voting to conduct the minimum cut. Both ofthese works assume that prediction function on the unlabeled data is discrete.To generalize the assumption, Zhu et al. [279] proposed to tackle the issue when the predictionfunction is continuous. They found that the function with the lowest energy has theharmonic property by conducting experiments on graph with Gaussian r<strong>and</strong>om fields. Therefore,the authors proposed a label propagation approach to conduct semi-supervised learningon the graph with the harmonic property. Other interesting <strong>and</strong> important works on graphbased semi-supervised learning include [26, 27, 275, 229].Although the research on proposing effective approaches on graphs are important forsemi-supervised learning, it is also very critical to build the graph that can best match thedata. There are some distinct methods introduced [278]:• knowledge from experts. In [19], the authors used domain knowledge to construct graphsfor video surveillance. Without the knowledge, the graphs can not be built appropriately.• Neighbor graphs. In [4], Carreira-Perpinan <strong>and</strong> Zemel constructed graphs from multipleminimum spanning tr<strong>ee</strong>s by conducting perturbation <strong>and</strong> edge removal. The difficulty isthat how to choose the b<strong>and</strong>width of the Gaussian. Zhang <strong>and</strong> L<strong>ee</strong> (2006) proposed astrategy to tune the b<strong>and</strong>width for each feature dimension.• Local fit. In [272], the authors conducted LLE-like (i.e., locally linear embedding) operationson the data points while letting the weights of LLE be non-negative, which will beused as graph weights. Hein <strong>and</strong> Maier [115] proposed that a preprocessing is necessarythat the noisy data should be removed first. Then the graph can be constructed from theremaining better data. The accuracy is therefore can be improved.3.5 Markov ModelsMarkov model is one of the well-known approaches because of its broad applications, suchas sp<strong>ee</strong>ch recognition, financial forecasting, gene prediction, cryptanalysis, natural languageprocessing, data compression, <strong>and</strong> so forth. The common point of these applications is thattheir goal is to predict or recover a data sequence that is not immediately observable. For aconcrete example, i.e., stock price prediction, people always want to “guess” the trend of somestock, i.e., whether its price will go up in the next day.In fact, Markov model may be the simplest <strong>and</strong> most effective mathematical model foranalyzing such kind of time-series processes, while the class of Markov model makes it richenough to tackle all of these applications.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!