29.04.2013

TESI DOCTORAL - La Salle

Chapter 1. Framework of the thesis

they should be integrated in the knowledge extraction process.

In the multimodal clustering context, the field that has motivated the largest research effort is the clustering of web image search results based not only on visual features but also on the surrounding text and link information, since organizing the results into different semantic clusters may facilitate users' browsing (Cai et al., 2004). In that work, each image returned by the search engine is represented using three kinds of information: visual, textual and link information (the text and link data are extracted from the surroundings of the image). The rationale of this approach is that textual and link-based representations can reflect the semantic relationships among images better than visual features. The proposed system implements a two-level clustering algorithm. In the first level, clustering is conducted using the textual and link representations of the images (either separately or jointly). In the second level, the images assigned to each first-level cluster are clustered again, this time using low-level visual features, so as to group visually similar images within each first-level cluster and thus facilitate users' browsing.
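The two-level scheme of Cai et al. (2004) can be sketched roughly as follows. This is a minimal illustration under assumptions of our own: plain k-means with a deterministic initialization stands in for whatever clustering algorithm is used at each level, the textual/link representation is assumed to be available as one numeric vector per image, and the names `kmeans` and `two_level_cluster` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; the first k points serve as initial centroids (deterministic)."""
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_level_cluster(
    semantic: List[Vector], visual: List[Vector], k1: int, k2: int
) -> List[Tuple[int, int]]:
    """Return a (first-level, second-level) cluster label pair for every image."""
    # Level 1: group images by their textual/link (semantic) representation.
    top = kmeans(semantic, k1)
    result: List[Tuple[int, int]] = [(0, 0)] * len(semantic)
    # Level 2: re-cluster the members of each first-level cluster using visual features only.
    for c in sorted(set(top)):
        idx = [i for i, lab in enumerate(top) if lab == c]
        sub = kmeans([visual[i] for i in idx], k2)
        for i, s in zip(idx, sub):
            result[i] = (c, s)
    return result


if __name__ == "__main__":
    # Hypothetical toy data: two semantic groups, each containing two visually distinct images.
    semantic = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    visual = [[0.0], [5.0], [0.0], [5.0]]
    print(two_level_cluster(semantic, visual, 2, 2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The first label separates the images by meaning (text and links), and the second only reorganizes images inside each semantic group by appearance, mirroring the division of labour between the two levels.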

A second work dealing with web image search result clustering is that of Gao et al. (2005). In that work, a tripartite graph was used to model the relations among low-level features, images and their surrounding texts. The clustering task was then formulated as a constrained multiobjective optimization problem, which can be efficiently solved by semi-definite programming.

In a similar context, clustering was applied to image sense discrimination for web images retrieved from ambiguous keywords (Loeff, Ovesdotter-Alm, and Forsyth, 2006). The goal was to present the image search results in semantically sensible clusters for improved image browsing. To do so, spectral clustering was applied to multimodal features: simple local and global image features, and a bag-of-words representation of the text in the embedding web page. Multimodal fusion was achieved by combining the pairwise object similarities measured on both the image and the textual features into the graph affinity matrix used by the spectral clustering algorithm.
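The fusion of both modalities in a single affinity matrix can be illustrated with a small sketch. It relies on choices of our own that the text does not specify: a Gaussian kernel on image feature vectors, cosine similarity on bag-of-words counts, a hypothetical mixing weight `alpha` between the two, and a two-way split by the sign of the second eigenvector of the normalized affinity matrix (found by power iteration) instead of a full k-way spectral clustering. All function names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def gaussian_sim(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Visual similarity: Gaussian kernel on Euclidean distance (assumed choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def cosine_sim(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Textual similarity: cosine between bag-of-words count vectors."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fused_affinity(visual, texts, alpha: float = 0.5, sigma: float = 1.0) -> List[List[float]]:
    """Affinity matrix mixing both modalities; alpha is a hypothetical weight."""
    n = len(visual)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = alpha * gaussian_sim(visual[i], visual[j], sigma) + (
                    1.0 - alpha
                ) * cosine_sim(texts[i], texts[j])
    return W


def spectral_bipartition(W: List[List[float]], iters: int = 500) -> List[int]:
    """Split nodes by the sign of the 2nd eigenvector of D^-1/2 W D^-1/2."""
    n = len(W)
    deg = [sum(row) for row in W]  # assumes every node has positive degree
    s = [1.0 / math.sqrt(d) for d in deg]
    # Shift by the identity so all eigenvalues lie in [0, 2]; power iteration then
    # targets the second-largest eigenvalue, not one that is large and negative.
    M = [[s[i] * W[i][j] * s[j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # The leading eigenvector of the normalized affinity is proportional to sqrt(deg).
    u1 = [math.sqrt(d) for d in deg]
    norm = math.sqrt(sum(x * x for x in u1))
    u1 = [x / norm for x in u1]
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        # Deflation: remove the component along the leading eigenvector.
        proj = sum(a * b for a, b in zip(v, u1))
        v = [a - proj * b for a, b in zip(v, u1)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x >= 0 else 1 for x in v]


if __name__ == "__main__":
    # Toy ambiguous query "jaguar": two car images and two animal images (hypothetical).
    visual = [[0.0], [0.1], [5.0], [5.1]]
    texts = [
        {"jaguar": 1, "car": 1},
        {"jaguar": 1, "car": 2},
        {"jaguar": 1, "cat": 1},
        {"jaguar": 1, "cat": 2},
    ]
    print(spectral_bipartition(fused_affinity(visual, texts)))
```

Because the shared keyword gives every pair some textual similarity, neither modality alone is fully decisive here; their weighted combination in `W` is what the spectral step partitions.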

Finally, the notion that each of the multiple modalities in a multimedia collection contributes its own perspective to the collection's organization was the driving force behind the proposal of Bekkerman and Jeon (2007). That work presents the Comraf* model, a lightweight version of combinatorial Markov random fields. In Comraf*, multimodal clustering is cast as the problem of simultaneously constructing a partition of each data modality. By clustering the modalities simultaneously, the statistical sparseness of the data representation can be overcome, yielding a dense and smooth joint distribution of the modalities. However, not every modality has to be clustered: it suffices that the so-called target modality is.
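The idea of partitioning several modalities at once can be illustrated, in a deliberately simplified form, with two modalities (images and words) linked by a co-occurrence count table. The sketch below is not Comraf* itself. It assumes, in the spirit of the information-theoretic objectives of the Comraf family, that a pair of partitions is scored by the mutual information between the image-cluster and word-cluster variables, and it simply enumerates every pair of partitions of a toy table, which is only feasible at this tiny scale. All names and data are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence


def cluster_mi(counts: List[List[int]], row_lab: Sequence[int], col_lab: Sequence[int],
               kr: int, kc: int) -> float:
    """Mutual information (in nats) between row-cluster and column-cluster variables."""
    # Aggregate the sparse item-level table into a small kr x kc cluster-level table.
    joint = [[0.0] * kc for _ in range(kr)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            joint[row_lab[i]][col_lab[j]] += c
    total = sum(map(sum, joint))
    pr = [sum(r) / total for r in joint]
    pc = [sum(joint[a][b] for a in range(kr)) / total for b in range(kc)]
    mi = 0.0
    for a in range(kr):
        for b in range(kc):
            p = joint[a][b] / total
            if p > 0:
                mi += p * math.log(p / (pr[a] * pc[b]))
    return mi


def brute_force_co_cluster(counts: List[List[int]], kr: int, kc: int):
    """Jointly search partitions of both modalities for the highest mutual information."""
    best = (None, None, -1.0)
    for rows in product(range(kr), repeat=len(counts)):
        for cols in product(range(kc), repeat=len(counts[0])):
            mi = cluster_mi(counts, rows, cols, kr, kc)
            if mi > best[2] + 1e-12:  # tolerance keeps the first of tied optima
                best = (rows, cols, mi)
    return best


if __name__ == "__main__":
    # Rows: images; columns: words. Images 0-1 use words 0-1, images 2-3 use words 2-3.
    counts = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
    rows, cols, mi = brute_force_co_cluster(counts, 2, 2)
    print(rows, cols, round(mi, 4))  # (0, 0, 1, 1) (0, 0, 1, 1) 0.6931
```

Exhaustive enumeration grows exponentially with the number of items and only serves to expose the objective; practical methods rely on heuristic search instead.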

The reader interested in multimedia indexing and retrieval is referred to the recent and comprehensive survey by Chen (2006), although it focuses mainly on the text and image modalities.

1.4 Clustering indeterminacies

As mentioned at the end of Section 1.1, carrying out a knowledge discovery process requires making several critical decisions at each of its stages, which may have to be re-