29.04.2013 Views

TESI DOCTORAL - La Salle


1.5. Motivation and contributions of the thesis<br />

[Figure 1.6 (block diagram, not reproduced). Recoverable block labels: multimedia data set X; object representation (diversity factors dfR, dfD); clustering, hard/soft (dfA); cluster ensemble E; flat/hierarchical (serial/parallel) consensus architecture, hard/soft; consensus clustering $\lambda_c$ or $\Lambda_c$; self-refining (threshold %); final partition $\lambda_c^{\mathrm{final}}$ or $\Lambda_c^{\mathrm{final}}$.]<br />

Figure 1.6: Block diagram of the robust multimodal clustering system based on self-refining hierarchical consensus architectures.<br />

– the construction of multimodal cluster ensembles and the application of self-refining<br />

hierarchical consensus architectures for robust multimodal clustering (see chapter 5).<br />

– consensus functions based on voting strategies for combining fuzzy partitions contained<br />

in soft cluster ensembles (see chapter 6).<br />

These contributions can be articulated in a unitary proposal for robust multimodal<br />

clustering based on cluster ensembles, a block diagram of which is shown in figure 1.6. The<br />

procedure for deriving the partition of a multimodal data collection X according to our<br />

proposal goes as follows: firstly, multiple representations of the objects contained in X are<br />

created by the application of a set of representational and dimensional diversity factors<br />

provided by the user (denoted as dfR and dfD in figure 1.6). Next, a set of either hard or<br />

soft clustering algorithms (referred to as the algorithmic diversity factor dfA) is applied to<br />

the distinct object representations obtained from the previous step, giving rise to a set of<br />

clusterings compiled in the cluster ensemble E. Notice that, up to this point, the only choices<br />

made by the user refer to the object representation techniques and clustering algorithms<br />

employed for creating the ensemble. As mentioned earlier, the user is encouraged to employ<br />

the widest possible range of diversity factors, thus creating maximally diverse clusterings<br />

so as to break free from the indeterminacies inherent in clustering. The high<br />

computational cost associated with this cluster ensemble generation strategy can be partly<br />

mitigated because it is a highly parallelizable process (Hore, Hall, and Goldgof, 2006).<br />
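The ensemble generation step described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the thesis: the helper names (`kmeans`, `minmax`, `build_ensemble`), the toy data, and the specific diversity factors (raw versus min-max scaled representations for dfR, the number of retained features for dfD, and k-means runs with different seeds standing in for dfA) are all hypothetical.

```python
import random
from itertools import product


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means; returns one hard label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def minmax(points):
    """Hypothetical representational factor: rescale every feature to [0, 1]."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi))
        for p in points
    ]


def build_ensemble(X, df_R, df_D, df_A):
    """One clustering per (representation, dimensionality, algorithm) triple."""
    ensemble = []
    for represent, d, cluster in product(df_R, df_D, df_A):
        view = [p[:d] for p in represent(X)]  # keep the first d features
        ensemble.append(cluster(view))
    return ensemble


# Toy data set X: six objects with three features each (made up for illustration).
X = [(0.0, 0.1, 5.0), (0.2, 0.0, 4.0), (0.1, 0.3, 6.0),
     (5.0, 5.1, 5.5), (5.2, 4.9, 4.5), (4.8, 5.0, 5.0)]
df_R = [lambda pts: list(pts), minmax]             # representational diversity
df_D = [2, 3]                                      # dimensional diversity
df_A = [lambda v: kmeans(v, 2, seed=0),            # algorithmic diversity
        lambda v: kmeans(v, 2, seed=1)]
E = build_ensemble(X, df_R, df_D, df_A)
print(len(E), "clusterings in the ensemble")
```

Each iteration of the loop in `build_ensemble` is independent of the others, which is what makes the generation step straightforward to distribute across workers.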

Subsequently, the process for deriving the partition of the data set X upon the cluster<br />

ensemble E starts by applying a consensus clustering procedure. This can be<br />

conducted according to either a flat or a hierarchical consensus architecture, a decision that is<br />

automatically made by the system based on the characteristics of the data set X, the cluster<br />

ensemble E and the user-selected consensus function F employed for combining the clusterings in E.<br />

In case a hierarchical consensus architecture is employed,<br />

an additional decision (also made with no user supervision) is the one related to its serial or<br />

parallel execution, which ultimately depends on the availability of computational resources.<br />
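The difference between the two architectures can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the consensus function `coassociation_consensus` is a simple majority rule on pairwise co-occurrence (in the spirit of evidence accumulation), not one of the consensus functions proposed in the thesis, and `mini_size` is a hypothetical parameter for the size of the intermediate mini-ensembles. The rule the system uses to choose between flat and hierarchical consensus is not described in this section, so it is not modelled.

```python
def coassociation_consensus(ensemble):
    """Hypothetical consensus function F: two objects share a consensus cluster
    if they are co-clustered in more than half of the given partitions."""
    n = len(ensemble[0])
    parent = list(range(n))  # union-find forest over the objects

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            votes = sum(labels[i] == labels[j] for labels in ensemble)
            if 2 * votes > len(ensemble):
                parent[find(i)] = find(j)
    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def flat_consensus(ensemble, F):
    """Flat architecture: F is applied once to the whole ensemble."""
    return F(ensemble)


def hierarchical_consensus(ensemble, F, mini_size):
    """Hierarchical architecture: F is applied to mini-ensembles, and the
    intermediate consensus clusterings are combined again, stage by stage."""
    if mini_size < 2:
        raise ValueError("mini_size must be at least 2")
    level = ensemble
    while len(level) > 1:
        level = [F(level[i:i + mini_size]) for i in range(0, len(level), mini_size)]
    return level[0]


# Toy hard cluster ensemble over six objects (made up for illustration).
E = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
flat = flat_consensus(E, coassociation_consensus)
hier = hierarchical_consensus(E, coassociation_consensus, mini_size=3)
print(flat, hier)
```

Within one stage of `hierarchical_consensus`, the calls to `F` on different mini-ensembles do not depend on each other, so they could run either one after another or concurrently. The two architectures agree on this toy ensemble, but with a majority rule like this one they are not guaranteed to agree in general.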

As a result, a consensus clustering solution is obtained, which can either be represented<br />

by a consensus label vector $\lambda_c$ or a consensus clustering matrix $\Lambda_c$, depending on whether<br />

a crisp or a fuzzy clustering approach is taken. This consensus clustering is then<br />

subjected to an almost fully autonomous self-refining procedure, which only requires the user<br />

to specify a percentage threshold (denoted by the symbol ‘%’ in figure 1.6). The outcome is the final<br />

partition of the data set X, denoted as $\lambda_c^{\mathrm{final}}$ (or $\Lambda_c^{\mathrm{final}}$ in the fuzzy case).<br />
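One possible reading of the self-refining step, for the crisp case, is sketched below. This section does not say how the percentage threshold is used, so the sketch rests on a loudly stated assumption: the threshold selects the fraction of ensemble components most similar to the consensus clustering, and the consensus is then recomputed on that subset. The similarity measure (Rand index), the consensus function `F`, and all function names are hypothetical choices made for the example.

```python
import math
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two hard partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def F(ensemble):
    """Hypothetical consensus function: merge objects that are co-clustered
    in more than half of the partitions (same majority rule for every call)."""
    n = len(ensemble[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if 2 * sum(p[i] == p[j] for p in ensemble) > len(ensemble):
            parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def select_components(ensemble, consensus, percent):
    """ASSUMPTION: keep the `percent` % of components closest to the consensus."""
    ranked = sorted(ensemble, key=lambda p: rand_index(p, consensus), reverse=True)
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]


def self_refine(ensemble, consensus, percent):
    """Recompute the consensus on the selected subset only."""
    return F(select_components(ensemble, consensus, percent))


# Toy ensemble: five reasonable partitions and one noisy one (made up).
noisy = [0, 1, 0, 1, 0, 1]
E = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 1],
    noisy,
]
consensus = F(E)
kept = select_components(E, consensus, percent=50)
refined = self_refine(E, consensus, percent=50)
print(consensus, refined)
```

Under this reading, the user-supplied percentage is the only parameter of the refinement, which matches the description of the procedure as almost fully autonomous. In the toy run the noisy partition is ranked last and is excluded from the refined consensus.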

Before proceeding with the description of our proposals, the next chapter presents an<br />

overview of related work in the area of cluster ensembles.<br />
