29.04.2013

DOCTORAL THESIS - La Salle


4.3 Selection-based self-refining

The consensus self-refining procedure proposed in section 4.1 uses a consensus clustering solution as the reference for computing the φ(NMI) of the cluster ensemble components. This computation, in turn, guides the creation of the select cluster ensemble E_p upon which the self-refining process is conducted.
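The φ(NMI) comparisons on which both self-refining procedures rely can be sketched as follows. This is a minimal sketch, assuming the normalization φ(NMI)(λa, λb) = I(λa; λb) / sqrt(H(λa) · H(λb)) and labelings given as equal-length lists of cluster labels. φ(ANMI) is taken as the average φ(NMI) of a labeling against every ensemble component. The function names `nmi` and `anmi` are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def nmi(a, b):
    """phi(NMI) between two labelings of the same n objects.

    Assumes the normalization I(a; b) / sqrt(H(a) * H(b)).
    """
    if len(a) != len(b):
        raise ValueError("labelings must cover the same objects")
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # contingency table of the two labelings
    # Mutual information computed from the contingency table counts.
    mi = sum(
        (nab / n) * log(n * nab / (ca[x] * cb[y]))
        for (x, y), nab in joint.items()
    )
    ha = -sum((c / n) * log(c / n) for c in ca.values())
    hb = -sum((c / n) * log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        # Convention of this sketch (assumption): a single-cluster
        # labeling only matches another single-cluster labeling.
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


def anmi(labeling, ensemble):
    """phi(ANMI): average phi(NMI) of a labeling against all ensemble components."""
    return sum(nmi(labeling, comp) for comp in ensemble) / len(ensemble)
```

For instance, `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1 up to floating-point rounding, because the two labelings differ only by a renaming of the clusters. By contrast, `nmi([0, 0, 1, 1], [0, 1, 0, 1])` is 0.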

In this section, we propose an alternative procedure for obtaining a self-refined consensus clustering solution. The only difference between this proposal and the one put forward in section 4.1 concerns the computation of the φ(NMI) of the cluster ensemble components, a step prior to the creation of the select cluster ensemble E_p. Here, this computation is not performed with respect to a previously obtained consensus labeling λ_c, but with respect to one of the components of the cluster ensemble E.
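The selection step that distinguishes this proposal can be sketched as follows. The reference λ_ref is the component with maximum φ(ANMI) with respect to the whole ensemble. E_p then keeps the p% of components with the highest φ(NMI) with respect to λ_ref. This sketch is not a transcription of table 4.11, and it rests on several assumptions: the helper names `select_reference` and `select_ensemble` are hypothetical, the NMI normalization is the geometric-mean one, and rounding the size of E_p up with `ceil` is a choice made only for this sketch.

```python
from collections import Counter
from math import ceil, log, sqrt


def nmi(a, b):
    # phi(NMI) with normalization I(a; b) / sqrt(H(a) * H(b)) (assumption).
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(n * c / (ca[x] * cb[y])) for (x, y), c in joint.items())
    ha = -sum((c / n) * log(c / n) for c in ca.values())
    hb = -sum((c / n) * log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


def select_reference(ensemble):
    """Index of the component with maximum phi(ANMI) w.r.t. the whole ensemble."""
    def anmi(i):
        return sum(nmi(ensemble[i], comp) for comp in ensemble) / len(ensemble)
    return max(range(len(ensemble)), key=anmi)


def select_ensemble(ensemble, reference, p):
    """Select cluster ensemble E_p: the p% of components closest to the reference.

    Components are ranked by phi(NMI) w.r.t. the reference; the size of E_p
    is rounded up (assumption of this sketch) and is at least 1.
    """
    size = max(1, ceil(len(ensemble) * p / 100))
    ranked = sorted(ensemble, key=lambda comp: nmi(comp, reference), reverse=True)
    return ranked[:size]
```

As a usage example, take the ensemble `[[0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], [0, 1, 0, 1, 0, 1]]`. Here `select_reference` returns the index of one of the three equivalent partitions (0, 1 or 2). With p = 50, E_p keeps two of them and discards `[0, 1, 0, 1, 0, 1]`, whose φ(NMI) with respect to λ_ref is 0. Note that no consensus function is run to obtain the reference.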

By doing so, we aim to devise an alternative means of obtaining a high-quality clustering from a large cluster ensemble. This alternative requires no consensus process to obtain the reference clustering against which the cluster ensemble components are compared in terms of normalized mutual information, with the consequent computational savings.

Since this method is based on selecting one of the cluster ensemble components to initiate the consensus self-refining process, we have called it selection-based self-refining. Its constituent steps are presented in table 4.11.
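Once a select cluster ensemble E_p is available, the self-refined consensus clustering is obtained by running a consensus function on it. The sketch below uses an EAC-style (evidence accumulation) consensus, chosen only for illustration and not necessarily the consensus function used in the experiments. It builds the co-association matrix of the ensemble and then merges objects by average linkage until k clusters remain. The function names, the average-linkage choice and the assumption that k is known beforehand are all hypothetical.

```python
from itertools import combinations


def coassociation(ensemble):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(ensemble[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in ensemble:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                co[i][j] += 1 / len(ensemble)
                co[j][i] = co[i][j]
    return co


def eac_consensus(ensemble, k):
    """EAC-style consensus: average-linkage merging on the co-association matrix."""
    co = coassociation(ensemble)
    clusters = [[i] for i in range(len(co))]  # start from singletons

    def similarity(a, b):
        # Average co-association between two groups of objects.
        return sum(co[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the most similar pair of groups (a < b by construction).
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda ab: similarity(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(co)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

For example, with the ensemble `[[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]` and k = 2, `eac_consensus` returns `[0, 0, 0, 1, 1, 1]`, since object 2 is co-clustered with objects 0 and 1 in three of the four partitions. The naive merging loop is cubic in the number of objects and only suits small illustrations.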

In the following paragraphs, we analyze the performance of this second self-refining proposal using the same experimental scheme employed in section 4.2. That is, we first review the results obtained on the Zoo data collection at a qualitative level (the analysis of the remaining data collections is presented in appendix D.2). This review is followed by a quantitative study of the quality of the self-refined consensus clustering solutions across all the experiments conducted.

To make the results of selection-based consensus self-refining comparable to those presented in the previous section, we have followed the same experimental procedure: i) the experiments have been replicated for a set of self-refining percentage values in the range p ∈ [2, 90], and ii) the experiments have been executed on the cluster ensembles corresponding to the highest diversity scenario.

To begin with, figure 4.3 depicts the boxplot charts of the φ(NMI) values corresponding to the selection-based consensus self-refining process. Each chart depicts, from left to right, the φ(NMI) values of: i) the components of the cluster ensemble E, ii) the cluster ensemble component with maximum φ(ANMI) with respect to the whole ensemble, i.e. λ_ref, and iii) the self-refined consensus clusterings λ_c^{p_i} obtained upon select cluster ensembles created using percentages p_i ∈ {2, 5, 10, 15, 20, 30, 40, 50, 60, 75, 90}.

Firstly, we can notice the high quality of the selected cluster ensemble component λ_ref, whose φ(NMI) is quite close to that of the highest-quality component of the cluster ensemble E. Thus, the proposed selection procedure appears to constitute, by itself, a fairly good approach for obtaining clustering solutions that are robust to the inherent indeterminacies of the clustering problem. When the self-refining procedure is applied on the select cluster ensemble created upon λ_ref, different behaviors are observed. Whereas in some cases none of the self-refined clustering solutions λ_c^{p_i} attains higher φ(NMI) values than λ_ref (see, for instance, figure 4.3(a)), the opposite is observed when self-refining based on the EAC and
