

computational efficiency criteria alone. Although put forward in a hard clustering scenario, this proposal could be extended to a fuzzy context by introducing several minor modifications.

This chapter is organized as follows: section 4.1 describes the proposed self-refining consensus procedure. Next, section 4.2 presents several experiments regarding the application of the self-refining process to the consensus clustering solutions output by the three types of consensus architectures described in the previous chapter. An alternative procedure for obtaining refined consensus clustering solutions upon a given cluster ensemble, based on cluster ensemble component selection, is described in section 4.3. Finally, the discussion and conclusions presented in section 4.4 close this chapter.

4.1 Description of the consensus self-refining procedure

The proposed approach for refining the quality of the consensus clustering solution λc is fairly straightforward: it is based on the notion of average normalized mutual information φ(ANMI) (Strehl and Ghosh, 2002) between a cluster ensemble E and a consensus clustering solution λc built upon it, as defined by equation (4.1).

\[
\phi^{(\mathrm{ANMI})}(E, \lambda_c) = \frac{1}{l} \sum_{i=1}^{l} \phi^{(\mathrm{NMI})}(\lambda_i, \lambda_c) \tag{4.1}
\]

where l represents the number of partitions (or components) contained in the cluster ensemble E, and λi is the i-th of these components.
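As an illustration, the following Python sketch computes φ(ANMI) for a toy ensemble. It is a minimal sketch, not part of the original text: the function name anmi is hypothetical, and scikit-learn's normalized_mutual_info_score is used as a stand-in for φ(NMI), with the geometric-mean normalization that matches the Strehl and Ghosh definition.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score


def anmi(ensemble, consensus):
    """Average NMI between a consensus labeling and each ensemble component.

    ensemble  : list of l label vectors (one per cluster ensemble component)
    consensus : label vector of the consensus clustering solution lambda_c
    """
    # Geometric-mean normalization matches the Strehl & Ghosh NMI definition.
    return np.mean([
        normalized_mutual_info_score(component, consensus,
                                     average_method='geometric')
        for component in ensemble
    ])


# Toy example: three components and a candidate consensus over 6 objects.
ensemble = [
    np.array([0, 0, 1, 1, 2, 2]),
    np.array([0, 0, 0, 1, 2, 2]),
    np.array([1, 1, 0, 0, 2, 2]),  # same structure, permuted labels
]
consensus = np.array([0, 0, 1, 1, 2, 2])
print(anmi(ensemble, consensus))  # close to 1: consensus agrees with ensemble
```

Note that NMI is invariant to label permutations, so the third component, despite its relabeled clusters, contributes fully to the average.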

The higher φ(ANMI)(E, λc), the more information the consensus clustering solution λc shares with all the clusterings in E, so it can be considered to capture the information contained in the ensemble to a larger extent. In fact, the computation of φ(ANMI) between a given cluster ensemble E and a set of consensus clustering solutions obtained by means of different consensus functions is proposed in (Strehl and Ghosh, 2002) as a means for choosing among them in an unsupervised fashion, giving rise to what is called a supraconsensus function.
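A supraconsensus-style selection can be sketched in a few lines, reusing the hypothetical anmi helper (and its imports) from the sketch above; the candidate consensus solutions are assumed inputs for illustration.

```python
def supraconsensus(ensemble, candidates):
    """Among a set of candidate consensus labelings (e.g. the outputs of
    several consensus functions), pick the one with the highest ANMI with
    respect to the ensemble -- an unsupervised selection criterion."""
    return max(candidates, key=lambda labels: anmi(ensemble, labels))
```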

It is important to note that each term of the summation in equation (4.1) measures the resemblance between one cluster ensemble component and λc. As a consequence, those cluster ensemble components more similar to the consensus clustering solution contribute in a greater proportion to the sum in equation (4.1).

Assuming that the consensus function F applied for obtaining the consensus clustering solution λc delivers a moderately good performance (in the sense that the quality of λc will be reasonably higher than that of the poorest components of the cluster ensemble E), then the normalized mutual information φ(NMI) between λc and each cluster ensemble component λi (∀i ∈ [1,...,l]) gives an approximate measure of the quality of the latter (Fern and Lin, 2008).

In light of this fact, the proposed consensus self-refining procedure is based on ranking the l components of the cluster ensemble E according to their φ(NMI) with respect to the consensus clustering solution λc. The result of this sorting process is represented by means of an ordered list Oφ(NMI) = {λφ(NMI)1, λφ(NMI)2, ..., λφ(NMI)l}, the subindices of which refer to the aforementioned φ(NMI)-based ranking, i.e. λφ(NMI)1 denotes the cluster ensemble component that attains the highest normalized mutual information with respect to λc, whereas λφ(NMI)l corresponds to the one attaining the lowest.
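A minimal sketch of this ranking step follows, again assuming scikit-learn's NMI as a stand-in for φ(NMI); the function name rank_components is hypothetical.

```python
from sklearn.metrics import normalized_mutual_info_score


def rank_components(ensemble, consensus):
    """Sort the cluster ensemble components by decreasing NMI with respect
    to the consensus clustering solution, yielding the ordered list O."""
    return sorted(
        ensemble,
        key=lambda comp: normalized_mutual_info_score(
            comp, consensus, average_method='geometric'),
        reverse=True,
    )
```

The first element of the returned list plays the role of λφ(NMI)1 above, and the last that of λφ(NMI)l.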


