TESI DOCTORAL - La Salle

Chapter 4. Self-refining consensus architectures

λ_{φ(NMI)₂} is the component with the second highest φ(NMI) with respect to the consensus clustering solution, and so on.
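The ranking step just described can be sketched in Python. This is a minimal illustration, not the thesis implementation: the helper names `nmi` and `rank_ensemble` are hypothetical, each partition is represented as a list of cluster labels, and φ(NMI) is computed with the geometric-mean normalization of Strehl and Ghosh, I(a;b)/√(H(a)H(b)).

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information between two labelings, using the
    Strehl & Ghosh normalization: I(a;b) / sqrt(H(a) * H(b))."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # Entropy of a labeling from its cluster-size counts.
    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())
    # Mutual information from the joint label counts.
    mi = sum(c / n * math.log(n * c / (pa[x] * pb[y]))
             for (x, y), c in pab.items())
    ha, hb = entropy(pa), entropy(pb)
    return mi / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def rank_ensemble(ensemble, consensus):
    """Sort the ensemble components by decreasing phi(NMI) with respect
    to the consensus clustering solution."""
    return sorted(ensemble, key=lambda lam: nmi(lam, consensus), reverse=True)
```

With this ranking in hand, the first element of the returned list plays the role of λ_{φ(NMI)₁}, the second that of λ_{φ(NMI)₂}, and so on.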

Subsequently, a percentage p of the highest-ranked l individual partitions is selected so as to form a select cluster ensemble E_p (see equation (4.2)), upon which a refined consensus clustering solution λ_{c_p} will be derived through the application of the consensus function F. Notice that the larger the percentage p, the more components are included in the select cluster ensemble E_p; ultimately, E_p = E if p = 100.

$$
\mathbf{E}_p = \begin{pmatrix}
\lambda_{\phi(\mathrm{NMI})_1} \\
\lambda_{\phi(\mathrm{NMI})_2} \\
\vdots \\
\lambda_{\phi(\mathrm{NMI})_{\left\lfloor \frac{p}{100} l \right\rceil}}
\end{pmatrix} \qquad (4.2)
$$
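As a sketch of equation (4.2), the following hypothetical helper truncates an already ranked ensemble to its top p%. One assumption to flag: the ⌊·⌉ brackets in (4.2) denote rounding p/100·l to the nearest integer, which is approximated here with Python's built-in `round` (which rounds half to even at ties).

```python
def select_ensemble(ranked_components, p):
    """Form the select cluster ensemble E_p from the p% highest-ranked
    components (the input must already be sorted by decreasing phi(NMI)).
    Assumes the floor/ceil brackets of eq. (4.2) mean round-to-nearest."""
    l = len(ranked_components)
    size = max(1, round(p / 100.0 * l))  # keep at least one component
    return ranked_components[:size]
```

For p = 100 the whole ensemble is returned, matching the remark above that E_p = E in that case.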

Following the rationale of the proposed self-refining procedure, it can be assumed that, with high probability, the worst components of the initial cluster ensemble E will have been excluded from E_p. Therefore, the self-refined consensus clustering solution λ_{c_p} obtained through the application of the consensus function F on E_p will probably improve on the initial consensus labeling λ_c, as we will experimentally demonstrate in later sections.

Finally, three additional remarks conclude this description. Firstly, notice that the consensus process run on the select cluster ensemble E_p can be conducted following either a flat or a hierarchical approach, depending on the consensus function applied, the characteristics of the data set, and the value of p, which, as aforementioned, determines the size of E_p. As reported in the previous chapter, the proposed running time estimation methodologies constitute an efficient means for deciding whether the self-refined consensus solution λ_{c_p} should be derived following either a flat or a hierarchical consensus architecture.

Secondly, notice that the proposed consensus self-refining process is entirely automatic and unsupervised (hence its name), as it is solely based on the cluster ensemble E, the consensus clustering solution λ_c, and a similarity measure, φ(NMI), that requires no external knowledge for its computation. The only user-driven decision is the selection of the value of the percentage p used for creating the select cluster ensemble E_p.

The third remark deals with this latter issue. Quite obviously, the selection of the percentage p is made blindly. So as to avoid the negative consequences of choosing a suboptimal value of p at random, our consensus self-refining proposal is completed by the (possibly parallelized) creation of multiple refined consensus clustering solutions using P distinct percentage values p = {p_1, p_2, ..., p_P}, i.e. λ_{c_{p_i}} for i = {1, 2, ..., P}, selecting as the final refined consensus clustering solution λ_c^final the one maximizing φ(ANMI) with respect to the cluster ensemble E, as defined by equation (4.3). In fact, this unsupervised a posteriori clustering selection process is equivalent to the supraconsensus function proposed in (Strehl and Ghosh, 2002).

$$
\lambda_c^{\mathrm{final}} = \underset{\lambda}{\arg\max}\;
\phi(\mathrm{ANMI})\left(\mathbf{E}, \lambda\right), \quad
\lambda \in \{\lambda_c, \lambda_{c_{p_1}}, \ldots, \lambda_{c_{p_P}}\} \qquad (4.3)
$$
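The supraconsensus selection of equation (4.3) can be sketched as follows. Again this is an illustrative approximation with hypothetical helper names, where φ(ANMI) is taken, as in Strehl and Ghosh (2002), to be the average NMI between a candidate labeling and every component of the ensemble, and NMI uses the geometric-mean normalization.

```python
import math
from collections import Counter

def nmi(a, b):
    """NMI with the Strehl & Ghosh normalization I(a;b)/sqrt(H(a)H(b))."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())
    mi = sum(c / n * math.log(n * c / (pa[x] * pb[y]))
             for (x, y), c in pab.items())
    ha, hb = entropy(pa), entropy(pb)
    return mi / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def anmi(ensemble, candidate):
    """phi(ANMI): average NMI of the candidate against every component of E."""
    return sum(nmi(candidate, comp) for comp in ensemble) / len(ensemble)

def supraconsensus(ensemble, candidates):
    """Equation (4.3): among {lambda_c, lambda_c_p1, ..., lambda_c_pP},
    keep the candidate maximizing phi(ANMI) w.r.t. the cluster ensemble E."""
    return max(candidates, key=lambda lam: anmi(ensemble, lam))
```

Since φ(ANMI) is computed purely from label co-occurrences, the whole selection step remains unsupervised, consistent with the second remark above.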

For summarization purposes, table 4.1 describes the steps that constitute the proposed consensus self-refining procedure.

