29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 6

Voting based consensus functions for soft cluster ensembles

As outlined in section 1.2.1, clustering algorithms can be divided into two large categories, depending on the number of clusters each object is assigned to. On one hand, hard (or crisp) clustering algorithms assign each object to a single cluster. For this reason, the result of applying a hard clustering process to a data set containing n objects is usually represented as an n-dimensional integer-valued row vector of labels (or labeling) λ, each component of which identifies which of the k clusters the corresponding object is assigned to, that is:

\[
\lambda = [\lambda_1 \;\; \lambda_2 \;\; \ldots \;\; \lambda_n] \tag{6.1}
\]

where $\lambda_i \in [1,k]$, $\forall i \in [1,n]$.

On the other hand, soft (or fuzzy) clustering algorithms allow objects to belong to all clusters to a certain extent. Thus, the result of applying them to partition a data set containing n objects into k clusters is usually represented by means of a k×n real-valued clustering matrix Λ (see equation (6.2)), the (i,j)th entry of which indicates the degree of association between the jth object and the ith cluster.

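\[
\Lambda = \begin{pmatrix}
\lambda_{11} & \lambda_{12} & \ldots & \lambda_{1n} \\
\lambda_{21} & \lambda_{22} & \ldots & \lambda_{2n} \\
\vdots & & \ddots & \vdots \\
\lambda_{k1} & \lambda_{k2} & \ldots & \lambda_{kn}
\end{pmatrix} \tag{6.2}
\]

where $\lambda_{ij} \in \mathbb{R}$, $\forall i \in [1,k]$ and $\forall j \in [1,n]$.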
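The k×n layout of a soft clustering matrix can be made concrete with a minimal Python sketch. The matrix values below are invented for illustration and are not taken from the toy example. The column-sums-to-one convention and the `harden` helper, which keeps the most strongly associated cluster for each object, are assumptions of this sketch and not a procedure described in the text.

```python
# Hypothetical soft clustering of n = 4 objects into k = 3 clusters.
# Rows index clusters and columns index objects, as in the k x n matrix (6.2).
# The values are made up; each column happens to sum to 1, which is a common
# convention but is not required by (6.2).
soft = [
    [0.8, 0.6, 0.1, 0.2],
    [0.1, 0.3, 0.7, 0.2],
    [0.1, 0.1, 0.2, 0.6],
]

def harden(matrix):
    """Return a 1-based label vector: for each object (column), pick the
    cluster (row) with the highest association. Ties go to the lowest row."""
    k = len(matrix)
    n = len(matrix[0])
    labels = []
    for j in range(n):
        best = max(range(k), key=lambda i: matrix[i][j])
        labels.append(best + 1)
    return labels

# Entry (i, j) = (2, 3) in the 1-based notation of the text:
# association between the 3rd object and the 2nd cluster.
print(soft[1][2])  # 0.7

labels = harden(soft)
print(labels)  # [1, 1, 2, 3]
```

The resulting list has the same form as the label vector of equation (6.1), which shows how a hard labeling can be read off a soft one under this assumption.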

For illustration purposes, we return to the toy clustering example presented in chapter 2, in which clustering is conducted on the two-dimensional artificial data set presented in figure 6.1. This toy data collection contains n = 9 objects, and the desired number of clusters k is set to 3.

If the classic k-means hard clustering algorithm is applied to this data set, the label vector presented in equation (6.3) is obtained. Recall that the labels $\lambda_i$ contained in λ are purely symbolic (e.g. the labelings λ = [1 1 1 2 2 2 3 3 3] and λ = [3 3 3 2 2 2 1 1 1] represent exactly the same partition of the data).
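The symbolic nature of the labels can be checked programmatically. The following sketch, in which the helper names `canonical` and `same_partition` are hypothetical, relabels clusters in order of first appearance, so that two labelings describe the same partition exactly when their relabeled forms coincide.

```python
def canonical(labels):
    """Relabel clusters as 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    out = []
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = len(mapping) + 1
        out.append(mapping[lab])
    return out

def same_partition(a, b):
    """True if labelings a and b group the objects identically."""
    return len(a) == len(b) and canonical(a) == canonical(b)

a = [1, 1, 1, 2, 2, 2, 3, 3, 3]
b = [3, 3, 3, 2, 2, 2, 1, 1, 1]
c = [1, 1, 2, 2, 2, 2, 3, 3, 3]  # third object moved to another cluster

print(same_partition(a, b))  # True
print(same_partition(a, c))  # False
```

The labelings `a` and `b` are the two from the text. The labeling `c` is an invented counterexample in which one object changes cluster, so the partition itself differs.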
