29.04.2013 Views

TESI DOCTORAL - La Salle


Appendix B. Experiments on clustering indeterminacies

                          Data set
Data representation       CAL500   Corel   InternetAds   IsoLetters
Baseline    MM                28      28            28           28
            M1                28      28            28           28
            M2                28      28            28           28
PCA         MM               504     420           308          532
            M1               280     196           308          196
            M2               140     224           392          532
ICA         MM               504     420           308          532
            M1               280     196           308          196
            M2               140     224           392          532
NMF         MM                 –     420             –          532
            M1                 –     196             –          196
            M2                 –     224             –          532
RP          MM               504     420           308          532
            M1               280     196           308          196
            M2               140     224           392          532

Table B.3: Number of individual clusterings per data representation on each multimodal data set, where MM, M1 and M2 stand for multimodal, mode #1 and mode #2, respectively.
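Every count in Table B.3 is a multiple of 28, the number of clustering algorithms used in these experiments. One plausible reading, which is an assumption and is not stated in this excerpt, is that each count equals 28 algorithms times the number of variants (for instance, distinct dimensionalities) of the corresponding representation. The short sketch below only checks the divisibility; the helper name `implied_variants` is hypothetical.

```python
# Counts copied from Table B.3; None marks the "–" entries (no NMF runs reported).
ALGORITHMS = 28  # number of clustering algorithms stated in the text

DATASETS = ["CAL500", "Corel", "InternetAds", "IsoLetters"]
TABLE = {
    ("Baseline", "MM"): [28, 28, 28, 28],
    ("Baseline", "M1"): [28, 28, 28, 28],
    ("Baseline", "M2"): [28, 28, 28, 28],
    ("PCA", "MM"): [504, 420, 308, 532],
    ("PCA", "M1"): [280, 196, 308, 196],
    ("PCA", "M2"): [140, 224, 392, 532],
    ("NMF", "MM"): [None, 420, None, 532],
    ("NMF", "M1"): [None, 196, None, 196],
    ("NMF", "M2"): [None, 224, None, 532],
}
# The ICA and RP rows are identical to the PCA rows in the table.
for rep in ("ICA", "RP"):
    for mode in ("MM", "M1", "M2"):
        TABLE[(rep, mode)] = list(TABLE[("PCA", mode)])


def implied_variants(count, algorithms=ALGORITHMS):
    """Return count // algorithms if the division is exact, else None.

    Interpreting the quotient as "number of representation variants"
    is an assumption, not something the table itself states.
    """
    if count is None:
        return None
    quotient, remainder = divmod(count, algorithms)
    return quotient if remainder == 0 else None


for (rep, mode), row in TABLE.items():
    quotients = [implied_variants(c) for c in row]
    print(rep, mode, dict(zip(DATASETS, quotients)))
```

Under this reading, the baseline rows correspond to a single variant per data set, while the PCA, ICA and RP rows correspond to between 5 and 19 variants.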

B.2.1 CAL500 data set

The φ^(NMI) histograms presented in figure B.13 summarize the clustering results obtained by running the aforementioned twenty-eight algorithms on each type of object representation for each one of the two modalities, and for the multimodal representations as well.
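As an illustration of how such a histogram can be built, the sketch below scores each clustering against the ground-truth labels with normalized mutual information (NMI) and bins the scores. It assumes the geometric-mean normalization, I(a; b) / sqrt(H(a) H(b)); the normalization actually used in this work is not restated in this section. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed choice)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # a single-cluster labeling carries no information
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


def histogram(values, bins=10):
    """Count values falling in `bins` equal-width bins over [0, 1]."""
    counts = [0] * bins
    for v in values:
        index = min(max(int(v * bins), 0), bins - 1)
        counts[index] += 1
    return counts


# Toy example: one ground truth and three hypothetical clusterings.
truth = [0, 0, 0, 1, 1, 1]
clusterings = [
    [1, 1, 1, 0, 0, 0],  # same partition, relabeled
    [0, 0, 1, 1, 1, 1],  # one object misplaced
    [0, 1, 0, 1, 0, 1],  # close to unrelated
]
scores = [nmi(truth, c) for c in clusterings]
print(histogram(scores))
```

Because NMI is invariant to cluster relabeling, the first toy clustering scores 1 even though its label values are swapped.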

If the histograms are compared representation-wise, we observe that all representations yield clustering solutions whose quality spans similarly wide ranges below φ^(NMI) = 0.5. For a given modality, there exists no clearly superior object representation.

However, if the histograms are compared across the modalities, it can be observed that better results are obtained when clustering is conducted on the audio modality of this data set, regardless of the type of representation employed. Moreover, the multimodal data representation seems to yield clustering results of intermediate quality (i.e. slightly better than clustering on text only, but worse than clustering solely on audio), which reveals that the early fusion of acoustic and textual features is not beneficial in this case.
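For reference, early fusion is commonly implemented by concatenating the per-object feature vectors of the modalities before clustering. The sketch below assumes that plain concatenation is what is meant here; the function name `early_fusion` and the toy vectors are hypothetical, and any per-modality scaling applied in the actual experiments is not shown.

```python
def early_fusion(mode1, mode2):
    """Concatenate the feature vectors of two modalities, object by object.

    mode1 and mode2 are sequences of equal length; element i of each
    holds the features of the same object in that modality.
    """
    if len(mode1) != len(mode2):
        raise ValueError("both modalities must describe the same objects")
    return [list(u) + list(v) for u, v in zip(mode1, mode2)]


# Toy example: 2 objects, 3 acoustic features and 2 textual features each.
audio = [[0.1, 0.4, 0.2], [0.9, 0.3, 0.5]]
text = [[1.0, 0.0], [0.0, 1.0]]
multimodal = early_fusion(audio, text)
print(multimodal)
```

The fused vectors live in a space whose dimensionality is the sum of the two modalities, so a weaker modality can dilute a stronger one, which is consistent with the intermediate results reported above.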

B.2.2 Corel data set

Figure B.14 presents the φ^(NMI) histograms corresponding to the multimodal and unimodal clustering of the captioned images of the Corel data set.

As regards the comparison across object representations, it can be observed that, especially for the image modality and the multimodal representation, the RP representation offers a large number of good clustering solutions, whereas the quality of the clusterings obtained on the remaining representations is scattered over a wide range of φ^(NMI) values.

If the clustering results obtained on the two modalities are compared, we can see that the image modality is the one yielding the best clustering results (up to φ^(NMI) = 0.68),
