
Annual Report 2010 - Fachgruppe Informatik an der RWTH Aachen ...


data such that objects within groups are similar while objects in different groups are dissimilar. In scenarios with many attributes or with noise, clusters are often hidden in subspaces of the data and do not show up in the full-dimensional space. For these applications, subspace clustering methods aim at detecting clusters in any subspace. We propose new subspace clustering models that remove redundant information and ensure the comparability of different clusters, enhancing the quality and interpretability of the clustering results. At the same time, the efficiency of the clustering process is guaranteed by the development of new algorithms. Additionally, we focus our research on the evaluation and visualization of patterns, so that knowledge generation can benefit from human cognitive abilities.

Figure 4: Multiple hidden concepts in subspaces of a high dimensional database<br />
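The effect described above, clusters that are visible in a subspace but disappear in the full-dimensional space, can be illustrated with a small synthetic sketch. It is not one of the models proposed here and does not use OpenSubspace; the data, the function names (`make_data`, `distance`, `contrast`) and all parameter values are assumptions chosen for illustration. Two groups differ only in dimensions 0 and 1, and twenty further dimensions contain uniform noise. The sketch compares how well distances separate the groups in the subspace and in the full space.

```python
import math
import random


def make_data(n_per_group=50, n_noise_dims=20, seed=0):
    """Two groups that differ only in dimensions 0 and 1; the rest is noise."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate((0.2, 0.8)):
        for _ in range(n_per_group):
            # Relevant dimensions: tightly concentrated around the group center.
            relevant = [rng.gauss(center, 0.03) for _ in range(2)]
            # Irrelevant dimensions: uniform noise, identical for both groups.
            noise = [rng.uniform(0.0, 1.0) for _ in range(n_noise_dims)]
            points.append(relevant + noise)
            labels.append(label)
    return points, labels


def distance(p, q, dims):
    """Euclidean distance restricted to the given dimensions."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def contrast(points, labels, dims):
    """Mean between-group distance divided by mean within-group distance."""
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = distance(points[i], points[j], dims)
            if labels[i] == labels[j]:
                within.append(d)
            else:
                between.append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))


points, labels = make_data()
subspace = [0, 1]
full_space = list(range(len(points[0])))
print("contrast in subspace:  ", contrast(points, labels, subspace))
print("contrast in full space:", contrast(points, labels, full_space))
```

A contrast close to 1 means that objects from different groups are, on average, hardly farther apart than objects from the same group, so a distance-based method has little to work with. In this synthetic setting the contrast is large in the two relevant dimensions and close to 1 once the noise dimensions are included.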

OpenSubspace: An Open Source Framework for Evaluation and Exploration of Subspace Clustering Algorithms in WEKA

Emmanuel Müller, Stephan Günnemann

Subspace clustering and projected clustering are recent research areas for clustering in high-dimensional spaces. As the field is rather young, there is a lack of comparative studies on the advantages and disadvantages of the different algorithms. Part of the underlying problem is the lack of available open source implementations that researchers could use to understand, compare, and extend subspace and projected clustering algorithms. We propose OpenSubspace, an open source framework that meets these requirements. OpenSubspace integrates state-of-the-art performance measures and visualization techniques to foster research in subspace and projected clustering. We currently use this framework both in our lectures for teaching and in our research projects for experiment evaluation. Our recent evaluation study published at VLDB 2009 is based on this framework. For further details, please refer to our paper and to the supplementary material for this evaluation study. There, you can also find further details about possible parameterizations of the underlying algorithms for running experiments. The system is available at http://dme.rwth-aachen.de/OpenSubspace/.

