
Annual Report 2010 - Fachgruppe Informatik an der RWTH Aachen ...


Outlier Ranking in High Dimensional Data

Emmanuel Müller

Detecting outliers is an important task for many applications, including fraud detection and consistency validation in real-world data. Particularly in the presence of uncertain or imprecise data, similar objects regularly deviate in their attribute values, so the notion of an outlier has to be defined carefully. When outlier detection is regarded as a task complementary to clustering, a binary decision on whether an object is an outlier or not seems natural. In high dimensional data, however, objects may belong to different clusters in different subspaces, so more fine-grained concepts for defining outliers are required. With our new outlier ranking approaches, we address outlier detection in subspaces of high dimensional data. We propose novel scoring functions that provide consistent models for ranking outliers when objects deviate in arbitrary subspace projections.


Figure 7: Outliers hidden in arbitrary subsets of the attributes<br />
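To make the idea of subspace outlier ranking concrete, here is a minimal sketch under explicit assumptions. It is not the scoring functions proposed above. Its simplified stand-in score works as follows: in every attribute subset of a fixed size, each object's k-nearest-neighbour distance is divided by the mean k-NN distance in that subset, and an object's final score is its maximum over all subsets. The function names `knn_distance` and `subspace_outlier_ranking`, the parameters `subspace_dim` and `k`, and the toy data are all hypothetical, chosen for illustration only.

```python
import math
from itertools import combinations


def knn_distance(points, i, dims, k):
    """Distance from object i to its k-th nearest neighbour, using only the attributes in dims."""
    dists = sorted(
        math.sqrt(sum((points[i][d] - points[j][d]) ** 2 for d in dims))
        for j in range(len(points))
        if j != i
    )
    return dists[k - 1]


def subspace_outlier_ranking(points, subspace_dim=2, k=2):
    """Rank objects by their strongest deviation over all attribute subsets of size subspace_dim.

    Hypothetical stand-in score: in each subspace, an object's k-NN distance is
    divided by the mean k-NN distance of all objects there, so scores from
    different subspaces are comparable. Returns (index, score) pairs, most
    outlying first.
    """
    n_dims = len(points[0])
    scores = [0.0] * len(points)
    for dims in combinations(range(n_dims), subspace_dim):
        local = [knn_distance(points, i, dims, k) for i in range(len(points))]
        mean = sum(local) / len(local)
        if mean == 0:
            continue  # all objects coincide in this subspace: nothing deviates
        for i, d in enumerate(local):
            scores[i] = max(scores[i], d / mean)
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy data: two groups in attributes 0 and 1; attribute 2 is constant.
# Object 8 takes attribute 0 from the first group and attribute 1 from the
# second, so it only stands out in the joint subspace {0, 1}.
points = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0),
    (5, 5, 0), (5, 6, 0), (6, 5, 0), (6, 6, 0),
    (0, 6, 0),
]
ranking = subspace_outlier_ranking(points, subspace_dim=2, k=2)
print(ranking[0])  # object 8 ranks first
```

Object 8 lies within the value range of every single attribute, yet it receives the highest score. This mirrors the situation in Figure 7, where an outlier is hidden in a subset of the attributes. Enumerating all subsets of a fixed size grows combinatorially with dimensionality, so this brute-force search only suits small examples.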

Clustering in Attributed Graphs

Stephan Günnemann, Brigitte Boden

The aim of data mining approaches is to extract novel knowledge from large sets of data. These data can be represented in different ways: high-dimensional attribute data characterizing single objects, and graph data representing the relations between objects. The first data type is analyzed by subspace clustering approaches, the second by dense subgraph clustering methods. In many applications both types of data (attributes and relationships) are available and can be modeled as graphs with attributed nodes. Analyzing both data sources simultaneously can increase the quality of mining results. However, most clustering approaches deal with only one of these data types. In our work, we develop novel methods that use both data types simultaneously and thereby obtain better clustering results.

