Foundations of Data Science

less than n, then richness is not reasonable to demand. In the next section, we will see a possibility result to contrast with this impossibility theorem.

8.12.3 Relaxing the axioms

Given that no clustering algorithm can satisfy scale invariance, richness, and consistency simultaneously, one might want to relax the axioms in some way. Doing so yields the following results.

1. Single linkage with a distance stopping condition satisfies a relaxed scale-invariance property: for α > 1, f(αd) is a refinement of f(d).

2. Define refinement consistency to mean that shrinking distances within a cluster or expanding distances between clusters gives a refinement of the clustering. Single linkage with the α stopping condition satisfies scale invariance, refinement consistency, and richness except for the trivial clustering of all singletons.
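The relaxed scale-invariance property in item 1 can be checked on a small example. The sketch below assumes that "single linkage with a distance stopping condition" means merging any two points at distance at most a threshold r, which amounts to taking connected components of the threshold graph. The function names, the threshold r = 2, and the four sample points are illustrative choices, not taken from the text.

```python
from itertools import combinations


def single_linkage_distance_stop(dist, r):
    """Cluster by linking every pair of points at distance <= r.

    The clusters are the connected components of that graph, found
    here with a small union-find structure.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= r:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def is_refinement(fine, coarse):
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(c <= big for big in coarse) for c in fine)


def scale(dist, alpha):
    """Multiply every entry of the distance matrix by alpha."""
    return [[alpha * x for x in row] for row in dist]


# Four points on a line (illustrative data).
pts = [0.0, 1.0, 2.5, 6.0]
dist = [[abs(a - b) for b in pts] for a in pts]

base = single_linkage_distance_stop(dist, r=2.0)
scaled = single_linkage_distance_stop(scale(dist, 2.0), r=2.0)
print(base, scaled, is_refinement(scaled, base))
```

Stretching all distances can only remove links from the threshold graph, so clusters can split but never merge. That is why f(αd) refines f(d) when α > 1.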

8.12.4 A Satisfiable Set of Axioms

In this section, we propose a different set of axioms that are reasonable for distances between points in Euclidean space and show that the clustering measure, the sum of squared distances between all pairs of points in the same cluster, slightly modified, is consistent with the new axioms. We assume throughout the section that points are in Euclidean d-space. Our three new axioms follow.
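As a concrete reading of the clustering measure named above, the following sketch computes the sum of squared distances between all pairs of points in the same cluster. It implements only the unmodified measure; the "slight modification" is not reproduced here. The function names and sample points are hypothetical.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_pair_cost(clusters):
    """Sum of squared distances over all unordered pairs sharing a cluster."""
    return sum(
        sq_dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )


# Two candidate clusterings of the same five points (illustrative data).
tight = [[(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
loose = [[(0.0, 0.0), (2.0, 0.0)], [(0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]]

cost_tight = within_cluster_pair_cost(tight)
cost_loose = within_cluster_pair_cost(loose)
print(cost_tight, cost_loose)
```

A lower value indicates tighter clusters, so on this data the measure prefers the first clustering.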

We say that a clustering algorithm satisfies the consistency condition if the following holds: given the clustering produced by the algorithm on a set of points, if a point is moved so that its distance to any point in its own cluster is not increased and its distance to any point in a different cluster is not decreased, then the algorithm returns the same clustering after the move.
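The condition on a single move can be tested directly. The sketch below checks whether moving one point is an allowed ("consistent") move for a given labeling. It does not run any clustering algorithm. The function name `is_consistent_move` and the sample data are assumptions made for illustration.

```python
import math


def is_consistent_move(points, labels, i, new_pos):
    """Check whether moving points[i] to new_pos is a consistent move.

    The move is consistent if no distance to a point in the same
    cluster increases and no distance to a point in a different
    cluster decreases.
    """
    old_pos = points[i]
    for j, (q, label) in enumerate(zip(points, labels)):
        if j == i:
            continue
        d_old = math.dist(old_pos, q)
        d_new = math.dist(new_pos, q)
        if label == labels[i] and d_new > d_old:
            return False
        if label != labels[i] and d_new < d_old:
            return False
    return True


# Two points in cluster 0 and one point in cluster 1 (illustrative data).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
labels = [0, 0, 1]

toward_own = is_consistent_move(points, labels, 1, (0.5, 0.0))
toward_other = is_consistent_move(points, labels, 1, (2.0, 0.0))
print(toward_own, toward_other)
```

An algorithm satisfying the axiom must return the same clustering after any move for which this check succeeds.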

Remark: Although it is not needed in the sequel, it is easy to see that for an infinitesimal perturbation dx of x, the perturbation is consistent if and only if each point in the cluster containing x lies in the half space through x with dx as the normal and each point in a different cluster lies in the other half space.
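One way to see the remark is to expand the squared distance from the perturbed point to an arbitrary point y to first order. The derivation below is a sketch; the symbol y for the other point is introduced here for illustration.

```latex
\begin{align*}
|x + dx - y|^2 &= |x - y|^2 + 2\, dx \cdot (x - y) + |dx|^2 \\
               &= |x - y|^2 - 2\, dx \cdot (y - x) + O(|dx|^2).
\end{align*}
% To first order, the distance from x to y
%   does not increase  iff  dx \cdot (y - x) \ge 0   (required for y in the cluster of x),
%   does not decrease  iff  dx \cdot (y - x) \le 0   (required for y in a different cluster).
```

The hyperplane through x with normal dx is the set of y with dx · (y − x) = 0. Points of x's own cluster must therefore lie on the side toward which dx points, and points of other clusters on the opposite side.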

An algorithm is scale-invariant if multiplying all distances by a positive constant does not change the clustering returned.
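Scale invariance can be probed empirically by rescaling a distance matrix and comparing outputs. The sketch below uses single linkage stopped at k clusters, chosen here only as a convenient example of a rule that depends on the order of distances rather than their magnitudes. The function names and data are hypothetical, and a passing check on a few constants is evidence, not proof.

```python
def single_linkage_k(dist, k):
    """Repeatedly merge the two closest clusters until k clusters remain.

    The distance between two clusters is the minimum distance between
    a point in one and a point in the other.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > k:
        pairs = [
            (a, b)
            for idx, a in enumerate(clusters)
            for b in clusters[idx + 1:]
        ]
        a, b = min(
            pairs,
            key=lambda ab: min(dist[i][j] for i in ab[0] for j in ab[1]),
        )
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return set(clusters)


def scale(dist, alpha):
    """Multiply every entry of the distance matrix by alpha."""
    return [[alpha * x for x in row] for row in dist]


# Four points on a line (illustrative data).
pts = [0.0, 1.0, 2.5, 6.0]
dist = [[abs(a - b) for b in pts] for a in pts]

base = single_linkage_k(dist, 2)
unchanged = all(
    single_linkage_k(scale(dist, alpha), 2) == base
    for alpha in (0.01, 3.7, 1000.0)
)
print(base, unchanged)
```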

An algorithm has the richness property if for any set K of k distinct points in the ambient space, there is some placement of a set S of n points to be clustered so that the algorithm returns a clustering with the points in K as centers. So there are k clusters, each cluster consisting of all points of S closest to one particular point of K.
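The clustering described in the richness property, where each cluster holds the points of S closest to one point of K, can be written down directly. The sketch below builds that target clustering for given K and S. It does not test whether any particular algorithm achieves it. The function name `nearest_center_clusters` and the sample points are assumptions for illustration.

```python
import math


def nearest_center_clusters(S, K):
    """Group each point of S with the point of K closest to it.

    Returns a dict mapping each center in K to the list of points of S
    assigned to it. Ties go to the center listed first in K.
    """
    clusters = {center: [] for center in K}
    for p in S:
        nearest = min(K, key=lambda center: math.dist(p, center))
        clusters[nearest].append(p)
    return clusters


# k = 2 centers and n = 4 points placed near them (illustrative data).
K = [(0.0, 0.0), (10.0, 0.0)]
S = [(1.0, 0.0), (2.0, 1.0), (9.0, 0.0), (11.0, 1.0)]

target = nearest_center_clusters(S, K)
print(target)
```

Richness asks that, for every choice of K, some placement of S exists on which the algorithm's output coincides with a clustering of this form.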
