An efficient algorithm for maximal margin clustering

J Glob Optim

logic, the maximal margin clustering with a fixed number of features can also be cast as an approximate solution to the original maximal margin clustering problem and be used in a semi-supervised learning setting. Later, we investigate an algorithm based on this idea and analyze its convergence properties.

The rest of the paper is organized as follows. In Sect. 2, we discuss the complexity of the maximal margin problem and present an algorithm. Then, we analyze how the idea of maximal margin separation can be extended to some existing graph-cut models for clustering. In Sect. 3, we discuss the maximal margin clustering problem with a fixed number of features, which can be viewed as a combination of maximal margin clustering and feature selection. In Sect. 4, we propose a new successive procedure to approximately solve the original hard maximal margin clustering problem and study its convergence properties. We also discuss how to deal with the semi-supervised learning case. Experimental results based on these ideas are reported in Sect. 5, and we conclude the paper in Sect. 6.

2 Complexity of hard maximal margin clustering

In this section, we investigate the complexity of the hard maximal margin clustering. Throughout the paper, we make the following assumption:

Assumption 1 [General Position] All the points in the input data set $V = \{v_1, v_2, \ldots, v_n\}$ are in general position, i.e., any $d$ points in $V$ will define precisely one hyperplane in $\mathbb{R}^d$.

We remark that the above assumption is quite reasonable, because if the data set $V$ does not satisfy the assumption, we may perturb the data set slightly so that the assumption holds.
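As an illustration, the following is a minimal sketch of how one might test Assumption 1 and apply such a perturbation (assuming numpy; the rank test, tolerance, and noise scale are illustrative choices, not part of the paper):

```python
import itertools
import numpy as np

def in_general_position(V, tol=1e-9):
    """Test Assumption 1: every d-point subset of V spans exactly one
    hyperplane in R^d, i.e. each subset is affinely independent.

    V is an (n, d) array; the test enumerates all C(n, d) subsets,
    so it is only practical for small data sets.
    """
    n, d = V.shape
    for idx in itertools.combinations(range(n), d):
        P = V[list(idx)]
        # Hyperplanes w^T x + b = 0 through the d points correspond to the
        # null space of [P, 1]; the hyperplane is unique (up to scaling)
        # iff this d x (d+1) matrix has full row rank d.
        A = np.hstack([P, np.ones((d, 1))])
        if np.linalg.matrix_rank(A, tol=tol) < d:
            return False
    return True

def perturb_to_general_position(V, eps=1e-6, seed=0):
    """Add a small random perturbation so Assumption 1 holds almost surely."""
    rng = np.random.default_rng(seed)
    return V + eps * rng.standard_normal(V.shape)
```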

We can now state the following result.

Theorem 1 Suppose that the input data set satisfies Assumption 1 and $(w^*, b^*)$ is the global solution to problem (1). Then, there must exist a subset $V^* \subset V$ with either $d$ or $d+1$ points such that at least one of the following two conditions holds:

(1) The hyperplane defined by the points in $V^*$ will create the same separation as the optimal separating hyperplane.
(2) The optimal separating hyperplane is also the global solution of problem (1) where $V$ is replaced by $V^*$.
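One algorithmic reading of the theorem: the optimal hyperplane can be recovered by enumerating small subsets of $V$ rather than all $2^{n-1}$ labelings. Below is a minimal sketch of the $(d+1)$-point case of condition (2) (assuming numpy and scikit-learn, with SVC at a large C standing in for a hard-margin solver; a brute-force illustration under these assumptions, not the paper's algorithm):

```python
import itertools
import numpy as np
from sklearn.svm import SVC

def max_margin_by_subset_enumeration(V):
    """Brute-force illustration of Theorem 1, (d+1)-point case: solve problem
    (1) on every (d+1)-subset V* and keep the hyperplane, if any, whose margin
    slab contains no point of the full data set V.

    Complexity is O(n^(d+1) * 2^d), so this is only viable for toy examples.
    """
    n, d = V.shape
    best_margin, best_wb = -np.inf, None
    for idx in itertools.combinations(range(n), d + 1):
        S = V[list(idx)]
        # Enumerate labelings of the subset; the first label is fixed to +1,
        # which breaks the +/- symmetry between the two clusters.
        for bits in itertools.product((-1, 1), repeat=d):
            y = np.array((1,) + bits)
            if np.all(y == 1):
                continue  # both clusters must be non-empty
            svm = SVC(kernel="linear", C=1e9)  # large C approximates hard margin
            svm.fit(S, y)  # d+1 points in general position are always separable
            w, b = svm.coef_.ravel(), svm.intercept_[0]
            scores = V @ w + b
            # With the SVM normalization, the margin slab is |w^T v + b| < 1;
            # a candidate is valid only if no point of V lies inside it.
            if np.min(np.abs(scores)) < 1.0 - 1e-6:
                continue
            margin = 1.0 / np.linalg.norm(w)
            if margin > best_margin:
                best_margin, best_wb = margin, (w, b)
    return best_margin, best_wb
```

The $d$-point case of condition (1) could be handled analogously by enumerating the hyperplanes through $d$-point subsets and the separations they induce.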

Proof Suppose that $(w^*, b^*)$ is the global solution to problem (1). We can then separate the data set $V$ into two subsets $V_1$ and $V_2$ such that

$$V_1 = \{v \in V : (w^*)^T v + b^* > 0\}, \qquad V_2 = \{v \in V : (w^*)^T v + b^* < 0\}. \tag{3}$$

Further, by the definition of maximal margin, there exist two subsets $\tilde{V}_1 \subset V_1$ and $\tilde{V}_2 \subset V_2$ such that, for all $v_1 \in \tilde{V}_1$ and $v_2 \in \tilde{V}_2$,

$$(w^*)^T v_1 + b^* = -\bigl((w^*)^T v_2 + b^*\bigr) = 1. \tag{4}$$

For a given data set $V$, let $|V|$ denote the number of points in $V$. Since all points in $V$ are in general position, we can claim that

$$|\tilde{V}_1| \le d, \qquad |\tilde{V}_2| \le d, \qquad |\tilde{V}_1| + |\tilde{V}_2| \ge d. \tag{5}$$

If there exist at least $d+1$ points in $\tilde{V}_1 \cup \tilde{V}_2$, we may select $d+1$ points from $\tilde{V}_1 \cup \tilde{V}_2$, and these $d+1$ points will define the separating hyperplane $(w^*)^T v + b^* = 0$ due to the general position assumption.

