An efficient algorithm for maximal margin clustering
J Glob Optim
logic, the maximal margin clustering with a fixed number of features can also be cast as an approximate solution to the original maximal margin clustering problem and be used in a semi-supervised learning setting. Later, we investigate an algorithm based on this idea and analyze its convergence properties.
The rest of the paper is organized as follows. In Sect. 2, we discuss the complexity of the maximal margin problem and present an algorithm. Then, we analyze how the idea of maximal margin separation can be extended to some existing graph-cut models for clustering. In Sect. 3, we discuss the maximal margin clustering problem with a fixed number of features, which can be viewed as a combination of maximal margin clustering and feature selection. In Sect. 4, we propose a new successive procedure to approximately solve the original hard maximal margin clustering problem and study its convergence properties. We also discuss how to deal with the semi-supervised learning case. Experimental results based on these ideas are reported in Sect. 5, and we conclude the paper in Sect. 6.
2 Complexity of hard maximal margin clustering
In this section, we investigate the complexity of the hard maximal margin clustering problem. Throughout the paper, we make the following assumption:
Assumption 1 [General Position] All the points in the input data set $V = \{v_1, v_2, \ldots, v_n\}$ are in general position, i.e., any $d$ points in $V$ define precisely one hyperplane in $\mathbb{R}^d$.
We remark that the above assumption is quite reasonable, because if the data set $V$ does not satisfy the assumption, we may perturb the data set slightly so that the assumption holds.
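The assumption can also be checked computationally: $d$ points in $\mathbb{R}^d$ define exactly one hyperplane precisely when they are affinely independent. A minimal pure-Python sketch of such a check (function names are ours, not from the paper):

```python
from itertools import combinations

def rank(mat, tol=1e-9):
    """Row rank of a matrix via Gaussian elimination (no external libraries)."""
    m = [list(row) for row in mat]
    r, rows = 0, len(mat)
    cols = len(mat[0]) if rows else 0
    for col in range(cols):
        pivot = next((i for i in range(r, rows) if abs(m[i][col]) > tol), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][col] / m[r][col]
            for c in range(col, cols):
                m[i][c] -= f * m[r][c]
        r += 1
    return r

def defines_unique_hyperplane(points):
    """d points in R^d define exactly one hyperplane iff they are affinely
    independent, i.e. the d-1 difference vectors have rank d-1."""
    d = len(points[0])
    assert len(points) == d, "expects exactly d points in R^d"
    diffs = [[p[c] - points[0][c] for c in range(d)] for p in points[1:]]
    return rank(diffs) == d - 1

def in_general_position(points):
    """Assumption 1: every d-subset of the data defines one hyperplane."""
    d = len(points[0])
    return all(defines_unique_hyperplane(list(s))
               for s in combinations(points, d))
```

In practice one would only perturb (or reject) the offending subsets rather than enumerate all $\binom{n}{d}$ of them; the exhaustive loop above is purely illustrative.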
We can now state the following result.
Theorem 1 Suppose that the input data set satisfies Assumption 1 and $(w^\ast, b^\ast)$ is the global solution to problem (1). Then, there must exist a subset $V^\ast \subset V$ with either $d$ or $d+1$ points such that at least one of the following two conditions holds:
(1) The hyperplane defined by the points in $V^\ast$ creates the same separation as the optimal separating hyperplane.
(2) The optimal separating hyperplane is also the global solution of problem (1) where $V$ is replaced by $V^\ast$.
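Theorem 1 suggests a finite search: every candidate optimal separation is induced by a hyperplane through $d$ (or $d+1$) of the data points. Below is a hedged pure-Python sketch for the $d = 2$ case that enumerates the lines through point pairs and scores each induced split by the smallest distance from the remaining points to the line. (The exact criterion would re-solve a hard-margin problem per split; this distance score is a simplified stand-in, and all names are ours.)

```python
from itertools import combinations
from math import hypot

def best_two_point_split(points, tol=1e-12):
    """Enumerate lines through pairs of points (the d = 2 case of the
    d-point hyperplanes in Theorem 1) and return (score, split), where
    the winning split is the one whose nearest off-line point is
    farthest from the line."""
    best = None
    for p, q in combinations(points, 2):
        # Line through p and q: w . x + b = 0, with w normal to q - p.
        w = (q[1] - p[1], p[0] - q[0])
        norm = hypot(*w)
        if norm < tol:                 # coincident points define no line
            continue
        b = -(w[0] * p[0] + w[1] * p[1])
        signed = [(w[0] * x + w[1] * y + b) / norm for x, y in points]
        off_line = [abs(s) for s in signed if abs(s) > tol]
        if not off_line:
            continue
        score = min(off_line)                    # closest off-line point
        split = tuple(s > tol for s in signed)   # points on the line -> False
        if best is None or score > best[0]:
            best = (score, split)
    return best

# Two well-separated pairs: the line x = 0 through the first two points
# splits them from the other two at distance 5.
score, split = best_two_point_split([(0, 0), (0, 1), (5, 0), (5, 1)])
```

The enumeration costs $O(n^d)$ candidate hyperplanes, which is the complexity-theoretic point of this section: the search is finite, but grows quickly with the dimension.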
Proof Suppose that $(w^\ast, b^\ast)$ is the global solution to problem (1); we can then separate the data set $V$ into two subsets $V_1$ and $V_2$ such that
$$V_1 = \{v \in V : (w^\ast)^T v + b^\ast > 0\}, \qquad V_2 = \{v \in V : (w^\ast)^T v + b^\ast < 0\}. \tag{3}$$
Further, by the definition of maximal margin, there exist two subsets $\tilde{V}_1 \subset V_1$, $\tilde{V}_2 \subset V_2$ such that for all $v_1 \in \tilde{V}_1$, $v_2 \in \tilde{V}_2$,
$$(w^\ast)^T v_1 + b^\ast = -\left((w^\ast)^T v_2 + b^\ast\right) = 1. \tag{4}$$
For a given data set $V$, let $|V|$ denote the number of points in $V$. Since all points in $V$ are in general position, we can claim that
$$|\tilde{V}_1| \le d, \qquad |\tilde{V}_2| \le d, \qquad |\tilde{V}_1| + |\tilde{V}_2| \ge d. \tag{5}$$
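For orientation, the normalization in (4) is the standard hard-margin convention, and a well-known consequence (not spelled out in this excerpt; the symbol $\gamma$ is ours) is that it fixes the width of the separation being maximized:

```latex
% Distance from a point v to the hyperplane w^T v + b = 0 is |w^T v + b| / \|w\|.
% With the support points scaled as in (4), each side lies at distance
% 1/\|w^*\| from the separating hyperplane, so the full margin width is
\[
  \gamma(w^\ast) \;=\; \frac{1}{\|w^\ast\|} + \frac{1}{\|w^\ast\|}
              \;=\; \frac{2}{\|w^\ast\|}.
\]
```

Maximizing the margin is therefore equivalent to minimizing $\|w^\ast\|$ subject to the scaling constraints in (4).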
If there exist at least $d + 1$ points in $\tilde{V}_1 \cup \tilde{V}_2$, we may select $d + 1$ points from $\tilde{V}_1 \cup \tilde{V}_2$, and these $d + 1$ points will define the separating hyperplane $(w^\ast)^T v + b^\ast = 0$ due to the general