Unsupervised Color Segmentation - Institute for Computer Graphics ...
The Mean Shift algorithm proposed by Comaniciu and Meer [4] is a non-parametric kernel-based density estimation technique. In general, the Mean Shift concept enables clustering of a set of data points into C different clusters without prior knowledge of the number of clusters. It is based on the non-parametric density estimator at the d-dimensional feature vector $\vec{x}$ in the feature space, which can be obtained with a kernel $K$ and a window size $h$ by

\[ f(\vec{x}) = \frac{1}{nh^d} \sum_{i=1}^{n} K\!\left( \left\| \frac{\vec{x} - \vec{x}_i}{h} \right\|^2 \right), \tag{2} \]
where $\vec{x}_i$ are the $n$ feature vectors of the data set. Calculating the gradient of the density estimator shows that the so-called Mean Shift, defined by

\[ m(\vec{x}) = \frac{\sum_{i=1}^{n} \vec{x}_i \, K(\vec{x} - \vec{x}_i)}{\sum_{i=1}^{n} K(\vec{x} - \vec{x}_i)} - \vec{x}, \tag{3} \]

points toward the direction of the maximum increase in the density. The main part of the algorithm is to move the window iteratively by

\[ \vec{x}_{t+1} = \vec{x}_t + m(\vec{x}_t). \tag{4} \]
It is guaranteed that the shift converges to a point where the gradient of the underlying density function is zero. These points are the detected cluster centers of the distribution and represent an estimated mean value for the cluster.

We are not interested in the Mean Shift clusters themselves. We just run the Mean Shift procedure to find the stationary points of the density estimates, which are the modes of the distribution. These modes serve as initialization of the EM algorithm to find the maximum likelihood parameters of the GMM.
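The mode-seeking iteration of Eqs. (3) and (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the bandwidth value, and the convergence tolerance are all assumptions made here for the example.

```python
import numpy as np

def mean_shift_modes(X, h, max_iter=100, tol=1e-5):
    """Move every point of X (n x d) uphill on the kernel density
    estimate by repeatedly applying Eqs. (3) and (4), and return the
    stationary points (the modes) each point converges to."""
    modes = X.astype(float).copy()
    for j in range(len(modes)):
        x = modes[j]
        for _ in range(max_iter):
            # Gaussian kernel weights K(x - x_i), bandwidth h (assumed kernel)
            w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * h ** 2))
            shift = w @ X / w.sum() - x   # Mean Shift vector, Eq. (3)
            x = x + shift                 # window update, Eq. (4)
            if np.linalg.norm(shift) < tol:
                break
        modes[j] = x
    return modes
```

Points drawn from the same density peak converge to (numerically) the same mode, so merging nearby modes yields the cluster centers without specifying their number in advance.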
2.2. Definition of ordering relationship
The next step is to order the pixels of the input color image. To this end, for every pixel a unique distance value β is calculated between a single Gaussian distribution fitted to the Luv values within an x × y window around the pixel and the Gaussian Mixture Model (GMM) of the ROI, as obtained by the step described in Section 2.1. While the single Gaussian distributions have to be recalculated for all of the windows located on every pixel, the GMM of the ROI stays the same during the entire computation.
The comparison between the GMM and the single Gaussian distributions is based on calculating Bhattacharyya distances [1]. The Bhattacharyya distance β compares two d-dimensional Gaussian distributions $N_1 = \{\vec{\mu}_1, \Sigma_1\}$ and $N_2 = \{\vec{\mu}_2, \Sigma_2\}$ by

\[ \beta(N_1, N_2) = \frac{1}{8} (\vec{\mu}_2 - \vec{\mu}_1)^t \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\vec{\mu}_2 - \vec{\mu}_1) + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_1 + \Sigma_2}{2} \right|}{\sqrt{\left| \Sigma_1 \right| \left| \Sigma_2 \right|}}. \tag{5} \]
The Bhattacharyya distance allows a quantitative statement about the discriminability of two Gaussian distributions by the Bayes decision rule. Therefore, for our analysis in the three-dimensional CIE Luv feature space, the Bhattacharyya distance quantifies whether two color distributions are likely to be equal or not.
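Eq. (5) translates directly into NumPy; the following is a minimal sketch with no numerical safeguards (e.g., no regularization of near-singular covariances, which a production version would need):

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two d-dimensional Gaussians, Eq. (5):
    a Mahalanobis-like mean term plus a covariance-dissimilarity term."""
    cov = (cov1 + cov2) / 2.0
    diff = mu2 - mu1
    mean_term = diff @ np.linalg.inv(cov) @ diff / 8.0
    cov_term = 0.5 * np.log(np.linalg.det(cov)
                            / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return mean_term + cov_term
```

The distance is zero for identical distributions and grows as either the means or the covariances diverge, which is exactly the behavior the ordering relationship relies on.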
To be able to compare a single Gaussian distribution to a GMM, C different Bhattacharyya distance values to every component of the GMM have to be calculated. Then, the final distance value β is computed by

\[ \beta = \sum_{c=1}^{C} \omega_c \, \beta(N_c, N_w), \tag{6} \]

where $\omega_c$ is the c-th GMM weight, $N_c = \{\vec{\mu}_c, \Sigma_c\}$ denotes the c-th component of the GMM and $N_w = \{\vec{\mu}_w, \Sigma_w\}$ is the single Gaussian fitted to the window pixels.
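The weighted sum of Eq. (6) can be sketched as follows; the helper names and the inline Bhattacharyya computation (repeating Eq. (5)) are choices made here for a self-contained example:

```python
import numpy as np

def gmm_distance(weights, mus, covs, mu_w, cov_w):
    """Distance beta of Eq. (6): the Bhattacharyya distance, Eq. (5),
    between each GMM component (weights, mus, covs) and the window
    Gaussian (mu_w, cov_w), summed with the component weights omega_c."""
    def bhatt(m1, c1, m2, c2):
        c = (c1 + c2) / 2.0
        d = m2 - m1
        return (d @ np.linalg.inv(c) @ d / 8.0
                + 0.5 * np.log(np.linalg.det(c)
                               / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2))))
    return sum(w * bhatt(mu_c, cov_c, mu_w, cov_w)
               for w, mu_c, cov_c in zip(weights, mus, covs))
```

A window whose color statistics match one heavily weighted component closely yields a small β, placing that pixel early in the ordering.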
To be able to calculate a Bhattacharyya distance for every pixel, the window has to be moved over the entire image. For every window location the mean and the covariance matrix of the corresponding Luv values within the window have to be computed, which is a time-consuming process. Therefore, we introduce an adaptation of the Summed-Area-Table (SAT) approach to efficiently calculate the Bhattacharyya distances. The SAT idea was originally proposed for texture mapping and brought back to the computer vision community by Viola and Jones [20] as the integral image.

In general, the integral image Int(r, c) is defined for a gray-scale input image I(x, y) by

\[ \mathrm{Int}(r, c) = \sum_{x \le r,\, y \le c} I(x, y), \tag{7} \]

as the sum of all pixel values inside the rectangle spanned by the upper left image corner and (r, c).
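A minimal NumPy sketch of Eq. (7) and its payoff: after one pass of cumulative sums, the sum over any rectangular window costs four table lookups. The zero padding convention used here is an implementation choice, not part of the paper.

```python
import numpy as np

def integral_image(img):
    """Int(r, c) = sum of img[x, y] for x <= r, y <= c (Eq. (7)),
    with a leading zero row/column so window sums need no edge cases."""
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sat[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return sat

def window_sum(sat, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four SAT lookups, O(1) per window."""
    return sat[r1, c1] - sat[r0, c1] - sat[r1, c0] + sat[r0, c0]
```

This constant per-window cost is what makes sliding a statistics window over every pixel tractable.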
Tuzel et al. [18] proposed an efficient method to calculate covariances based on the integral image concept. They build d integral images $P_i$ for the sum of the values of each dimension and $d(d+1)/2$ images $Q_{ij}$ for each product between the values of any two dimensions, where d is the number of dimensions of the feature space. Thus, the integral images $P_i$, with $i = 1 \dots d$, are defined by

\[ P_i(r, c) = \sum_{x \le r,\, y \le c} I_i(x, y), \tag{8} \]
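The idea of recovering a window's mean and covariance from the $P_i$ and $Q_{ij}$ tables can be sketched as below. This is an illustrative reading of Tuzel et al.'s construction, not their code: the function builds the tables inline for self-containedness (in practice they are computed once and reused for all windows), and the normalization by n − 1 is one common convention assumed here.

```python
import numpy as np

def window_mean_cov(features, r0, c0, r1, c1):
    """Mean and covariance of the d-dimensional `features` (rows x cols x d)
    inside the window [r0:r1, c0:c1], read off the per-dimension sum
    tables P_i (Eq. (8)) and pairwise product tables Q_ij."""
    rows, cols, d = features.shape
    pad = lambda a: np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1),
                           ((1, 0), (1, 0)))
    P = [pad(features[:, :, i]) for i in range(d)]                 # sums
    Q = [[pad(features[:, :, i] * features[:, :, j])               # products
          for j in range(d)] for i in range(d)]
    rect = lambda sat: sat[r1, c1] - sat[r0, c1] - sat[r1, c0] + sat[r0, c0]
    n = (r1 - r0) * (c1 - c0)
    p = np.array([rect(P[i]) for i in range(d)])
    mean = p / n
    # cov_ij = (sum of products - sum_i * sum_j / n) / (n - 1)
    cov = np.array([[(rect(Q[i][j]) - p[i] * p[j] / n) / (n - 1)
                     for j in range(d)] for i in range(d)])
    return mean, cov
```

With the tables precomputed, every window's Gaussian $N_w = \{\vec{\mu}_w, \Sigma_w\}$ for Eq. (6) is obtained in constant time, independent of the window size.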