Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Chapter 4
Unsupervised Learning: Clustering
the points assigned to the respective cluster. All points within a cluster are
nearer in distance to their centroid than any other centroid. The condition
for the K clusters C k and the K centroids μ k can be expressed as follows:
minimize
K
åå n
-
k= 1 xnÎC
k
k
2
x m with respect to C k , μ k .
However, this optimization problem cannot be solved in polynomial
time. But Lloyd has proposed an iterative method as a solution. It consists
of two steps that iterate until the program converges to the solution.
1. It has a set of K centroids, and each point is assigned
to a unique cluster or centroid, where the distance
of the concerned centroid from that particular point
is the minimum.
2. It recalculates the centroid of each cluster by using
the following formula:
{ }
C = k
Xn : Xn - mk £ all Xn - m l
(1)
1
m k
=
Ck
å
XnÎCk
Xn
(2)
The two-step procedure continues until there is no further re-arrangement
of cluster points. The convergence of the algorithm is guaranteed, but it may
converge to a local minima.
The following is a simple implementation of Lloyd’s algorithm for
performing K-means clustering in Python:
import random
def ED(source, target):
if source == "":
return len(target)
if target == "":
return len(source)
79