
Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)


Chapter 4

Unsupervised Learning: Clustering

the points assigned to the respective cluster. All points within a cluster are
nearer to their own centroid than to any other centroid. The condition
for the K clusters $C_k$ and the K centroids $\mu_k$ can be expressed as follows:

$$\text{minimize} \quad \sum_{k=1}^{K} \sum_{x_n \in C_k} \lVert x_n - \mu_k \rVert^2 \quad \text{with respect to } C_k, \mu_k.$$
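To make the objective concrete, the following minimal sketch evaluates it for a given set of clusters and centroids. The function names `squared_distance` and `kmeans_objective` and the toy data are assumptions made for illustration; they do not come from the book.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def kmeans_objective(clusters, centroids):
    """Sum, over all clusters, of squared distances from each point to its centroid.

    clusters  -- list of K lists of points (C_1 ... C_K)
    centroids -- list of K points (mu_1 ... mu_K)
    """
    return sum(
        squared_distance(x, mu)
        for points, mu in zip(clusters, centroids)
        for x in points
    )


# Toy data (made up for illustration): two clusters in the plane.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(5.0, 5.0), (7.0, 5.0)]]
centroids = [(0.0, 1.0), (6.0, 5.0)]
objective = kmeans_objective(clusters, centroids)
print(objective)  # every point is at distance 1 from its centroid, so 4.0
```

K-means looks for the clusters and centroids that make this value as small as possible.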

However, this optimization problem is NP-hard in general, so no
polynomial-time algorithm for it is known. Lloyd proposed an iterative
method that finds an approximate solution. It consists of two steps that
are repeated until the cluster assignments stop changing.

1. Given the current set of K centroids, it assigns each point
to a unique cluster: the one whose centroid is at the minimum
distance from that point (equation 1).

2. It recalculates the centroid of each cluster as the mean of
the points assigned to it (equation 2):

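$$C_k = \{\, x_n : \lVert x_n - \mu_k \rVert \le \lVert x_n - \mu_l \rVert \ \text{for all } l \,\} \qquad (1)$$

$$\mu_k = \frac{1}{|C_k|} \sum_{x_n \in C_k} x_n \qquad (2)$$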
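The two steps can be sketched as a pair of small functions, one for equation (1) and one for equation (2). This is an illustrative sketch, not the book's implementation: the names `assign_points` and `update_centroids` are hypothetical, ties in equation (1) are broken in favour of the lowest cluster index, and an empty cluster is assumed to keep its previous centroid.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def assign_points(points, centroids):
    """Step 1, equation (1): put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for x in points:
        # min() returns the first minimum, so ties go to the lowest index.
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(x, centroids[k]))
        clusters[nearest].append(x)
    return clusters


def update_centroids(clusters, old_centroids):
    """Step 2, equation (2): move each centroid to the mean of its cluster."""
    new_centroids = []
    for points, old in zip(clusters, old_centroids):
        if not points:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        dim = len(points[0])
        new_centroids.append(
            tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
        )
    return new_centroids


# One iteration on toy data (made up for illustration).
points = [(0.0, 0.0), (0.0, 2.0), (5.0, 5.0), (7.0, 5.0)]
centroids = [(1.0, 1.0), (4.0, 4.0)]
clusters = assign_points(points, centroids)
centroids = update_centroids(clusters, centroids)
print(clusters)   # [[(0.0, 0.0), (0.0, 2.0)], [(5.0, 5.0), (7.0, 5.0)]]
print(centroids)  # [(0.0, 1.0), (6.0, 5.0)]
```

Comparing squared distances instead of distances gives the same nearest centroid and avoids a square root.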

The two-step procedure continues until there is no further rearrangement
of cluster points. The convergence of the algorithm is guaranteed, but it may
converge to a local minimum rather than the global one.
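The dependence on the starting centroids can be seen on a small example. The sketch below is a compact, hypothetical `lloyd` function (not the book's code) that repeats the two steps until the labels stop changing, run on four corner points of a rectangle from two different initializations. The data and the initial centroids are made up for illustration.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(x, centroids[k]))
            for x in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the procedure has converged
        labels = new_labels
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        updated = []
        for k, old in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)
        centroids = updated
    cost = sum(squared_distance(x, centroids[lab]) for x, lab in zip(points, labels))
    return centroids, labels, cost


# Corners of a 4-by-1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start near the short sides: the clusters split left and right.
_, good_labels, good_cost = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])

# Start on the long sides: the clusters split bottom and top and never change.
_, bad_labels, bad_cost = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_labels, good_cost)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_cost)    # [0, 1, 0, 1] 16.0
```

Both runs converge, but the second stops at a solution whose objective is sixteen times larger. A common remedy is to run the procedure from several starting points and keep the result with the lowest objective.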

The following is a simple implementation of Lloyd’s algorithm for

performing K-means clustering in Python:

import random

def ED(source, target):
    # If one argument is empty, return the length of the other.
    if source == "":
        return len(target)
    if target == "":
        return len(source)
