09.10.2023 Views

Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)


Chapter 4

Unsupervised Learning: Clustering

To form clusters from a dendrogram, you need a distance (or similarity) threshold at which to cut the tree. An easy way to choose one is to plot the distribution of the distances (or similarities) and find the inflection points of the curve. For Gaussian-distributed data, the inflection points lie one standard deviation on either side of the mean, at x = mean - std and x = mean + std, as shown in Figure 4-3.
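For a Gaussian density with mean μ and standard deviation σ, the location of the inflection points can be checked by setting the second derivative of the density to zero:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x)

f''(x) = \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x)

f''(x) = 0 \iff (x-\mu)^2 = \sigma^2 \iff x = \mu \pm \sigma
```

Because f(x) > 0 everywhere, the sign of f''(x) changes exactly at x = μ ± σ, so these two points are the inflection points of the curve.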

Figure 4-3. The inflection point
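The thresholding idea above can be sketched end to end as follows. This is a minimal sketch, not the chapter's own method: it swaps in SciPy's `pdist`, `linkage`, and `fcluster` instead of the `hcluster` function developed below, and it assumes the pairwise distances are roughly Gaussian, so the lower inflection point at mean - std is used as the cut height. The function name `cut_dendrogram_at_inflection` and its `n_std` parameter are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cut_dendrogram_at_inflection(features, n_std=1.0, method="average"):
    """Hypothetical helper: cluster `features` hierarchically and cut the
    dendrogram at a threshold taken from the pairwise-distance distribution.

    Assumption: the pairwise distances are roughly Gaussian, so the lower
    inflection point of their distribution lies near mean - n_std * std
    (n_std = 1 for an exact Gaussian).
    """
    # Condensed vector of all pairwise Euclidean distances
    dists = pdist(np.asarray(features, dtype=float), metric="euclidean")
    # Cut height: lower inflection point under the Gaussian assumption
    threshold = dists.mean() - n_std * dists.std()
    # Agglomerative merge history (the dendrogram) built from the distances
    tree = linkage(dists, method=method)
    # Flat clusters: members of each cluster have cophenetic distance <= threshold
    labels = fcluster(tree, t=threshold, criterion="distance")
    return labels, threshold


X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
labels, threshold = cut_dendrogram_at_inflection(X)
print(threshold, labels)
```

For two well-separated groups like `X`, the threshold falls between the small within-group distances and the large between-group distances, so the cut yields two clusters. If mean - n_std * std comes out negative, no merge falls below it and every point stays in its own cluster; in that case the Gaussian assumption is probably a poor fit for the distances.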

The following Python code implements hierarchical clustering:

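from numpy import *  # NumPy names (array, sqrt, sum, ...) are used unqualified below

class cluster_node:
    # A node in the cluster tree (dendrogram): a leaf holds one observation,
    # an internal node holds the merge of its left1 and right1 children
    def __init__(self, vec1, left1=None, right1=None, distance1=0.0, id1=None, count1=1):
        self.left1 = left1
        self.right1 = right1
        self.vec1 = vec1            # feature vector of this node
        self.id1 = id1              # identifier of this node
        self.distance1 = distance1  # distance at which the children were merged
        self.count1 = count1        # only used for weighted average

def L2dist(v1, v2):
    # Euclidean (L2) distance between two NumPy vectors
    return sqrt(sum((v1 - v2)**2))

def hcluster(features1, distance=L2dist):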
