09.10.2023 Views

Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Chapter 4

Unsupervised Learning: Clustering

• Jaccard

The Jaccard coefficient is given by the following:

B

Sim(q,d j ) = J(A,B) =

B

The Jaccard measure signifies the degree of relevance.

• Cosine

The cosine of the angle between two vectors is given by

the following:

B

Sim(q,d j ) = C(A,B) =

AB

Distance and similarity are two opposite measures. For example,

numeric data correlation is a similarity measure, and Euclidian distance

is a distance measure. Generally, the value of the similarity measure is

limited to between 0 and 1, but distance has no such upper boundary.

Similarity can be negative, but by definition, distance cannot be negative.

The clustering algorithms are almost the same as from the beginning of

this field, but researchers are continuously finding new distance measures

for varied applications.

What Is Hierarchical Clustering?

Hierarchical clustering is an iterative method of clustering data objects.

There are two types.

• Agglomerative hierarchical algorithms, or a bottom-up

approach

• Divisive hierarchical algorithms, or a top-down

approach

88

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!