Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Chapter 4
Unsupervised Learning: Clustering
• Jaccard
The Jaccard coefficient is given by the following:
AÇ
B
Sim(q,d j ) = J(A,B) =
AÈ
B
The Jaccard measure signifies the degree of relevance.
• Cosine
The cosine of the angle between two vectors is given by
the following:
AÇ
B
Sim(q,d j ) = C(A,B) =
AB
Distance and similarity are two opposite measures. For example,
numeric data correlation is a similarity measure, and Euclidian distance
is a distance measure. Generally, the value of the similarity measure is
limited to between 0 and 1, but distance has no such upper boundary.
Similarity can be negative, but by definition, distance cannot be negative.
The clustering algorithms are almost the same as from the beginning of
this field, but researchers are continuously finding new distance measures
for varied applications.
What Is Hierarchical Clustering?
Hierarchical clustering is an iterative method of clustering data objects.
There are two types.
• Agglomerative hierarchical algorithms, or a bottom-up
approach
• Divisive hierarchical algorithms, or a top-down
approach
88