Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Chapter 4
Graph Theoretical Approach
Unsupervised Learning: Clustering
The clustering problem can be mapped to a graph, where every node in
the graph is an input data point. If the distance between two graphs is
less than the threshold, then the corresponding nodes are connected.
Now using the graph partition algorithm, you can cluster the graph. One
industry example of clustering is in investment banking, where the cluster
instruments depend on the correlation of their time series of price and
performance trading of each cluster taken together. This is known as
basket trading in algorithmic trading. So, by using the similarity measure,
you can construct the graph where the nodes are instruments and the
edges between the nodes indicate that the instruments are correlated. To
create the basket, you need a set of instruments where all are correlated
to each other. In a graph, this is a set of nodes or subgraphs where all the
nodes in the subgraph are connected to each other. This kind of subgraph
is known as a clique. Finding the clique of maximum size is an NPcomplete
problem. People use heuristic solutions to solve this problem of
clustering.
How Do You Know If the Clustering Result Is
Good?
After applying the clustering algorithm, verifying the result as good or bad
is a crucial step in cluster analysis. Three parameters are used to measure
the quality of cluster, namely, centroid, radius, and diameter.
å
N
Centroid = Cm
= i=1 N
t mi
97