09.10.2023 Views

Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CHAPTER 4

Unsupervised

Learning: Clustering

In Chapter 3 we discussed how training data can be used to categorize

customer comments according to sentiment (positive, negative, neutral),

as well as according to context. For example, in the airline domain, the

context can be punctuality, food, comfort, entertainment, and so on. Using

this analysis, a business owner can determine the areas that his business

he needs to concentrate on. For instance, if he observes that the highest

percentage of negative comments has been about food, then his priority

will be the quality of food being served to the customers. However, there

are scenarios where business owners are not sure about the context. There

are also cases where training data is not available. Moreover, the frame

of reference can change with time. Classification algorithms cannot be

applied where target classes are unknown. A clustering algorithm is used

in these kinds of situations. A conventional application of clustering is

found in the wine-making industry where each cluster represents a brand

of wine, and wines are clustered according to their component ratio in

wine. In Chapter 3, you learned that classification can be used to recognize

a type of image, but there are scenarios where one image has multiple

shapes and an algorithm is needed to separate the figures. Clustering

algorithms are used in this kind of use case.

© Sayan Mukhopadhyay 2018

S. Mukhopadhyay, Advanced Data Analytics Using Python,

https://doi.org/10.1007/978-1-4842-3450-1_4

77

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!