11.04.2024 Views

Thinking Data Science: A Data Science Practitioner's Guide


1 Data Science Process

Fig. 1.8 Supervised/unsupervised learning

Model Training

Model training can follow either a supervised or an unsupervised learning approach, as seen in Fig. 1.8. Regression and classification models, which predict a known target value, are trained with supervised learning.

In supervised learning, you must have a labeled dataset; that is, the target value is known for each data point in the dataset. Using this labeled dataset, the model tunes its internal parameters, such as weights, and gets ready to make inferences on unknown data points. You can test the model’s accuracy on a validation dataset, which is part of your original labeled dataset but is not used during training.
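
The supervised workflow described above can be sketched in a few lines. This is only a minimal illustration, not a method from the book: the toy dataset (targets generated as y = 2x + 1), the every-fifth-point validation split, and the helper names `fit_line` and `mean_squared_error` are all assumptions chosen to keep the example self-contained.

```python
# Labeled dataset: every data point x comes with a known target y.
# The targets are generated as y = 2x + 1 (an assumption for illustration).
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]

# Hold out part of the labeled data for validation; it is never used in training.
train = [p for i, p in enumerate(data) if i % 5 != 0]
validation = [p for i, p in enumerate(data) if i % 5 == 0]


def fit_line(points):
    """Learn the model parameters (slope, intercept) by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(points, slope, intercept):
    """Average squared gap between predicted and known target values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


# Training tunes the parameters; validation checks them on held-out points.
slope, intercept = fit_line(train)
val_mse = mean_squared_error(validation, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, validation MSE={val_mse:.4f}")
```

Because the validation points play no part in `fit_line`, a low validation error suggests the learned parameters generalize beyond the training points.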

In unsupervised learning, you do not have labeled data, often because the dataset is so large that labeling it is impractical. In such cases, you use machine learning algorithms that analyze the structure of your dataset on their own. To cite an example here, a clustering algorithm such as k-means groups similar data points together without being given any target values.
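
As an illustration of an algorithm that analyzes unlabeled data on its own, here is a minimal sketch of k-means clustering in one dimension. The choice of k-means, the sample values, the starting centroids, and the function name `kmeans_1d` are assumptions made for this example; a production library implementation would handle initialization and convergence more carefully.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group unlabeled values around centroids; no target values are used."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled data points (hypothetical values with two visible groups).
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm discovers the two groups from the values alone, which is the sense in which the analysis happens without labels.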

Algorithm Selection

A major and daunting task for a data scientist is deciding which algorithm to use. For both regression and classification problems, many algorithms are available. The challenge is to select the one that best suits the dataset and achieves high accuracy when predicting an unseen data point. This book will help you understand several algorithms and select an appropriate one for your application. For a quick overview of the algorithms available for machine learning, look at Fig. 1.9.
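
One common way to approach this selection, sketched here under stated assumptions, is to train each candidate on the same training data and compare their accuracy on held-out validation data. The two candidates (a majority-class baseline and a 1-nearest-neighbor classifier), the toy data, and all function names are hypothetical choices for illustration, not the book's procedure.

```python
from collections import Counter

# Hypothetical labeled data: (feature, label), where the label is 1 for larger features.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
validation = [(0.5, 0), (4.5, 0), (5.5, 1), (9.0, 1)]


def fit_majority(points):
    """Baseline candidate: always predict the most common training label."""
    label = Counter(y for _, y in points).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbor(points):
    """1-nearest-neighbor candidate: predict the label of the closest training point."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def accuracy(predict, points):
    """Fraction of points whose predicted label matches the known label."""
    return sum(predict(x) == y for x, y in points) / len(points)


# Train every candidate on the same data and score it on unseen validation points.
candidates = {"majority": fit_majority, "nearest_neighbor": fit_nearest_neighbor}
scores = {name: accuracy(fit(train), validation) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Scoring on data the candidates never saw during training is what makes the comparison a fair estimate of how each would handle an unseen data point.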

It is important for a data scientist to understand these algorithms: if not their full implementation, then at least the concepts behind them. If you understand an algorithm conceptually and know its purpose, you will know which one to use for your current need. After all, somebody has already implemented these algorithms efficiently, and we may not do a better job than those hard-core developers. Even if an implementation is not fully optimized, it is still acceptable in most cases. If your model is doing real-time inference, only then would this high-level optimization be called
