Advanced Data Analytics Using Python_ With Machine Learning, Deep Learning and NLP Examples ( 2023)
Chapter 3
Supervised Learning Using Python
Semisupervised Learning
Classification and regression are types of supervised learning. In this type
of learning, you have a set of labeled training data on which you train your
model. The trained model is then used to predict labels for test data. For
example, suppose you want to classify text according to sentiment. There are
three target classes: positive, negative, and neutral. To train your model,
you have to choose some sample text and label each sample as positive,
negative, or neutral. You use
this training data to train the model. Once your model is trained, you can
apply your model to test data. For example, you may use the Naive Bayes
classifier for text classification and try to predict the sentiment of the
sentence “Food is good.” In the training phase, the program estimates, for
each class (positive, negative, and neutral), how common the class is and how
likely each of the words Food, is, and good is to appear in it, and stores
these probabilities in the model. In the test phase, it treats the words as
independent given the class and multiplies their probabilities to score each
class when Food, is, and good all come together; the highest-scoring class is
the prediction. Conversely, clustering is an example
of unsupervised learning, where there is no labeled training data and no
target class. The program learns structure directly from the data in one
shot. Semisupervised learning sits between the two. Suppose you are
classifying the text as
positive or negative sentiment, but your training data has labels only for
positives. The rest of the training data is unlabeled. In this case, as the
first step, you train the model assuming all unlabeled data is negative and
apply the trained model to the training data. In the output, the unlabeled
data that the model predicts as negative is labeled as negative. Finally,
train your model with
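the newly labeled data. The nearest neighbor classifier is also sometimes
considered semisupervised learning, although it is more commonly described as
a lazy supervised method. It has labeled training data, but it has no separate
training phase: the stored examples are compared with new points only at
prediction time.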
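
The Naive Bayes sentiment example described above can be sketched in a few
lines of plain Python. This is a minimal illustration, not the book's own
code: the toy sentences in TRAIN are invented, the function names
train_naive_bayes and predict are hypothetical, and add-one (Laplace)
smoothing is an assumption made here so that unseen word and class
combinations do not produce a zero probability.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data, invented for illustration only.
TRAIN = [
    ("food is good", "positive"),
    ("service is good", "positive"),
    ("food is bad", "negative"),
    ("service is bad", "negative"),
    ("food is served", "neutral"),
]


def train_naive_bayes(samples):
    """Training phase: count classes and per-class word occurrences."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Test phase: combine per-word probabilities into one score per class."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        # Start from the log prior of the class.
        score = math.log(count / total_docs)
        # Denominator for add-one smoothing (an assumption of this sketch).
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    best = max(scores, key=scores.get)
    return best, scores


model = train_naive_bayes(TRAIN)
label, scores = predict("Food is good", model)
print(label)
```

Working in log space turns the product of word probabilities into a sum,
which avoids numerical underflow on longer sentences.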
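
The semisupervised procedure described above (positives labeled, everything
else unlabeled) can be sketched as follows. This is only an illustration
under stated assumptions: the 2-D points are invented, the nearest-centroid
classifier is a stand-in chosen here for brevity rather than anything the text
prescribes, and the names fit and classify are hypothetical. The text does not
say what to do with unlabeled points that the first model predicts as
positive, so this sketch simply leaves them out of the final training set.

```python
import math

# Hypothetical toy data, invented for illustration only.
positives = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
unlabeled = [(5.5, 5.5), (6.0, 6.0), (0.0, 0.0),
             (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def centroid(points):
    """Mean point of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit(pos, neg):
    """'Train' a nearest-centroid model: one centroid per class."""
    return {"positive": centroid(pos), "negative": centroid(neg)}


def classify(model, point):
    """Assign the class whose centroid is closest to the point."""
    return min(model, key=lambda label: math.dist(model[label], point))


# Step 1: train while assuming every unlabeled point is negative.
first_model = fit(positives, unlabeled)

# Step 2: apply the model to the training data; unlabeled points that
# come out as negative are now labeled as negative.
new_negatives = [p for p in unlabeled
                 if classify(first_model, p) == "negative"]

# Step 3: retrain on the positives plus the newly labeled negatives.
final_model = fit(positives, new_negatives)

print(new_negatives)
print(classify(final_model, (5.5, 5.5)), classify(final_model, (0.5, 0.5)))
```

Any classifier with a fit and predict step could take the place of the
nearest-centroid stand-in; the three-step structure is the part that mirrors
the text.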