Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)

Chapter 3

Supervised Learning Using Python

Information gain, which is the expected reduction in entropy caused by partitioning the examples according to a given attribute, is the measure used in this case.

Specifically, the information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as follows:

Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

So, an attribute with a higher information gain will appear closer to the root of the decision tree.
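The formula above can be computed directly. The sketch below is a minimal illustration (not part of the book's code); the helper names `entropy` and `information_gain` and the toy weather-style data are assumptions chosen for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Gain(S, A) = Entropy(S) - sum over v of (|S_v|/|S|) * Entropy(S_v)."""
    total = len(labels)
    gain = entropy(labels)
    for v in set(attribute_values):
        # S_v: the subset of labels where the attribute takes value v
        subset = [l for a, l in zip(attribute_values, labels) if a == v]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# toy data: the attribute perfectly separates the two classes,
# so the gain equals the full entropy of S (here 1.0 bit)
attr = ['sunny', 'sunny', 'rainy', 'rainy']
labels = ['yes', 'yes', 'no', 'no']
print(information_gain(attr, labels))  # 1.0
```

An attribute that does not separate the classes at all would yield a gain of 0, which is why such attributes end up low in (or absent from) the tree.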

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv('csv file path', index_col=0)
y = df['target class column']
X = df[['col1', 'col2']]   # feature columns

# hold out a test set so the prediction below has unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y)

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
clf.predict(X_test)

Random Forest Classifier

A random forest classifier is an extension of a decision tree in which the algorithm creates N decision trees, each built from M features selected at random. A test instance is then classified by all N trees and assigned to the target class predicted by the majority of them.
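A brief sketch of this in scikit-learn, assuming synthetic data from make_classification in place of the CSV used earlier (the dataset and parameter values are illustrative, not from the book):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic stand-in for a real dataset
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the N trees described above; max_features caps the
# M randomly chosen features considered at each split
clf = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                             random_state=0)
clf.fit(X_train, y_train)

# each prediction is the majority vote of the 100 trees
predictions = clf.predict(X_test)
print(clf.score(X_test, y_test))
```

Because each tree sees a different random slice of the feature space, the ensemble's majority vote tends to generalize better than any single deep decision tree.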
