Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)

Chapter 3

Supervised Learning Using Python

Likewise, if r is nearer to 0, it means that x and y are not correlated. A

simplified formula to calculate r is shown here:

r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[ n\sum x^2 - (\sum x)^2 \right]\left[ n\sum y^2 - (\sum y)^2 \right]}}
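As a quick sanity check, the simplified formula can be evaluated directly and compared against scipy.stats.pearsonr; the sample data here is made up purely for illustration:

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative paired samples (any equal-length x and y will do)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Simplified formula:
# r = [n*Sum(xy) - Sum(x)*Sum(y)] /
#     sqrt([n*Sum(x^2) - Sum(x)^2] * [n*Sum(y^2) - Sum(y)^2])
n = len(x)
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x * x) - np.sum(x) ** 2) *
              (n * np.sum(y * y) - np.sum(y) ** 2))
r = num / den

r_scipy, _ = pearsonr(x, y)  # should agree with r up to rounding
```

Because the data is nearly linear, both computations give an r close to 1.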

You can easily use correlation for dimensionality reduction. Let’s say

Y is a variable that is a weighted sum of n variables: X1, X2, ... Xn. You

want to reduce this set of X to a smaller set. To do so, you need to calculate

the correlation coefficient for each pair of X variables. Now, if Xi and Xj are highly correlated, you examine the correlation of Y with each of them. If Xi's correlation with Y is greater than Xj's, you remove Xj from the set, and vice versa. The following function is an example of dropping features using correlation:

import numpy as np
from scipy.stats import pearsonr

def drop_features(y_train, X_train, X, index):
    i1 = 0
    processed = 0
    while True:
        flag = True
        for i in range(X_train.shape[1]):
            if i > processed:
                i1 = i1 + 1
                corr = pearsonr(X_train[:, i], y_train)
                # probable error of r: PE = 0.674 * (1 - r^2) / sqrt(n)
                PEr = .674 * (1 - corr[0] * corr[0]) / \
                      (len(X_train[:, i]) ** (1 / 2.0))
                # note: the signed r is compared, so strongly negative
                # correlations are also dropped
                if corr[0] < PEr:
                    # correlation with the target is below its probable
                    # error; drop column i and rescan (indices shift
                    # after the deletion)
                    X_train = np.delete(X_train, i, 1)
                    # the original listing is cut off by a page break
                    # here; the loop control and return below are a
                    # minimal completion of the scan-and-restart logic
                    processed = i - 1
                    flag = False
                    break
        if flag:
            break
    return X_train
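The pairwise procedure described in the text (from each highly correlated pair Xi, Xj, drop the one less correlated with Y) can also be sketched directly. The function name, the threshold parameter, and the bookkeeping below are my own illustration, not from the book:

```python
import numpy as np
from scipy.stats import pearsonr

def drop_correlated_features(X, y, threshold=0.9):
    """From each pair of columns with |r| above threshold, drop the one
    whose correlation with the target y is weaker (hypothetical helper)."""
    keep = list(range(X.shape[1]))
    changed = True
    while changed:
        changed = False
        for a in range(len(keep)):
            for b in range(a + 1, len(keep)):
                i, j = keep[a], keep[b]
                r_ij, _ = pearsonr(X[:, i], X[:, j])
                if abs(r_ij) > threshold:
                    r_iy, _ = pearsonr(X[:, i], y)
                    r_jy, _ = pearsonr(X[:, j], y)
                    # keep whichever feature is more correlated with y
                    keep.remove(j if abs(r_iy) >= abs(r_jy) else i)
                    changed = True
                    break
            if changed:
                break
    return X[:, keep], keep
```

The rescan-after-removal loop mirrors the structure of the book's function: every deletion invalidates the pair indices, so the scan restarts on the surviving columns.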
