Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)

Chapter 3

Supervised Learning Using Python

Dealing with Categorical Data

For algorithms such as support vector machines or regression, input data must be numeric. So, if you are dealing with categorical data, you need to convert it to numeric data. One strategy for conversion is to use an ordinal number as the numerical score. A more sophisticated way is to replace each categorical value with the expected value of the target variable for that value. This works well for regression.

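# X holds the categorical feature columns; df holds the same columns
# plus the numeric target 'floor'.
for col in X.columns:
    # Mean of the target for each distinct value of this column.
    avgs = df.groupby(col, as_index=False)['floor'].mean()
    for i, row in avgs.iterrows():
        k = row[col]
        v = row['floor']
        # Replace the category k with its mean target value.
        X.loc[X[col] == k, col] = v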
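The two strategies described above, ordinal scores and replacement by the mean of the target, can be sketched on a small table as follows. This is only an illustration under stated assumptions: the DataFrame, the column name city, and all values are hypothetical, and floor is reused as the name of the numeric target only to match the listing above.

```python
import pandas as pd

# Hypothetical toy data: 'city' is categorical, 'floor' is the numeric target.
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "A"],
    "floor": [2.0, 5.0, 4.0, 1.0, 7.0, 3.0],
})

# Strategy 1: ordinal codes, one integer per category
# (categories are sorted alphabetically here).
ordinal = df["city"].astype("category").cat.codes

# Strategy 2: replace each category with the mean target value of its group.
city_means = df.groupby("city")["floor"].mean()
mean_encoded = df["city"].map(city_means)

print(ordinal.tolist())
print(mean_encoded.tolist())
```

Mapping a per-group mean back onto the column avoids the row-by-row loop. In practice the means would normally be computed on the training data only, so that target values from the test set do not leak into the features.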

For logistic regression, you can use the expected probability of the target variable for that categorical value.

for col in X.columns:
    if str(col) != 'success':
        if str(col) not in index:
            # P(col = value)
            feature_prob = X.groupby(col).size().div(len(df))
            # P(success, col = value) / P(col = value) = P(success | col = value)
            cond_prob = (
                X.groupby(['success', str(col)]).size().div(len(df))
                .div(feature_prob, axis=0, level=str(col))
                .reset_index(name="Probability")
            )
            # Keep only the rows for the positive class.
            cond_prob = cond_prob[cond_prob.success != '0']
            cond_prob.drop("success", inplace=True, axis=1)
            # as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
            cond_prob['feature_value'] = cond_prob[str(col)].apply(str).to_numpy()
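The same idea can be sketched more compactly. With a target coded as integers 0 and 1, the conditional probability of success given a category equals the mean of the target within that category. The data and the column name channel below are hypothetical, and the integer coding of success is an assumption (the listing above compares it with the string '0').

```python
import pandas as pd

# Hypothetical toy data: 'channel' is categorical, 'success' is a 0/1 target.
df = pd.DataFrame({
    "channel": ["web", "web", "shop", "shop", "shop", "phone"],
    "success": [1, 0, 1, 1, 0, 0],
})

# P(success = 1 | channel): for a 0/1 target this is the group mean.
prob = df.groupby("channel")["success"].mean()

# Replace each category with its estimated success probability.
encoded = df["channel"].map(prob)

print(prob.to_dict())
```

A category that never succeeds in the sample, such as phone here, is encoded as 0.0. With few observations per category such estimates are noisy, so some form of smoothing is often worth considering.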
