
the flight path, and so on. Having a lot of attributes upon which to base a prediction can be a blessing and a curse. Attributes that relate directly to the outcomes being predicted are a blessing. Attributes that are unrelated to the outcomes are a curse. Telling the difference between blessed and cursed attributes requires data. Chapter 3, "Predictive Model Building: Balancing Performance, Complexity, and Big Data," goes into that in more detail.
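To make that concrete, here is a minimal sketch, not from the book; the synthetic data, its shapes, and the use of scikit-learn's feature importances are all illustrative assumptions. Only the first attribute relates to the outcome, and a random forest's importance scores are used to tell the blessed attribute from the cursed ones.

# A minimal sketch, assuming synthetic data and scikit-learn:
# only attribute 0 drives the outcome; the rest are noise.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 5))        # five candidate attributes
y = (X[:, 0] > 0).astype(int)         # outcome depends only on attribute 0

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for i, score in enumerate(model.feature_importances_):
    print("attribute %d importance: %.3f" % (i, score))
# attribute 0 should dominate; the unrelated attributes score near zero

On real data the separation is rarely this clean, which is part of what makes the performance-versus-complexity tradeoff in Chapter 3 important.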

Table 1.2 shows how the algorithms covered in this book fared relative to the other algorithms used in the study. It lists which algorithms achieved top-five performance scores for each of the problems listed in Table 1.1. Algorithms covered in this book are spelled out (boosted decision trees, Random Forests, bagged decision trees, and logistic regression). The first three of these are ensemble methods. Penalized regression was not fully developed when the study was done and wasn't evaluated; logistic regression is a close relative and is used to gauge the success of regression methods. Each of the 9 algorithms used in the study had 3 different data reduction techniques applied, for a total of 27 combinations, so the top five positions represent roughly the top 20 percent of performance scores. The row labeled Covt, for example, indicates that the boosted decision trees algorithm took the first- and second-best performance scores, the Random Forests algorithm took the fourth- and fifth-best, and the bagged decision trees algorithm took the third-best. In the cases where algorithms not covered here placed in the top five, an entry appears in the Other column. The algorithms that show up there are k-nearest neighbors (KNN), artificial neural nets (ANNs), and support vector machines (SVMs). (A code sketch of this kind of comparison follows the table.)

Table 1.2 How the Algorithms Covered in This Book Compare on Different Problems

PROBLEM     BOOSTED     RANDOM      BAGGED      LOGISTIC     OTHER
            DECISION    FORESTS     DECISION    REGRESSION
            TREES                   TREES
Covt        1, 2        4, 5        3
Adult       1, 4        2           3, 5
LTR.P1      1                                                SVM, KNN
LTR.P2      1, 2        4, 5                                 SVM
MEDIS                   1, 3                    5            ANN
SLAC                    1, 2, 3     4, 5
HS                      1, 3                                 ANN
MG                      2, 4, 5     1, 3
CALHOUS     1, 2        5           3, 4
COD         1, 2        3, 4, 5
BACT        2, 5        1, 3, 4
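The sketch below is a hedged illustration of this kind of head-to-head comparison, not the study's actual protocol: it cross-validates the four algorithms covered in this book on one of scikit-learn's built-in data sets. The data set choice and the 5-fold accuracy scoring are assumptions made for the example.

# A hedged sketch of this kind of comparison -- not the study's protocol.
# Data set choice (scikit-learn's breast cancer data) and 5-fold
# cross-validated accuracy are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    "boosted decision trees": GradientBoostingClassifier(random_state=0),
    "Random Forests": RandomForestClassifier(random_state=0),
    "bagged decision trees": BaggingClassifier(random_state=0),
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print("%-24s mean accuracy: %.3f" % (name, scores.mean()))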

Logistic regression captures top-five honors in only one case in Table 1.2. The reason is that these data sets have few attributes (at most 200) relative to the number of examples (5,000 in each data set). There's plenty of data to resolve a model with so few attributes.
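As a rough illustration of that point (synthetic data, not the study's data sets), logistic regression fits comfortably when 5,000 examples support only 200 attributes:

# A minimal sketch, assuming synthetic data: 5,000 examples and
# 200 attributes, the shapes cited above. With this much data per
# attribute, logistic regression's coefficients are well resolved.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=200,
                           n_informative=20, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=2000), X, y, cv=5)
print("logistic regression mean accuracy: %.3f" % scores.mean())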
