Machine Learning in Python: Essential Techniques for Predictive Analysis by Michael Bowles

have correct answers to use for training. These are denoted as targets in Figure 1.5.

The algorithms covered in this book learn by detecting patterns in past behaviors, but it is important that they not merely memorize past behavior; after all, a customer might not repeat a purchase of something he bought yesterday. Chapter 3 discusses in detail how this process of training without memorizing works.

Usually, several aspects of the problem formulation can be done in more than one way. This leads to some iteration between framing the problem, selecting and training a model, and producing performance estimates. Figure 1.6 depicts this process.

Figure 1.6 Iteration from formulation to performance

The problem may come with specific quantitative training objectives, or part of the job might be extracting these data (called targets or labels). Consider, for instance, the problem of building a system to automatically trade securities. To trade automatically, a first step might be to predict changes in the price of a security. The prices are easily available, so it is conceptually simple to use historical data to build training examples for which the future price changes are known. But even that involves choices and experimentation. Future price change could be computed in several different ways. The change could be the difference between the current price and the price 10 minutes in the future. It could also be the change between the current price and the price 10 days in the future. It could also be the difference between the current price and the maximum/minimum price over the next 10 minutes. The change in price could be characterized by a two-state variable taking values “higher” or “lower” depending on whether the price is higher or lower 10 minutes in the future. Each of these choices will lead to a predictive model, and the predictions will be used for deciding whether to buy or sell the security.
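The alternative target definitions just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the book: the function names, the toy price list, and the choice to measure the horizon in evenly spaced samples (rather than minutes or days) are all hypothetical, and a price that does not move is arbitrarily labeled "lower".

```python
def fixed_horizon_change(prices, i, horizon):
    """Price `horizon` samples ahead minus the current price."""
    return prices[i + horizon] - prices[i]


def extreme_changes(prices, i, horizon):
    """Largest and smallest change reached within the next `horizon` samples."""
    window = prices[i + 1 : i + 1 + horizon]
    return max(window) - prices[i], min(window) - prices[i]


def direction_label(prices, i, horizon):
    """Two-state target. Assumption: an unchanged price counts as "lower"."""
    return "higher" if prices[i + horizon] > prices[i] else "lower"


def build_targets(prices, horizon):
    """One target record per time step that still has `horizon` samples ahead."""
    targets = []
    for i in range(len(prices) - horizon):
        max_change, min_change = extreme_changes(prices, i, horizon)
        targets.append({
            "change": fixed_horizon_change(prices, i, horizon),
            "max_change": max_change,
            "min_change": min_change,
            "direction": direction_label(prices, i, horizon),
        })
    return targets


# Hypothetical, evenly spaced price samples used only for illustration.
prices = [10.0, 11.0, 9.0, 12.0, 12.5]
targets = build_targets(prices, horizon=2)
for row in targets:
    print(row)
```

Each key in a record is a candidate target. Training one model per candidate and comparing the resulting trading decisions is one way to carry out the experimentation the text calls for.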
Some experimentation will be required to determine the best choice.

FEATURE EXTRACTION AND FEATURE ENGINEERING

Deciding which variables to use for making predictions can also involve experimentation. This process is known as feature extraction and feature engineering. Feature extraction is the process of taking data from a free-form arrangement, such as words in a document or on a web page, and arranging them into rows and columns of numbers. For example, a spam-filtering problem begins with text from emails and might extract things such as the number of capital letters in the document, the number of words in all caps, the number of times the word "buy" appears in the document, and other numeric features selected to highlight the differences between spam and non-spam emails.

Feature engineering is the process of manipulating and combining features to arrive at more informative ones. Building a system for trading securities involves feature extraction and feature engineering. Feature extraction would be deciding what things will be used to predict prices. Past prices, prices of related securities, interest rates, and features extracted from news releases have all been incorporated into various trading systems that have been discussed publicly. In addition, securities prices have a number of engineered features with names like stochastic, MACD (moving average convergence divergence), and RSI (relative strength index) that are basically functions of past prices that their inventors believed to be useful in securities trading.

After a reasonable set of features is developed, you can train a predictive model like the ones described in this book, assess its performance, and make a decision about deploying the model. Generally, you'll want to make changes to the features used, if for no other reason than to confirm that your model's performance is adequate. One way to determine which features to use is to try all combinations, but that can take a lot of time. Inevitably, you'll face competing pressures to improve performance but also to get a trained model into use quickly. The algorithms discussed in this book have the beneficial property of providing metrics on the utility of each attribute in producing predictions. One training pass will generate rankings on the features to indicate their relative importance. This information helps speed the feature engineering process.

NOTE Data preparation and feature engineering are estimated to take 80 to 90 percent of the time required to develop a machine learning model.

Model training, which begins each time a baseline set of features is attempted, is itself a process. A modern machine learning algorithm, such as the ones described in this book, trains something like 100 to 5,000 different models that have to be winnowed down to a single model for deployment. The reason for generating so many models is to provide models of all different shades of complexity. This makes it possible to choose the model that is best suited to the problem and data set. You don't want a model that's too simple or you give up performance, but you don't want a model that's too complicated or you'll overfit the problem.
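The idea of training a family of models and winnowing it down can be illustrated with a deliberately small sketch. It is not the book's algorithm: it assumes a one-attribute linear model with no intercept, uses a squared-coefficient penalty `lam` as the complexity knob (larger `lam` gives a more constrained, simpler model), builds only seven models instead of hundreds, and runs on synthetic data. All names here are hypothetical.

```python
import random


def fit_penalized_slope(xs, ys, lam):
    """Slope minimizing sum((y - beta*x)**2) + lam * beta**2 (closed form)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(beta, xs, ys):
    """Mean squared error of the prediction beta * x."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 2x plus noise.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(60)]
ys = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]

# The first 40 points train the models; the last 20 are used only to compare them.
x_train, y_train = xs[:40], ys[:40]
x_val, y_val = xs[40:], ys[40:]

# One model per penalty value, from nearly unconstrained to heavily constrained.
lambdas = [10.0 ** k for k in range(-3, 4)]
models = [(lam, fit_penalized_slope(x_train, y_train, lam)) for lam in lambdas]

# Winnow the family down to the model with the lowest error on unseen data.
val_errors = [mse(beta, x_val, y_val) for _, beta in models]
best_lam, best_beta = models[val_errors.index(min(val_errors))]
print("chosen penalty:", best_lam, "slope:", best_beta)
```

The selection step uses data that played no part in fitting, which is what lets it distinguish a model that is too simple from one that is too complicated.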
Having models in all shades of complexity lets you pick one that is just right.

DETERMINING PERFORMANCE OF A TRAINED MODEL

The fit of a model is determined by how well it performs on data that were not used to train the model. This is an important step and conceptually simple. Just set aside some data. Don't use it in training. After the training is finished, use the data you set aside to determine the performance of your algorithm. This book discusses several systematic ways to hold out data. Different methods have different advantages, depending mostly on the size of the training data. As easy as it sounds, people continually figure out complicated ways to let the test data “leak” into the training process. At the end of the process, you'll have an algorithm that will sift through incoming data and make accurate predictions for you. It might need monitoring as changing conditions alter the underlying statistics.
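One common way test data leaks into training is through preprocessing, for example when normalization statistics are computed over the whole data set before it is split. The sketch below shows a hold-out split that avoids this particular leak by splitting first and computing the mean and standard deviation from the training rows only. The helper names, the 25 percent test fraction, and the toy data are assumptions made for illustration, not the book's code.

```python
import random


def hold_out_split(rows, test_fraction, seed):
    """Randomly set aside a fraction of the rows as a test set."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


def mean_and_std(values):
    """Mean and population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


# Toy (attribute, label) rows, hypothetical and used only for illustration.
rows = [(float(i), 3.0 * i + 1.0) for i in range(20)]

# Split first, so nothing computed afterward can see the test rows.
train, test = hold_out_split(rows, test_fraction=0.25, seed=1)

# Normalization statistics come from the training rows only.
mu, sigma = mean_and_std([x for x, _ in train])
train_scaled = [((x - mu) / sigma, y) for x, y in train]

# The test rows are transformed with the training statistics, never their own.
test_scaled = [((x - mu) / sigma, y) for x, y in test]
print(len(train_scaled), "training rows,", len(test_scaled), "test rows")
```

The same discipline applies to any other step fitted to data, such as feature selection or tuning a complexity parameter: the held-out rows should be touched only when performance is measured.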
