Data Mining: Practical Machine Learning Tools and ... - LIDeCC
$-w_0 - w_1 a_1 - \dots - w_k a_k = 0.$

Because this is a linear equality in the attribute values, the boundary is a linear plane, or hyperplane, in instance space. It is easy to visualize sets of points that cannot be separated by a single hyperplane, and these cannot be discriminated correctly by logistic regression.

Multiresponse linear regression suffers from the same problem. Each class receives a weight vector calculated from the training data. Focus for the moment on a particular pair of classes. Suppose the weight vector for class 1 is

$w_0^{(1)} + w_1^{(1)} a_1 + w_2^{(1)} a_2 + \dots + w_k^{(1)} a_k$

and the same for class 2 with appropriate superscripts. Then an instance will be assigned to class 1 rather than class 2 if

$w_0^{(1)} + w_1^{(1)} a_1 + \dots + w_k^{(1)} a_k > w_0^{(2)} + w_1^{(2)} a_1 + \dots + w_k^{(2)} a_k.$

In other words, it will be assigned to class 1 if

$(w_0^{(1)} - w_0^{(2)}) + (w_1^{(1)} - w_1^{(2)}) a_1 + \dots + (w_k^{(1)} - w_k^{(2)}) a_k > 0.$

This is a linear inequality in the attribute values, so the boundary between each pair of classes is a hyperplane. The same holds true when performing pairwise classification. The only difference is that the boundary between two classes is governed by the training instances in those classes and is not influenced by the other classes. (A short numeric sketch of this pairwise comparison appears at the end of this section.)

Linear classification using the perceptron

Logistic regression attempts to produce accurate probability estimates by maximizing the probability of the training data. Of course, accurate probability estimates lead to accurate classifications. However, it is not necessary to perform probability estimation if the sole purpose of the model is to predict class labels. A different approach is to learn a hyperplane that separates the instances pertaining to the different classes; let's assume that there are only two of them. If the data can be separated perfectly into two groups using a hyperplane, it is said to be linearly separable. It turns out that if the data is linearly separable, there is a very simple algorithm for finding a separating hyperplane.

The algorithm is called the perceptron learning rule. Before looking at it in detail, let's examine the equation for a hyperplane again:

$w_0 a_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k = 0.$

Here, $a_1, a_2, \dots, a_k$ are the attribute values, and $w_0, w_1, \dots, w_k$ are the weights that define the hyperplane. We will assume that each training instance $a_1, a_2, \dots$ is extended by an additional attribute $a_0$ that always has the value 1 (as we did in the case of linear regression). This extension, which is called the bias, just folds the constant term into the weighted sum, so that $w_0$ can be treated like any other weight.
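To make the pairwise comparison between class weight vectors concrete, here is a minimal sketch in Python (not from the book; the weight vectors and the instance are made-up illustrative values). It checks that comparing the two class scores directly is equivalent to testing on which side of the difference-vector hyperplane the instance lies:

```python
# Illustrative sketch: choosing between two classes in multiresponse
# linear regression. All values below are invented for illustration.

w1 = [0.5, 1.2, -0.7]   # class 1 weights: w0, w1, w2
w2 = [0.1, 0.4,  0.9]   # class 2 weights
a  = [1.0, 2.0,  1.5]   # instance, with a0 = 1 prepended for the bias

score1 = sum(w * x for w, x in zip(w1, a))
score2 = sum(w * x for w, x in zip(w2, a))

# Comparing the two scores is the same as testing the sign of the
# hyperplane defined by the difference vector w1 - w2.
diff = [u - v for u, v in zip(w1, w2)]
assert (score1 > score2) == (sum(d * x for d, x in zip(diff, a)) > 0)

print("class 1" if score1 > score2 else "class 2")
```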
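The perceptron learning rule itself is spelled out in the text that follows this page; as a preview, here is a minimal sketch of the standard formulation, assuming two linearly separable classes labeled +1 and -1 and instances already extended with the bias attribute $a_0 = 1$. The data values are invented purely for illustration:

```python
# Minimal sketch of the standard perceptron learning rule, assuming
# linearly separable data. Instances include a0 = 1; labels are +1/-1.
# The data below is made up for illustration.

data = [([1.0,  2.0,  1.0], +1),
        ([1.0,  0.5,  0.3], +1),
        ([1.0, -1.0, -0.5], -1),
        ([1.0, -0.4, -2.0], -1)]

w = [0.0, 0.0, 0.0]          # weights, including the bias weight w0

converged = False
while not converged:
    converged = True
    for a, label in data:
        score = sum(wi * ai for wi, ai in zip(w, a))
        predicted = +1 if score > 0 else -1
        if predicted != label:
            # Misclassified: add the instance to the weight vector for
            # a positive example, subtract it for a negative one.
            w = [wi + label * ai for wi, ai in zip(w, a)]
            converged = False

print("separating hyperplane weights:", w)
```

For linearly separable data the loop is guaranteed to terminate, by the perceptron convergence theorem; the resulting weights define a separating hyperplane, though not necessarily one with a large margin.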
