Machine Learning
Tom M. Mitchell

Contents:
- Preface
- 1. Introduction
- 2. Concept Learning and the General-to-Specific Ordering
- 3. Decision Tree Learning
- 4. Artificial Neural Networks
- 5. Evaluating Hypotheses
- 6. Bayesian Learning
- 7. Computational Learning Theory
- 8. Instance-Based Learning
- 9. Genetic Algorithms
- 10. Learning Sets of Rules
- 11. Analytical Learning
- 12. Combining Inductive and Analytical Learning
- 13. Reinforcement Learning
- Subject Index