Berry

Putting Data Mining to Work 607

based on arithmetic operations. When the data has many categorical variables, decision trees are quite useful, although association rules and link analysis may be appropriate in some cases.

Number of Input Fields

In directed data mining applications, there should be a single target field or dependent variable. The rest of the fields (except for those that are either clearly irrelevant or clearly dependent on the target variable) are treated as potential inputs to the model. Data mining methods vary in their ability to successfully process large numbers of input fields. This can be a factor in deciding on the right technique for a particular application.

In general, techniques that rely on adjusting a vector of weights that has an element for each input field run into trouble when the number of fields grows very large. Neural networks and memory-based reasoning share that trait. Association rules run into a different problem. The technique looks at all possible combinations of the inputs; as the number of inputs grows, processing the combinations becomes impossible to do in a reasonable amount of time.

Decision-tree methods are much less hindered by large numbers of fields. As the tree is built, the decision-tree algorithm identifies the single field that contributes the most information at each node and bases the next segment of the rule on that field alone. Dozens or hundreds of other fields can come along for the ride, but won’t be represented in the final rules unless they contribute to the solution.

TIP When faced with a large number of fields for a directed data mining problem, it is a good idea to start by building a decision tree, even if the final model is to be built using a different technique. The decision tree will identify a good subset of the fields to use as input to another technique that might be swamped by the original set of input variables.
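The way a decision tree singles out the most informative field at a node can be illustrated with a small sketch. It assumes entropy reduction (information gain) as the splitting criterion, which is only one of several criteria a tree algorithm might use, and it assumes categorical inputs. The field names (plan, region, channel, churned), the toy records, and the helper functions are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, field, target):
    """Entropy reduction obtained by splitting the rows on one categorical field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row[target])
    # Entropy remaining after the split, weighted by the size of each group.
    remaining = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remaining


def rank_fields(rows, target):
    """Return (field, gain) pairs sorted from most to least informative."""
    fields = [f for f in rows[0] if f != target]
    gains = {f: information_gain(rows, f, target) for f in fields}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: "plan" determines the target, "channel" is weakly
# related to it, and "region" carries no information about it.
ROWS = [
    {"plan": "basic", "region": "east", "channel": "web", "churned": "yes"},
    {"plan": "basic", "region": "west", "channel": "store", "churned": "yes"},
    {"plan": "basic", "region": "east", "channel": "web", "churned": "yes"},
    {"plan": "basic", "region": "west", "channel": "web", "churned": "yes"},
    {"plan": "premium", "region": "east", "channel": "web", "churned": "no"},
    {"plan": "premium", "region": "west", "channel": "store", "churned": "no"},
    {"plan": "premium", "region": "east", "channel": "store", "churned": "no"},
    {"plan": "premium", "region": "west", "channel": "store", "churned": "no"},
]

if __name__ == "__main__":
    for field, gain in rank_fields(ROWS, "churned"):
        print(f"{field}: {gain:.3f}")
```

In this toy data the ranking puts plan first and region last. Fields with a gain near zero, like region here, are the ones that "come along for the ride"; the fields near the top of the ranking are candidates to pass on to a technique that would be swamped by the full set of inputs.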
Free-Form Text

Most data mining techniques are incapable of directly handling free-form text. But clearly, text fields often contain extremely valuable information. When analyzing warranty claims submitted to an engine manufacturer by independent dealers, the mechanic’s free-form notes explaining what went wrong and what was done to fix the problem are at least as valuable as the fixed fields that show the part numbers and hours of labor used.

One data mining technique that can deal with free text is memory-based reasoning, one of the nearest neighbor methods discussed in Chapter 8. Recall that memory-based reasoning is based on the ability to measure the distance