Thinking-data-science-a-data-science-practitioners-guide


Preface

Chapter 1 (Data Science Process) introduces you to the data science process that a modern data scientist follows in developing those highly acclaimed AI applications. It describes both the traditional and the modern approaches to model building followed by a present-day data scientist. In today's world, a data scientist has to deal not just with numeric data but also with text and image datasets. High-frequency datasets are another major challenge. After this brief on model building, the chapter introduces you to the full data science process. Because a very large number of machine learning algorithms can be applied to your datasets, the model development process becomes time-consuming and resource-intensive. The chapter introduces you to AutoML, which eases model development and hyper-parameter tuning for the selected algorithm. Finally, it introduces you to the modern approach of using deep neural networks (DNNs) and transfer learning.

Machine learning is based on data: the more data you have, the better the learning. Let us consider a simple example of identifying a person in a photo, in a video, or in real life. If you know more features of that person, the identification becomes a simple task. In machine learning, however, the machine does not like this. In fact, we consider having many features a curse of dimensionality. This is mainly due to two reasons: we human beings cannot visualize data beyond three dimensions, and having many dimensions demands enormous resources and training times. Chapter 2 (Dimensionality Reduction) teaches you several techniques for bringing the dimensions of your dataset down to a manageable level. The chapter gives you exhaustive coverage of the dimensionality reduction techniques used by data scientists.

After the dataset is prepared for machine learning, the data scientist's major task is to select an appropriate algorithm for the problem at hand. The Classical Algorithms Overview (Part I) gives you an overview of the various algorithms that you will study in the next few chapters.

The model development task could be of a regression or a classification type. Regression is a well-studied statistical technique that has been successfully implemented in machine learning algorithms. Chapter 3 (Regression Analysis) discusses several regression algorithms, from simple linear to Ridge, Lasso, ElasticNet, Bayesian, Logistic, and so on. You will learn their practical implementations and how to evaluate which one best fits a dataset.

Chapter 4 (Decision Trees) deals with decision trees, a fundamental building block for many machine learning algorithms. I give in-depth coverage to building and traversing the trees. The projects in this chapter demonstrate their importance for both regression and classification problems.

Chapter 5 (Ensemble: Bagging and Boosting) talks about the statistical ensemble methods used to improve the performance of decision trees. It covers both bagging and boosting techniques. I cover several algorithms in each category, giving you definite guidelines on when to use them. You will learn several algorithms in this chapter, such as Random Forest, ExtraTrees, BaggingRegressor, and BaggingClassifier. Under boosting, you will learn AdaBoost, Gradient Boosting, XGBoost, CatBoost, and LightGBM. The chapter also presents a comparative study of their performance, which will help you decide which one to use for your own datasets.

Now, we move on to classification problems. The next three chapters cover K-Nearest Neighbors, Naive Bayes, and Support Vector Machines used for classification.

Chapter 6 (K-Nearest Neighbors) describes K-Nearest Neighbors, also called KNN, the simplest algorithm and a natural starting point for classification. I describe the algorithm in depth, along with the effect of K on the classification. I discuss the techniques for obtaining the optimal K value for better classification and finally provide guidelines on when to use this simple algorithm.

Chapter 7 (Naive Bayes) describes the Naive Bayes classifier and its advantages and disadvantages. I also discuss the various types, such as Multinomial, Bernoulli, Gaussian, Complement, and Categorical Naive Bayes. Naive Bayes is useful for classifying huge datasets. A simple project toward the end of the chapter brings out its importance.

Now, we come to another important and widely researched classification algorithm: SVM (Support Vector Machines). Chapter 8 (Support Vector Machines) gives in-depth coverage of this algorithm. There are several types of hyperplanes that divide the dataset into different classes. I fully discuss the effects of the kernel and its various types, such as Linear, Polynomial, Radial Basis, and Sigmoid. I provide you with definite guidelines for selecting a kernel for your dataset. SVM also requires tuning of several parameters, such as C, Degree, Gamma, and so on. You will learn parameter tuning. Toward the end, I present a project that shows how to use SVM and conclude with SVM's advantages and disadvantages in practical situations.

A data scientist need not have deep knowledge of how these algorithms are designed. A conceptual understanding of the purpose for which they were designed suffices. So, in Chapters 3 to 8, I focus on explaining the algorithms' concepts, avoiding the mathematics on which they are based and giving more importance to how we use them in practice.
