11.04.2024 Views

Thinking Data Science: A Data Science Practitioner's Guide

Preface

Chapter 1 (Data Science Process) introduces you to the process that a modern data scientist follows in developing today’s highly acclaimed AI applications. It describes both the traditional and the modern approaches to model building. In today’s world, a data scientist must handle not just numeric data but also text and image datasets. High-frequency datasets are another major challenge. After this brief on model building, the chapter introduces you to the full data science process. Because a very large number of machine learning algorithms can be applied to your datasets, model development becomes time consuming and resource intensive. The chapter introduces you to AutoML, which eases both model development and hyper-parameter tuning for the selected algorithm. Finally, it introduces you to the modern approach of using deep neural networks (DNNs) and transfer learning.

Machine learning is based on data: the more data you have, the better the learning. Consider a simple example of identifying a person in a photo, in a video, or in real life. If you know that person well, or know more of their features, identification becomes a simple task. In machine learning, however, the machine does not like this. In fact, having many features is known as the curse of dimensionality. This is mainly due to two reasons: we human beings cannot visualize data beyond three dimensions, and having many dimensions demands enormous resources and training times. Chapter 2 (Dimensionality Reduction) teaches you several techniques for bringing the dimensions of your dataset down to a manageable level. The chapter gives you exhaustive coverage of the dimensionality reduction techniques that data scientists use.
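To make the idea of bringing down the dimensions concrete, here is a minimal sketch of one widely used technique, principal component analysis (PCA), written with only the Python standard library. The preface does not say which techniques Chapter 2 covers, so the choice of PCA is an assumption, as are the toy dataset and the helper names `pca_first_component` and `project`. The sketch finds the single direction along which the data varies most and replaces three correlated features with one.

```python
import random


def pca_first_component(rows, iterations=200):
    """Return (feature means, leading principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each feature so that variance is measured around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v


def project(rows, means, v):
    """Map every row onto the direction v, giving one number per row."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


random.seed(0)
# Hypothetical toy data: three features that all follow one hidden factor t.
data = []
for _ in range(200):
    t = random.gauss(0, 1)
    data.append([t + random.gauss(0, 0.05),
                 2 * t + random.gauss(0, 0.05),
                 -t + random.gauss(0, 0.05)])

means, direction = pca_first_component(data)
reduced = project(data, means, direction)  # 3 features -> 1 feature
```

Because the three toy features are almost perfectly correlated, the single projected value retains nearly all of the variation in the original rows. Real datasets rarely collapse this cleanly, and in practice a library implementation would be the more likely choice than hand-written power iteration.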

After the dataset is prepared for machine learning, the data scientist’s major task is to select an appropriate algorithm for the problem at hand. The Classical Algorithms Overview (Part I) gives you an overview of the various algorithms that you will study in the next few chapters.

The model development task could be either regression or classification.

Regression is a well-studied statistical technique and successfully implemented in
