Machine Learning in Python: Essential Techniques for Predictive Analysis, by Michael Bowles

CHAPTER 2

Understand the Problem by Understanding the Data

A new data set (problem) is a wrapped gift. It’s full of promise and anticipation at the miracles you can wreak once you’ve solved it. But it remains a mystery until you’ve opened it. This chapter is about opening up your new data set so you can see what’s inside, get an appreciation for what you’ll be able to do with the data, and start thinking about how you’ll approach model building with it.

This chapter has two purposes. One is to familiarize you with data sets that will be used later as examples of different types of problems to be solved using the algorithms you’ll learn in Chapter 4, “Penalized Linear Regression,” and Chapter 6, “Ensemble Methods.” The other purpose is to demonstrate some of the tools available in Python for data exploration.

The chapter uses a simple example to review the basic problem structure, nomenclature, and characteristics of a machine learning data set. The language introduced in this section will be used throughout the rest of the book. After establishing this common language, the chapter works through several different types of function approximation problems one by one. These problems illustrate common variations of machine learning problems, so that you can recognize the variants when you see them, know how to handle them, and have code examples for them.

The Anatomy of a New Problem

The algorithms covered in this book start with a matrix (or table) full of numbers and perhaps some character variables. The example in Table 2.1 establishes some nomenclature and represents a small