11.04.2024

Thinking Data Science: A Data Science Practitioner's Guide


1 Data Science Process

Fig. 1.1 Processing numeric fields

Numeric Data Processing

Numeric data is central to applications such as weather forecasting, predicting sales during Christmas, deciding on the bid price for your house during an auction, and so on.

The workflow for processing numeric data is shown in Fig. 1.1.

As you know, a numeric field in the database often contains null values. If you have a sizable amount of data available for machine learning, you may decide to delete the rows containing nulls. However, if you do not have enough data points, you will usually replace the nulls with the column's mean or median value.
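Both options can be sketched with pandas. This is a minimal illustration, assuming the table has already been loaded into a pandas DataFrame; the helper names `drop_null_rows` and `impute_nulls` and the columns `temp` and `sales` are hypothetical, and the choice between the two functions (what counts as "a sizable amount of data") is left to you.

```python
import pandas as pd


def drop_null_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Delete every row that has a null in any numeric column (suits large datasets)."""
    numeric_cols = df.select_dtypes(include="number").columns
    return df.dropna(subset=numeric_cols)


def impute_nulls(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Replace nulls in each numeric column with that column's mean or median."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    numeric_cols = df.select_dtypes(include="number").columns
    # mean()/median() skip nulls by default, so the fill values come from known data only.
    if strategy == "mean":
        fill_values = df[numeric_cols].mean()
    else:
        fill_values = df[numeric_cols].median()
    # A Series passed to fillna supplies one fill value per column label.
    return df.fillna(fill_values)


# Hypothetical example data with one missing value in each column.
df = pd.DataFrame({"temp": [20.0, None, 24.0], "sales": [100.0, 120.0, None]})
print(len(drop_null_rows(df)))               # 1
print(impute_nulls(df)["temp"].tolist())     # [20.0, 22.0, 24.0]
print(impute_nulls(df)["sales"].tolist())    # [100.0, 120.0, 110.0]
```

The median is often preferred over the mean when a column contains outliers, since a few extreme values shift the mean but barely move the median.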

If a table has no unique index (such as a primary key), it may contain duplicate rows; you should remove such duplicates.
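A minimal sketch of this step, again assuming the data is in a pandas DataFrame; `remove_duplicate_rows` and the columns `area_sqft` and `price` are hypothetical names used only for illustration. It relies on pandas' `drop_duplicates`, which treats two rows as duplicates only when every column matches.

```python
import pandas as pd


def remove_duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with exact duplicate rows removed, keeping the first copy."""
    # keep="first" retains the first occurrence of each duplicated row.
    deduped = df.drop_duplicates(keep="first")
    # Renumber the rows 0..n-1 so the index has no gaps after deletion.
    return deduped.reset_index(drop=True)


# Hypothetical table in which the second row was accidentally inserted twice.
rows = pd.DataFrame({
    "area_sqft": [1200, 1500, 1500, 900],
    "price": [250_000, 310_000, 310_000, 180_000],
})
clean = remove_duplicate_rows(rows)
print(len(clean))  # 3
```

If only some columns define a record's identity, `drop_duplicates` also accepts a `subset` argument listing those columns.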

Each column in the database may have a different distribution, and the min-max range may differ across columns. Thus, all columnar data must be normalized to a common scale. Typically, the data is scaled to a range of -1 to +1 or 0 to 1 for better training of the algorithm.
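One common way to do this is min-max scaling, which maps each value x of a column to low + (x - min)(high - low)/(max - min). The text does not name a specific method, so treat this as one possible choice rather than the author's. The sketch below assumes a pandas DataFrame; `min_max_scale` is a hypothetical helper, and nulls are assumed to have been handled already.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame, low: float = 0.0, high: float = 1.0) -> pd.DataFrame:
    """Linearly rescale every numeric column of df into the range [low, high]."""
    scaled = df.copy()
    for col in df.select_dtypes(include="number").columns:
        col_min, col_max = df[col].min(), df[col].max()
        span = col_max - col_min
        if span == 0:
            # Constant column: there is no spread to preserve, so map every value to `low`.
            scaled[col] = low
        else:
            scaled[col] = low + (df[col] - col_min) * (high - low) / span
    return scaled


# Hypothetical columns with very different ranges.
houses = pd.DataFrame({"area_sqft": [800.0, 1600.0, 2400.0], "price": [100, 300, 500]})
print(min_max_scale(houses)["area_sqft"].tolist())         # [0.0, 0.5, 1.0]
print(min_max_scale(houses, -1.0, 1.0)["price"].tolist())  # [-1.0, 0.0, 1.0]
```

In practice, compute the minimum and maximum on the training data only and reuse them for test data; scikit-learn's `MinMaxScaler` (with its `feature_range` parameter) packages this fit-then-transform pattern.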

Next, I will discuss the processing required for text-based data.

Text Processing

The preprocessing of text data depends on the type of application you are trying to develop. For example, a character column in a database may contain values such as Male and Female to indicate gender. If this field is important for your model, you must replace Male with 0 and Female with 1, say. That is, we must convert such categorical data into numeric values, as computers understand only binary
