18.02.2015 Views

Berry


The Lure of Statistics: Data Mining Using Familiar Tools

Looking at Discrete Values

Much of the data used in data mining is discrete by nature, rather than continuous. Discrete data shows up in the form of products, channels, regions, and descriptive information about businesses. This section discusses ways of looking at and analyzing discrete fields.

Histograms

The most basic descriptive statistic about discrete fields is the number of times different values occur. Figure 5.1 shows a histogram of stop reason codes during a period of time. A histogram shows how often each value occurs in the data, expressed either as absolute counts (204 times) or as percentages (14.6 percent). Often there are too many values to show in a single histogram, as in this case, where over 30 additional codes are grouped into the “other” category.
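The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the function name `histogram`, its parameters `top_n` and `other_label`, and the sample data are all assumptions made for the example.

```python
from collections import Counter

def histogram(values, top_n=10, other_label="OTHER"):
    """Return (label, count, percent) rows for the top_n most common values,
    folding every remaining value into a single other_label row."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()          # sorted by descending count
    rows = ranked[:top_n]
    remainder = sum(count for _, count in ranked[top_n:])
    if remainder:
        rows.append((other_label, remainder))
    return [(label, count, 100.0 * count / total) for label, count in rows]

# Hypothetical stop reason codes, for illustration only.
sample = ["TI"] * 5 + ["NO"] * 3 + ["OT"] * 2 + ["VN", "PE"]
table = histogram(sample, top_n=3)
for label, count, pct in table:
    print(f"{label:<6}{count:>4}{pct:>7.1f}%")
```

Each row carries both the absolute count and the percentage, so either can be plotted. Folding the tail into one row keeps the chart readable when there are dozens of rare codes.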

In addition to the values for each category, this histogram also shows the cumulative proportion of stops, whose scale is shown on the left-hand side. Through the cumulative histogram, it is possible to see that the top three codes account for about 50 percent of stops, and the top 10, almost 90 percent. As an aesthetic note, the grid lines intersect both the left- and right-hand scales at sensible points, making it easier to read values off the chart.
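The cumulative proportion is a running total of the counts divided by the grand total. The sketch below uses the eleven counts printed on the bars of Figure 5.1 (10,048; 5,944; 3,851; 3,549; 3,311; 3,054; 1,491; 1,306; 1,226; 1,108; and 4,884). The pairing of each count with a code, and in particular the assignment of 4,884 to OTHER, is an assumption inferred from the layout of the figure labels, not something the text states.

```python
from itertools import accumulate

# Counts taken from the bar labels of Figure 5.1. Which count belongs to which
# code is an ASSUMPTION (descending order for the ten codes, 4,884 for OTHER).
stops = [("TI", 10048), ("NO", 5944), ("OT", 3851), ("VN", 3549), ("PE", 3311),
         ("CM", 3054), ("CP", 1491), ("NR", 1306), ("MV", 1226), ("EX", 1108),
         ("OTHER", 4884)]

total = sum(count for _, count in stops)
running = accumulate(count for _, count in stops)   # running totals, in order
cumulative = [(code, 100.0 * r / total) for (code, _), r in zip(stops, running)]

for code, pct in cumulative:
    print(f"{code:<6}{pct:>6.1f}%")
```

Under that assumed pairing, the first three codes reach roughly 49.9 percent and the first ten roughly 87.7 percent, which agrees with the "about 50 percent" and "almost 90 percent" read off the chart.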

[Figure 5.1 chart: vertical bars for Number of Stops (axis 0 to 12,500) by Stop Reason Code (TI, NO, OT, VN, PE, CM, CP, NR, MV, EX, OTHER), with a line for Cumulative Proportion (axis 0% to 100%). Bar labels as extracted: 10,048; 5,944; 3,851; 3,549; 3,311; 3,054; 4,884; 1,491; 1,306; 1,226; 1,108.]

Figure 5.1 This example shows both a histogram (as a vertical bar chart) and cumulative proportion (as a line) on the same chart for stop reasons associated with a particular marketing effort.
