18.02.2015

Berry


Preparing Data for Mining

Figure 17.15 The slope of the line of best fit provides a good measure of changes over time. (Plot of monthly values for months 1 to 12, y axis from 44 to 56, with fitted line y = 0.1007x + 49.625, R² = 0.0135.)

This example shows a very typical use for calculating the slope: finding the slope over the previous year’s usage or billing patterns. The tabular format shows the calculation in the way most suitable for a spreadsheet. However, many data mining tools provide a function that calculates the beta value (the slope) directly from a set of variables in a single row. When no such function is available, the slope can still be expressed using more basic arithmetic functions.
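As a sketch of the basic-arithmetic route, the Python function below computes the ordinary least-squares slope and intercept for one row of values, numbering the months 1, 2, 3, and so on, as on the x axis of Figure 17.15. The function name `line_of_best_fit` and the sample row are hypothetical; this is not the method of any particular data mining tool.

```python
def line_of_best_fit(values):
    """Return (slope, intercept) of the least-squares line through
    (1, values[0]), (2, values[1]), ...; x is simply the period number."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to fit a line")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Sum of cross-deviations over sum of squared x deviations
    # (the 1/n factors of covariance and variance cancel out).
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical row of three monthly values (not the data behind the figure).
row = [10, 12, 14]
print(line_of_best_fit(row))  # (2.0, 8.0)
```

For a row of twelve monthly columns, the same function returns the slope (beta) that summarizes the trend over the year; a flat series gives a slope of zero.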

Although monthly data is often the most convenient for such calculations, remember that different months have different numbers of days. This issue is particularly significant for businesses that have strong weekly cycles. Some months, for instance, have five full weekends, while others have only four. Months also have between 20 and 23 working days (not counting holidays). These differences can account for up to 25 percent of the difference between months, since five weekends instead of four means 25 percent more weekend days. When working with data that has such cycles, it is a good idea to calculate the “average per weekend” or “average per working day” to see how the chosen measure is changing over time.
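A minimal sketch of the per-day normalization in Python, assuming Saturday and Sunday are the weekend and ignoring holidays. It uses the standard `calendar` module to count working days and weekend days in a month; the helper names `day_counts`, `average_per_working_day`, and `average_per_weekend_day` are hypothetical.

```python
import calendar


def day_counts(year, month):
    """Return (working_days, weekend_days) for a month.

    Assumes a Saturday/Sunday weekend and ignores holidays.
    """
    _, days_in_month = calendar.monthrange(year, month)
    working = sum(
        1
        for day in range(1, days_in_month + 1)
        if calendar.weekday(year, month, day) < 5  # 0 = Monday ... 4 = Friday
    )
    return working, days_in_month - working


def average_per_working_day(weekday_total, year, month):
    """Divide a month's weekday total by its number of working days."""
    working, _ = day_counts(year, month)
    return weekday_total / working


def average_per_weekend_day(weekend_total, year, month):
    """Divide a month's weekend total by its number of weekend days."""
    _, weekend = day_counts(year, month)
    return weekend_total / weekend


for m in (1, 2, 5):
    print(calendar.month_name[m], day_counts(2015, m))
# January (22, 9)
# February (20, 8)
# May (21, 10)
```

May 2015 has five full weekends and February 2015 has four. With the same hypothetical total of 4,400 in both January and February 2015, the average is 200 per working day in January and 220 in February, a difference the raw monthly totals hide.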

TIP When working with data that has weekly cycles but must be reported by month, consider variables such as “average per weekend day” or “average per working day” so that comparisons between months are more meaningful.
