18.02.2015 Views

Berry


Data Mining Methodology and Best Practices

MISUNDERSTANDING THE BUSINESS PROBLEM: A CAUTIONARY TALE

Data Miners, the consultancy started by the authors, was once called upon to analyze supermarket loyalty card data on behalf of a large consumer packaged goods manufacturer. To put this story in context, it helps to know a little bit about the supermarket business. In general, a supermarket does not care whether a customer buys Coke or Pepsi (unless one brand happens to be on a special deal that temporarily gives it a better margin), so long as the customer purchases soft drinks. Product manufacturers, who care very much which brands are sold, vie for the opportunity to manage whole categories in the stores. As category managers, they have some control over how their own products and those of their competitors are merchandised. Our client wanted to demonstrate its ability to utilize loyalty card data to improve category management. The category picked for the demonstration was yogurt because, by supermarket standards, yogurt is a fairly high-margin product.

As we understood it, the business goal was to identify yogurt lovers. To create a target variable, we divided loyalty card customers into groups of high, medium, and low yogurt affinity based on their total yogurt purchases over the course of a year and into groups of high, medium, and low users based on the proportion of their shopping dollars spent on yogurt. People who were in the high category by both measures were labeled as yogurt lovers. The transaction data had to undergo many transformations to be turned into a customer signature. Input variables included the proportion of trips and of dollars spent at various times of day and in various categories, shopping frequency, average order size, and other behavioral variables.
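The labeling rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' actual procedure: the text does not say how the high, medium, and low groups were cut, so equal-sized thirds by rank are assumed here, and the names `CustomerYear`, `tercile_labels`, and `label_yogurt_lovers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustomerYear:
    """One loyalty card customer's spending over a year (hypothetical layout)."""
    customer_id: str
    yogurt_dollars: float  # total spent on yogurt during the year
    total_dollars: float   # total spent on everything during the year


def tercile_labels(values):
    """Label each value 'low', 'medium', or 'high' by rank-based thirds.

    Assumption: equal-sized groups. Ties are broken by input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            labels[i] = "low"
        elif rank < 2 * n / 3:
            labels[i] = "medium"
        else:
            labels[i] = "high"
    return labels


def label_yogurt_lovers(customers):
    """Return {customer_id: True/False}; True means high on BOTH measures."""
    totals = [c.yogurt_dollars for c in customers]
    shares = [
        c.yogurt_dollars / c.total_dollars if c.total_dollars else 0.0
        for c in customers
    ]
    by_total = tercile_labels(totals)
    by_share = tercile_labels(shares)
    return {
        c.customer_id: (t == "high" and s == "high")
        for c, t, s in zip(customers, by_total, by_share)
    }
```

Requiring the high group on both measures presumably screens out heavy shoppers who simply buy a lot of everything, as well as light shoppers whose small baskets happen to be mostly yogurt.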

Using this data, we built a model that gave all customers a yogurt lover score. Armed with such a score, it would be possible to print coupons for yogurt when likely yogurt lovers checked out, even if they did not purchase any yogurt on that trip. The model might even identify good prospects who had not yet gotten in touch with their inner yogurt lover, but might if prompted with a coupon.

The model got good lift, and we were pleased with it. The client, however, was disappointed. “But, who is the yogurt lover?” asked the client. “Someone who gets a high score from the yogurt lover model” was not considered a good answer. The client was looking for something like “The yogurt lover is a woman between the ages of X and Y living in a zip code where the median home price is between M and N.” A description like that could be used for deciding where to buy advertising and how to shape the creative content of ads. Ours, based on shopping behavior rather than demographics, could not.
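For readers unfamiliar with the term, the "good lift" mentioned above is commonly measured by comparing the concentration of the target class among the top-scoring customers with its concentration in the population as a whole. The sketch below assumes that common definition; the function name `lift_at` and its arguments are hypothetical, and nothing here reproduces the figures from the yogurt project.

```python
def lift_at(scores, labels, fraction):
    """Lift in the top `fraction` of customers ranked by model score.

    scores: model scores, higher meaning more likely to be in the target class
    labels: 1 if the customer really is in the target class, else 0
    A lift of 1.0 means the model does no better than picking at random.
    """
    n = len(scores)
    if n == 0 or sum(labels) == 0:
        raise ValueError("need at least one customer and one positive label")
    k = max(1, int(n * fraction))  # size of the top group, at least one customer
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate
```

As the story shows, a high lift says only that the score ranks customers well; it says nothing about whether a score is the kind of answer the client needed.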

statement of the business problem should be as specific as possible. “Identify the 10,000 gold-level customers most likely to defect within the next 60 days” is better than “provide a churn score for all customers.”
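The gap between the two statements shows up in what is actually delivered: the specific version names an eligible group, a ranking, and a cutoff. The following is a minimal sketch under stated assumptions: each customer already has a score interpreted as the estimated chance of defecting within 60 days, and the tuple layout and the function name `top_n_at_risk` are hypothetical.

```python
import heapq


def top_n_at_risk(customers, n, tier="gold"):
    """Return the ids of the n highest-scoring customers in the given tier.

    customers: iterable of (customer_id, tier, churn_score) tuples, where
    churn_score is assumed to estimate the chance of defection in 60 days.
    """
    eligible = (c for c in customers if c[1] == tier)
    ranked = heapq.nlargest(n, eligible, key=lambda c: c[2])
    return [customer_id for customer_id, _, _ in ranked]
```

With the specific statement, the call would be `top_n_at_risk(customers, 10_000)`; the vaguer statement stops once the scores exist and leaves the eligible group and the cutoff undecided.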

The role of the data miner in these discussions is to ensure that the final statement of the business problem is one that can be translated into a data mining problem. Otherwise, the best data mining efforts in the world may be addressing the wrong business problem.
