18.02.2015 Views

Berry


Chapter 16

Comprehensible Output

Tools vary greatly in the extent to which they explain themselves. Rule generators, tree visualizers, Web diagrams, and association tables can all help.

Some vendors place great emphasis on the visual representation of both data and rules, providing three-dimensional data terrain maps, geographic information systems (GIS), and cluster diagrams to help make sense of complex relationships. The final destination of much data mining work is reports for management, and the power of graphics should not be underestimated for convincing non-technical users of data mining results. A data mining tool should make it easy to export results to commonly available reporting and analysis packages such as Excel and PowerPoint.
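As a rough illustration of what exporting results can look like in practice, the sketch below writes a handful of rules to CSV, a format that Excel and most reporting packages can open directly. The rule texts, the column names, and the `rules_to_csv` helper are all invented for this example and are not taken from any particular tool.

```python
import csv
import io

# Hypothetical column layout for exported rules.
FIELDS = ["rule", "support", "confidence"]


def rules_to_csv(rules, out):
    """Write rule dictionaries to an open text stream as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rules)


# Made-up rules, used only to show the shape of the output.
rules = [
    {"rule": "IF balance < 100 THEN closes account", "support": 0.12, "confidence": 0.71},
    {"rule": "IF mileage > 60000 THEN buys warranty", "support": 0.08, "confidence": 0.64},
]

buffer = io.StringIO()
rules_to_csv(rules, buffer)
print(buffer.getvalue())
```

To produce a file rather than an in-memory string, pass a stream opened with `open("rules.csv", "w", newline="")` in place of `buffer`.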

Ability to Handle Diverse Data Types

Many data mining software packages place restrictions on the kinds of data that can be analyzed. Before investing in a data mining software package, find out how it deals with the various data types you want to work with.

Some tools have difficulty using categorical variables (such as model, type, gender) as input variables and require the user to convert these into a series of yes/no variables, one for each possible class. Others can deal with categorical variables that take on a small number of values, but break down when faced with too many. On the target field side, some tools can handle a binary classification task (good/bad), but have difficulty predicting the value of a categorical variable that can take on several values.
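The conversion of a categorical variable into a series of yes/no variables can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `one_hot` helper, the `model_is_...` column naming, and the sample records are assumptions made for the example.

```python
def one_hot(records, field):
    """Replace a categorical field with one yes/no (1/0) variable per class."""
    # Sort the distinct classes so the new columns come out in a stable order.
    classes = sorted({rec[field] for rec in records})
    encoded = []
    for rec in records:
        # Copy every field except the categorical one being converted.
        row = {key: value for key, value in rec.items() if key != field}
        for cls in classes:
            row[f"{field}_is_{cls}"] = 1 if rec[field] == cls else 0
        encoded.append(row)
    return encoded


# Hypothetical records; "model" is the categorical input variable.
cars = [
    {"model": "sedan", "mileage": 42000},
    {"model": "coupe", "mileage": 18000},
    {"model": "wagon", "mileage": 67000},
]

for row in one_hot(cars, "model"):
    print(row)
```

Each distinct class becomes its own column, so a field with three values adds three variables while a field with thousands of values adds thousands.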

Some data mining packages on the market require that continuous variables (income, mileage, balance) be split into ranges by the user. This is especially likely to be true of tools that generate association rules, since these require a certain number of occurrences of the same combination of values in order to recognize a rule.
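Splitting a continuous variable into ranges might look like the following sketch. The cut points, the range labels, and the `to_range` helper are hypothetical; in practice the boundaries would be chosen to suit the data and the business question.

```python
from bisect import bisect_right


def to_range(value, edges, labels):
    """Return the label of the range that value falls into.

    edges holds ascending cut points; labels needs len(edges) + 1 entries.
    A value equal to a cut point goes into the higher range.
    """
    return labels[bisect_right(edges, value)]


# Hypothetical cut points and labels for an income field.
INCOME_EDGES = [25000, 50000, 100000]
INCOME_LABELS = ["under 25k", "25k to 50k", "50k to 100k", "100k and over"]

incomes = [18500, 25000, 73250, 240000]
for income in incomes:
    print(income, "->", to_range(income, INCOME_EDGES, INCOME_LABELS))
```

After binning, many different incomes share one range label, so the same combination of values recurs often enough to be counted.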

Most data mining tools cannot deal with text, although such support is starting to appear. If the text strings in the data are standardized codes (state, part number), this is not really a problem, since character codes can easily be converted to numeric or categorical ones. If the application requires the ability to analyze free text, some of the more advanced data mining tool sets are starting to provide support for this capability.
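For standardized codes, the conversion to numeric values can be as simple as assigning each distinct code an integer. The sketch below assumes that codes differing only in case or surrounding whitespace are the same code; the `encode_codes` helper and the sample values are invented for illustration.

```python
def encode_codes(values):
    """Map each distinct string code to a small integer, in order of first appearance."""
    mapping = {}
    ids = []
    for value in values:
        # Assumption: case and surrounding whitespace are not significant.
        code = value.strip().upper()
        if code not in mapping:
            mapping[code] = len(mapping)
        ids.append(mapping[code])
    return ids, mapping


# Hypothetical state codes with inconsistent formatting.
states = ["MA", "ny", "MA ", "CA", "NY"]
ids, mapping = encode_codes(states)
print(ids)
print(mapping)
```

The integers are only identifiers and carry no order, so a tool should treat the result as categorical rather than as a quantity.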

Documentation and Ease of Use

A well-designed user interface should make it possible to start mining right away, even if mastery of the tool requires time and study. As with any complex software, good documentation can spell the difference between success and frustration. Before deciding on a tool, ask to look over the manual. It is very
