26.12.2013 Views

AI - a Guide to Intelligent Systems.pdf - Member of EEPIS



KNOWLEDGE ENGINEERING AND DATA MINING

Traditionally, data has been analysed with user-driven techniques, where a
user formulates a hypothesis and then tests and validates it with the available
data. A query tool is, in fact, one such technique. However, as we already
know, the success of a query tool in discovering new knowledge is largely based
on the user's ability to hypothesise, or in other words, on the user's hunch.
Moreover, even experts are not capable of correlating more than three or, at
best, four variables, while in reality, a data warehouse may include dozens
of variables, and there may be hundreds of complex relationships among
these variables.

Can we use statistics to make sense of the data?

Statistics is the science of collecting, organising and utilising numerical data. It
gives us general information about data: the average and median values,
distribution of values, and observed errors. Regression analysis, one of the most
popular techniques for data analysis, is used to interpolate and extrapolate
observed data.
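
The summary statistics and the use of regression mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the observations are made up, and the helper names `fit_line` and `predict` are hypothetical, not taken from the book. The fit is an ordinary least-squares straight line, one simple form of regression.

```python
from statistics import mean, median

# Hypothetical observations (illustrative values only, not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def predict(x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


avg_y = mean(ys)        # average value of the observations
med_y = median(ys)      # median value of the observations
inside = predict(2.5)   # interpolation: x lies within the observed range
outside = predict(8.0)  # extrapolation: x lies beyond the observed range
```

Note that the fitted line only summarises a relationship the analyst has already chosen to examine (here, y against x); it does not by itself discover which variables in a large data set are related.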

Statistics is useful in analysing numerical data, but it does not solve data
mining problems, such as discovering meaningful patterns and rules in large
quantities of data.

What are data mining tools?

Data mining is based on intelligent technologies already discussed in this book.
It often applies such tools as neural networks and neuro-fuzzy systems. However,
the most popular tool used for data mining is a decision tree.

What is a decision tree?

A decision tree can be defined as a map of the reasoning process. It describes a
data set by a tree-like structure. Decision trees are particularly good at solving
classification problems.

Figure 9.44 shows a decision tree for identifying households that are
likely to respond to the promotion of a new consumer product, such as a
new banking service. Typically, this task is performed by determining the
demographic characteristics of the households that responded to the promotion
of a similar product in the past. Households are described by their ownership,
income, type of bank accounts, etc. One field in the database (named
Household) shows whether a household responded to the previous promotion
campaign.

A decision tree consists of nodes, branches and leaves. In Figure 9.44, each box
represents a node. The top node is called the root node. The tree always starts
from the root node and grows down by splitting the data at each level into new
nodes. The root node contains the entire data set (all data records), and child
nodes hold respective subsets of that set. All nodes are connected by branches.
Nodes that are at the end of branches are called terminal nodes, or leaves.

Each node contains information about the total number of data records at
that node, and the distribution of values of the dependent variable.
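
The node structure described above can be sketched as a small data structure. This is a minimal illustration, not the tree of Figure 9.44: the `Node` class, the `Homeownership` attribute, and the five records are hypothetical, and only the `Household` field name comes from the text. The splitting attribute is chosen by hand here; how a tree-building algorithm selects a split is not shown.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    records: list   # data records that reached this node
    target: str     # name of the dependent variable
    children: dict = field(default_factory=dict)  # branch label -> child Node

    @property
    def total(self):
        """Total number of data records at this node."""
        return len(self.records)

    @property
    def distribution(self):
        """Distribution of values of the dependent variable at this node."""
        return Counter(r[self.target] for r in self.records)

    @property
    def is_leaf(self):
        """A terminal node (leaf) has no outgoing branches."""
        return not self.children

    def split(self, attribute):
        """Partition this node's records by the value of one attribute."""
        groups = {}
        for r in self.records:
            groups.setdefault(r[attribute], []).append(r)
        self.children = {
            value: Node(recs, self.target) for value, recs in groups.items()
        }
        return self.children


# Hypothetical household records (illustrative values only).
records = [
    {"Homeownership": "yes", "Household": "responded"},
    {"Homeownership": "yes", "Household": "responded"},
    {"Homeownership": "yes", "Household": "not responded"},
    {"Homeownership": "no", "Household": "not responded"},
    {"Homeownership": "no", "Household": "not responded"},
]

root = Node(records, target="Household")   # root holds the entire data set
children = root.split("Homeownership")     # child nodes hold subsets of it
```

After the split, `root.total` is the size of the whole data set, while each child reports the count and the response distribution for its own subset, which is the information the text says every node carries.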
