26.12.2013

AI - a Guide to Intelligent Systems.pdf - Member of EEPIS




KNOWLEDGE ENGINEERING AND DATA MINING

the knowledge engineer (a person capable of designing, building and testing an intelligent system) and the domain expert (a knowledgeable person capable of solving problems in a specific area or domain).

Then we specify the project’s objectives, such as gaining a competitive edge, improving the quality of decisions, reducing labour costs, and improving the quality of products and services.

Finally, we determine what resources are needed for building the system. They normally include computer facilities, development software, knowledge and data sources (human experts, textbooks, manuals, web sites, databases and examples) and, of course, money.

9.1.2 Data and knowledge acquisition

During this phase we gain a deeper understanding of the problem domain by collecting and analysing both data and knowledge, and by making the key concepts of the system’s design more explicit.

Data for intelligent systems are often collected from different sources, and thus can be of different types. However, a particular tool for building an intelligent system requires a particular type of data. Some tools deal with continuous variables, while others need all variables to be divided into several ranges, or to be normalised to a single range, say from 0 to 1. Some handle symbolic (textual) data, while others use only numerical data. Some tolerate imprecise and noisy data, while others require only well-defined, clean data. As a result, the data must be transformed, or massaged, into a form suitable for the particular tool. Whichever tool we choose, however, there are three important issues that must be resolved before massaging the data (Berry and Linoff, 1997).
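The kinds of massaging mentioned above can be illustrated with a short Python sketch. It is not taken from the book: the function names are hypothetical, min-max scaling is assumed as the way of normalising to the range 0 to 1, equal-width binning is assumed as the way of dividing a variable into ranges, and one-hot encoding is assumed as the way of turning symbolic data into numbers. Other schemes exist for each step.

```python
def min_max_normalise(values):
    """Rescale a list of numbers linearly into the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each number to one of n_bins equally wide ranges, numbered 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        if width == 0:
            bins.append(0)
        else:
            # Clamp so that the maximum value lands in the last bin, not in bin n_bins.
            bins.append(min(int((v - lo) / width), n_bins - 1))
    return bins


def one_hot(labels):
    """Turn symbolic labels into 0/1 indicator vectors, one column per category."""
    categories = sorted(set(labels))
    vectors = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, vectors


ages = [18, 25, 40, 62]
print(min_max_normalise(ages))
print(equal_width_bins(ages, 3))          # [0, 0, 1, 2]
print(one_hot(["red", "green", "red"]))   # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
```

Equal-width bins are the simplest choice; for skewed data, equal-frequency bins are often preferred, but this sketch does not cover them.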

The first issue is incompatible data. Often the data we want to analyse store text in EBCDIC coding and numbers in packed decimal format, while the tools we want to use for building intelligent systems store text in ASCII code and numbers as integers or as single- or double-precision floating-point values. This issue is normally resolved with data transport tools that automatically produce the code for the required data transformation.
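Data transport tools hide the details, but a small Python sketch shows what such a conversion involves. It assumes that the text uses code page 037 (US/Canada EBCDIC, available in Python as the standard `cp037` codec; other EBCDIC variants exist) and that the numbers use the common IBM packed-decimal layout, in which each byte holds two decimal digits and the last half-byte holds the sign. The function names and the `scale` parameter (the number of implied decimal places) are hypothetical.

```python
from decimal import Decimal


def ebcdic_to_str(raw: bytes) -> str:
    """Decode EBCDIC text into a Python string (assumes code page 037)."""
    return raw.decode("cp037")


def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal field.

    Each byte holds two 4-bit digits; the final nibble is the sign:
    0xD means negative, 0xC or 0xF means positive or unsigned.
    `scale` is the number of implied decimal places.
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)      # high nibble
        nibbles.append(byte & 0x0F)    # low nibble
    sign_nibble = nibbles.pop()        # the last nibble is the sign, not a digit

    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError("invalid digit nibble in packed-decimal field")
        value = value * 10 + digit

    if sign_nibble == 0x0D:
        value = -value
    elif sign_nibble not in (0x0C, 0x0F):
        raise ValueError("invalid sign nibble in packed-decimal field")

    # Shift the decimal point left by `scale` places, keeping the result exact.
    return Decimal(value).scaleb(-scale)


print(ebcdic_to_str(b"\xC8\xC5\xD3\xD3\xD6"))            # HELLO
print(unpack_packed_decimal(b"\x12\x34\x5C", scale=2))    # 123.45
print(unpack_packed_decimal(b"\x12\x34\x5D", scale=2))    # -123.45
```

Returning a `Decimal` keeps the value exact; it can then be converted with `float()` or `int()` into whichever numeric type the target tool expects.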

The second issue is inconsistent data. Often the same facts are represented differently in different databases. If these differences are not spotted and resolved in time, we might find ourselves, for example, analysing consumption patterns of carbonated drinks using data that do not include Coca-Cola, just because the Coca-Cola records were stored in a separate database.
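One simple way to catch inconsistent representations is to map every spelling to a single canonical value before the sources are combined, and to stop when an unknown spelling turns up. The Python sketch below assumes hypothetical sales records of (brand, litres) pairs held in two separate databases; the brand table and the function names are illustrative only.

```python
# Hypothetical table mapping the spellings found in different databases
# to one canonical brand name.
CANONICAL_BRAND = {
    "coca-cola": "Coca-Cola",
    "coca cola": "Coca-Cola",
    "coke": "Coca-Cola",
    "pepsi": "Pepsi",
    "pepsi-cola": "Pepsi",
}


def canonical_brand(name: str) -> str:
    """Normalise case and whitespace, then look up the canonical brand name."""
    key = name.strip().lower()
    if key not in CANONICAL_BRAND:
        # Fail loudly so the inconsistency is spotted instead of silently ignored.
        raise KeyError(f"unmapped brand spelling: {name!r}")
    return CANONICAL_BRAND[key]


def merge_sales(*sources):
    """Combine (brand, litres) records from several sources, summing per canonical brand."""
    totals = {}
    for source in sources:
        for brand, litres in source:
            b = canonical_brand(brand)
            totals[b] = totals.get(b, 0) + litres
    return totals


main_db = [("Pepsi", 120), ("PEPSI-COLA", 30)]
separate_db = [("Coca Cola", 200), ("coke", 50)]
print(merge_sales(main_db, separate_db))   # {'Pepsi': 150, 'Coca-Cola': 250}
```

Raising an error on an unmapped spelling is a deliberate choice: it forces the difference to be resolved, rather than letting records drop silently out of the analysis.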

The third issue is missing data. Real-world data records often contain blank fields. Sometimes we might throw such incomplete records away, but normally we would attempt to infer some useful information from them. In many cases, we can simply fill in the blank fields with the most common or average values. In other cases, the fact that a particular field has not been filled in might itself provide us with very useful information. For example, in a job application form, a blank field for a business phone number might suggest that an applicant is currently unemployed.
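Both approaches to missing data can be combined: fill each blank with the average value (numeric fields) or the most common value (other fields), and also record in a separate flag that the field was blank. The Python sketch below assumes records are dictionaries with `None` marking a blank field; the function name, the field names and the applicant data are hypothetical.

```python
from statistics import mean, mode


def fill_missing(records, field, strategy="mean"):
    """Return copies of `records` with blank (None) values of `field` filled in.

    strategy="mean" uses the average of the known values (numeric fields);
    strategy="mode" uses the most common known value (any field).
    A boolean '<field>_missing' flag is added to every record so that the
    fact that the field was blank is not lost.
    """
    known = [r[field] for r in records if r[field] is not None]
    if not known:
        raise ValueError(f"no known values for field {field!r}")
    if strategy == "mean":
        fill = mean(known)
    elif strategy == "mode":
        fill = mode(known)
    else:
        raise ValueError(f"unknown strategy {strategy!r}")

    filled = []
    for r in records:
        row = dict(r)                          # copy, leave the input untouched
        row[field + "_missing"] = r[field] is None
        if r[field] is None:
            row[field] = fill
        filled.append(row)
    return filled


applicants = [
    {"age": 30, "education": "degree", "business_phone": "555-0101"},
    {"age": None, "education": None, "business_phone": None},
    {"age": 40, "education": "degree", "business_phone": "555-0102"},
    {"age": 26, "education": "school", "business_phone": "555-0103"},
]
rows = fill_missing(applicants, "age", strategy="mean")
rows = fill_missing(rows, "education", strategy="mode")
# The phone number cannot sensibly be guessed, so only record that it was blank.
for row in rows:
    row["business_phone_missing"] = row["business_phone"] is None
print(rows[1])
```

Keeping the `_missing` flags alongside the filled-in values lets a later learning step discover, for instance, whether a blank business phone number is associated with unemployment.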
