Database Marketing, Business Intelligence and Knowledge Discovery


Note: Using material from Tan / Steinbach / Kumar (2005), Introduction to Data Mining, Addison Wesley; and Cios / Pedrycz / Swiniarski / Kurgan (2007), Data Mining: A Knowledge Discovery Approach, Springer.

Engineering and Technology Management 


Database Marketing 

Database marketing is a form of direct marketing 

using databases of customers or potential 

customers to generate personalized 

communications in order to promote a product or 

service for marketing purposes. 

The distinction between direct and database 

marketing stems primarily from the attention paid 

to the analysis of data. Database marketing 

emphasizes the use of statistical techniques to 

develop models of customer behavior, which are 

then used to select customers for 

communications. 
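A minimal sketch of this idea, assuming a hypothetical customer list and an illustrative scoring rule (the field names and weights below are not from the slides): each customer receives a score derived from past behavior, and the highest-scoring customers are selected for a communication.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    days_since_last_purchase: int
    purchases_last_year: int
    total_spent: float


def response_score(c: Customer) -> float:
    """Illustrative score: recent, frequent, high-spending customers rank higher."""
    recency = 1.0 / (1 + c.days_since_last_purchase)
    return 100 * recency + 2 * c.purchases_last_year + 0.01 * c.total_spent


def select_for_mailing(customers, k):
    """Return the names of the k highest-scoring customers."""
    ranked = sorted(customers, key=response_score, reverse=True)
    return [c.name for c in ranked[:k]]


# Hypothetical in-house customer list.
customers = [
    Customer("Ann", 5, 12, 900.0),
    Customer("Bob", 200, 1, 50.0),
    Customer("Cy", 30, 6, 400.0),
]
selected = select_for_mailing(customers, 2)
print(selected)
```

In practice the hand-written score would be replaced by a statistical model fitted to past response data; the selection step stays the same.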



Database Marketing 

Classic database marketing
– Customer list (in-house or bought)
– Simple model based on past data
– E-mails, coupons, offers

Database marketing 2.0
– Integrated data sources (internal, external) and warehouses
– Complex models (data mining, social network analysis)
– Communication channels include social media, direct web interactions (recommender systems), and many more



Business Intelligence 

Encompasses architectures, tools, applications, 

databases and methodologies for the collection, 

integration, analysis, and presentation of business 

information. 

The purpose of business intelligence is to support 

better business decision making. 



BI Components and Architecture 



Transactional vs. Analytical Data Processing

 

 

Transactional processing takes place in operational 

systems that provide the organization with the 

capability to perform business transactions and 

produce transaction reports. This is done primarily for 

fast and efficient processing of routine, repetitive data. 

A supplementary activity to transaction processing is 

called analytical processing, which involves the 

analysis of accumulated data. Analytical processing, 

sometimes referred to as business intelligence, 

includes data mining, decision support systems (DSS), 

querying, and other analysis activities. These analyses 

place strategic information in the hands of decision 

makers to enhance productivity and make better 

decisions, leading to greater competitive advantage. 
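The contrast can be sketched with Python's built-in sqlite3 module. The table name, columns, and rows are hypothetical; the point is only that transactional processing consists of many small routine writes, while analytical processing reads and summarizes the accumulated data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")

# Transactional processing: small, routine writes, one per business event.
transactions = [
    ("East", "widget", 3),
    ("East", "gadget", 1),
    ("West", "widget", 5),
    ("West", "widget", 2),
]
with conn:  # commits the inserts as one transaction
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transactions)

# Analytical processing: a read-only query over the accumulated data.
rows = conn.execute(
    "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
).fetchall()
units_by_region = dict(rows)
print(units_by_region)
conn.close()
```

Real deployments usually separate the two workloads, for example by copying operational data into a data warehouse, so that heavy analytical queries do not slow down transaction processing.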



Business Analytics 

 

 

 

Business analytics is how organizations gather and 

interpret data in order to make better business 

decisions and to optimize business processes. In 

businesses, analytics (alongside data access and 

reporting) represents a subset of business intelligence 

(BI). 

Analytics are defined as the extensive use of data, 

statistical and quantitative analysis, explanatory and 

predictive modeling, and fact-based decision-making. 

Analytics may be used as input for human decisions, 

but there are also examples of fully automated 

decisions that require minimal human intervention. 



Business Analytics 



Knowledge Discovery 

 

The process of automatically searching large 

volumes of data for patterns that can be 

considered knowledge about the data 

Evolutionary stage: Data collection (1980s)
Business question: What was my total revenue in the last 5 years?
Enabling technologies: Computers, tapes, disks
Characteristics: Retrospective, static data delivery

Evolutionary stage: Data access (1980s)
Business question: What were unit sales in New England last March?
Enabling technologies: Relational databases (RDBMS), structured query language (SQL)
Characteristics: Retrospective, dynamic data delivery at record level

Evolutionary stage: Data warehousing and decision support (early 1990s)
Business question: What were the sales in region A by product, by salesperson?
Enabling technologies: OLAP, multidimensional databases, data warehouses
Characteristics: Retrospective, proactive data delivery at multiple levels

Evolutionary stage: Intelligent data mining (late 1990s)
Business question: What’s likely to happen to the Boston unit’s sales next month? Why?
Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases
Characteristics: Prospective, proactive information delivery

Evolutionary stage: Advanced intelligent systems; complete integration (2000-2004)
Business question: What is the best plan to follow? How did we perform compared to metrics?
Enabling technologies: Neural computing, advanced AI models, complex optimization, web services
Characteristics: Proactive, integrative; multiple business partners



Data Mining 

 

 

Non-trivial extraction of implicit, previously 

unknown and potentially useful information from 

data 

Exploration & analysis, by automatic or semiautomatic 

means, of large quantities of data 

in order to discover meaningful patterns 

Prediction Methods: Use some variables to 

predict unknown or future values of other 

variables. 

Description Methods: Find human-interpretable 

patterns that describe the data. 
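The two families of methods can be illustrated with a small sketch on made-up data (all numbers and item names are hypothetical). The first part is a prediction method: it fits a least-squares line and predicts an unseen value. The second part is a description method: it counts which items occur together, producing a pattern a person can read directly.

```python
from collections import Counter
from itertools import combinations

# Prediction: fit y = a + b*x by least squares, then predict an unseen value.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(sales) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales)) / sum(
    (x - mean_x) ** 2 for x in ad_spend
)
a = mean_y - b * mean_x
predicted_sales = a + b * 5.0

# Description: count how often each pair of items is bought together.
baskets = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk", "eggs"}]
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

print(predicted_sales)
print(pair_counts.most_common())
```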



Text Mining 

 

 

The application of data mining to unstructured or less-structured text files.

Text mining helps organizations to do the following: (1) find the "hidden" content of documents, including additional useful relationships, and (2) group documents by common themes (e.g., identify all the customers of an insurance firm who have similar complaints).
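Grouping documents by common themes can be sketched as follows. This is a deliberately simple illustration, not a production technique: the complaint texts, the stopword list, and the similarity threshold are all assumptions, and the grouping is a greedy pass based on Jaccard word overlap.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"my", "the", "was", "is", "a", "to", "for", "and", "too", "on", "i"}


def tokens(text):
    """Lowercase word set with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Share of words two documents have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0


def group_by_theme(docs, threshold=0.25):
    """Greedy grouping: a document joins the first group whose seed is similar enough."""
    groups = []  # each group is a list of document indices; the first index is the seed
    for i, doc in enumerate(docs):
        for group in groups:
            if jaccard(tokens(doc), tokens(docs[group[0]])) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


complaints = [
    "My claim payment was late",
    "Premium increase is too high",
    "Late payment on my claim",
    "The premium increase was unfair",
]
groups = group_by_theme(complaints)
print(groups)
```

Here complaints about late claim payments end up in one group and complaints about premium increases in another.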



Web Mining 

 

 

The application of data mining techniques to 

discover actionable and meaningful patterns, 

profiles, and trends from web resources. 

Web mining is used in the following areas: information filtering, mining of web access logs for analyzing usage, assisted browsing, ...
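Mining web access logs for usage can be sketched as below. The log lines are hypothetical and are assumed to follow the Common Log Format; the sketch counts successful page requests and distinct visitors.

```python
import re
from collections import Counter

# Hypothetical access-log lines, assumed to be in Common Log Format.
LOG_LINES = [
    '10.0.0.1 - - [01/Mar/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Mar/2024:10:00:05 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Mar/2024:10:00:09 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Mar/2024:10:01:00 +0000] "GET /missing HTTP/1.1" 404 0',
]

# host, two ignored fields, timestamp, request line, status code, response size
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')

page_hits = Counter()
visitors = set()
for line in LOG_LINES:
    match = LINE_RE.match(line)
    if match is None:
        continue  # skip lines that do not follow the assumed format
    host, method, path, status = match.groups()
    if status == "200":  # count only successful requests
        page_hits[path] += 1
        visitors.add(host)

print(page_hits.most_common(1))
print(len(visitors))
```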



Data Life Cycle Process 



Knowledge Discovery Process 

The knowledge discovery process (KDP) forms the 

overall process for extracting new knowledge from 

data. 

– a sequence of steps (with feedback loops) that should be followed to discover new knowledge (e.g. patterns)
– a well-defined KDP model is a logical, cohesive, well-thought-out structure and approach that can be presented to decision-makers who may have difficulty understanding the need, value, and mechanics behind a KDP
– it helps to ensure that the end product is useful for the user/owner of the data
– KD projects require a significant project management effort that needs to be grounded in a solid framework
– KD should follow other disciplines that have established models



Knowledge Discovery Process 

KDP is defined as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data:
– consists of many steps (one of which is Data Mining), each attempting to complete a particular discovery task, and accomplished by the application of a DM method
– concerns the entire KD process, including how the data is stored and accessed, how to use efficient and scalable algorithms to analyze large datasets, how to interpret and visualize the results, and how to model and support interaction between human and machine
– concerns support for learning and analyzing the application domain



Overview of the Knowledge Discovery Process

– consists of multiple steps, which are executed in a sequence
– the next step is initiated upon successful completion of the previous step, and requires the result generated by the previous step as its input
– it stretches from the task of understanding the project domain and data, through data preparation and analysis, to evaluation, understanding and application of the generated results
– it is iterative, i.e. includes feedback loops that are triggered by revisions

[Figure: Input data (database, images, video, semi-structured data, etc.) → STEP 1 → STEP 2 → ... → STEP n-1 → STEP n → Knowledge (patterns, rules, clusters, classification, associations, etc.)]
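The properties described on this slide (sequential steps, each consuming the previous step's result, with feedback loops triggered by revisions) can be sketched as a small driver. The `Revise` exception, the `run_process` function, and the two toy steps are hypothetical names introduced only for this illustration.

```python
class Revise(Exception):
    """Raised by a step to send the process back to an earlier step."""

    def __init__(self, back_to):
        super().__init__(f"revise from step {back_to}")
        self.back_to = back_to


def run_process(steps, input_data, max_revisions=5):
    """Run steps in order; step i receives the output of step i - 1."""
    inputs = [input_data]  # inputs[i] is what step i receives
    trace = []             # order in which steps were executed
    i = revisions = 0
    while i < len(steps):
        trace.append(i)
        try:
            output = steps[i](inputs[i])
        except Revise as request:
            revisions += 1
            if revisions > max_revisions:
                raise
            i = request.back_to
            del inputs[i + 1:]  # discard results produced after that step
            continue
        inputs.append(output)
        i += 1
    return inputs[-1], trace


# Toy two-step process: select data, then "mine" it (here, just an average).
state = {"min_value": 10}


def select(data):
    return [x for x in data if x >= state["min_value"]]


def mine(selected):
    if len(selected) < 3:
        state["min_value"] = 0  # revision: loosen the selection criterion
        raise Revise(back_to=0)
    return sum(selected) / len(selected)


knowledge, trace = run_process([select, mine], [2, 4, 12, 18])
print(knowledge, trace)
```

The first pass selects too few values, so the mining step triggers a revision and the process restarts from the selection step with a changed criterion.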



Knowledge Discovery Process Models 

Popular KDP models include 

Nine-step model by Fayyad and colleagues 

• academic 

CRISP-DM (CRoss-Industry Standard Process for Data 

Mining) model 

• industrial 

Six-step KDP model by Cios and colleagues 

• hybrid (academic/industrial) 




Nine-step model by Fayyad and colleagues 

– Developing an Understanding of the Application Domain 

It includes learning the relevant prior knowledge, and the goals of the 

end-user of the discovered knowledge. 

– Creating a Target Data Set 

It selects a subset of variables (attributes) and data points (examples), 

which will be used to perform discovery tasks. It usually includes 

querying the existing data to select the desired subset. 

– Data Cleaning and Preprocessing 

It consists of removing outliers, dealing with noise and missing values in 

the data, and accounting for time sequence information and known 

changes. 

– Data Reduction and Projection 

It consists of finding useful attributes by applying dimension reduction 

and transformation methods, and finding invariant representation of the 

data. 




– Choosing the Data Mining Task 

It matches the goals defined in step 1 with a particular DM method, such 

as classification, regression, clustering, etc. 

– Choosing the Data Mining Algorithm 

It selects methods for searching patterns in the data, and decides which 

models and parameters of the used methods may be appropriate. 

– Data Mining 

It generates patterns in a particular representational form, such as 

classification rules, decision trees, regression models, trends, etc. 

– Interpreting Mined Patterns 

It usually involves visualization of the extracted patterns and models, 

and visualization of the data based on the extracted models. 

– Consolidating Discovered Knowledge 

It consists of incorporating the discovered knowledge into the 

performance system, and documenting and reporting it to the interested 

parties. It also may include checking and resolving potential conflicts 

with previously believed knowledge. 




CRISP-DM (CRoss-Industry Standard Process for Data Mining) model

– designed in the late 1990s by four companies: Integral Solutions Ltd. (provider of commercial Data Mining solutions), NCR (database provider), DaimlerChrysler (automobile manufacturer), and OHRA (insurance company)
– the CRISP-DM Special Interest Group was created to support the developed process model
• it includes over 300 users and tool/service providers
– the model consists of six steps




CRISP-DM model 

– Business Understanding 

It focuses on understanding objectives and requirements from a 

business perspective. It also converts them into a DM problem definition, 

and designs a preliminary project plan to achieve the objectives. 

It is further broken into several sub-steps: 

– determination of business objectives 

– assessment of situation 

– determination of DM goals, and 

– generation of project plan. 

– Data Understanding 

It starts with an initial data collection and familiarization with the data. 

Specific aims include identification of data quality problems, discovery of 

initial insights into the data, and detection of interesting data subsets. 

It is further broken down into: 

– collection of initial data 

– description of data 

– exploration of data, and 

– verification of data quality 





– Data Preparation 

It covers all activities to construct the final dataset, which constitutes 

the data that will be fed into DM tool(s) in the next step. It includes 

table, record, and attribute selection, data cleaning, construction of new 

attributes, and data transformation. 

This step is divided into the following sub-steps:
– selection of data
– cleansing of data
– construction of data
– integration of data, and
– formatting of data.





– Modeling 

It selects and applies various modeling techniques. It usually involves 

use of several methods for the same DM problem type, and calibration 

of their parameters to optimal values. Since some methods may require 

a specific format for input data, often reiteration into the previous step is 

necessary. This step is subdivided into: 

– selection of modeling technique(s) 

– generation of test design 

– creation of models, and 

– assessment of generated models. 





– Evaluation 

After building one or more models that have high quality from a data analysis perspective, the model is evaluated from a business objective perspective. The model is thoroughly evaluated, and the steps executed to construct it are reviewed. A key objective is to determine whether there are important business issues that have not been sufficiently considered. At the end of this phase, a decision on the use of the DM results should be reached.

The key sub-steps in this step include: 

– evaluation of the results 

– process review, and 

– determination of the next step. 





– Deployment 

It involves organization and presentation of the discovered knowledge in 

a way that the customer can use. Depending on the requirements, this 

can be as simple as generating a report or as complex as implementing 

a repeatable KDP. 

This step is further divided into the following sub-steps:
– planning of the deployment
– planning of the monitoring and maintenance
– generation of final report, and
– review of the process.





The CRISP-DM model:
– is characterized by an easy-to-understand vocabulary and good documentation
– acknowledges the strong iterative nature of the process, with loops between several of the steps
– is a successful and extensively applied model, mainly because of its grounding in practical, industrial, real-world Knowledge Discovery experience
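The sub-steps of the CRISP-DM Modeling step (selection of techniques, generation of a test design, creation of models, and assessment of the generated models) can be sketched on toy data. The data set, the two candidate techniques, and the holdout split are assumptions chosen only to keep the example short.

```python
def mean_model(train):
    """Baseline technique: always predict the average training target."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg


def nearest_neighbor_model(train):
    """Predict the target of the closest training point."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


def mean_abs_error(model, test):
    return sum(abs(model(x) - y) for x, y in test) / len(test)


# Hypothetical (x, y) observations.
data = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8), (6, 12.1)]

# Test design: a simple holdout split into training and test records.
train, test = data[::2], data[1::2]

# Selection of techniques, creation of models, and assessment on the test set.
candidates = {"mean": mean_model, "nearest neighbor": nearest_neighbor_model}
errors = {name: mean_abs_error(build(train), test) for name, build in candidates.items()}
best = min(errors, key=errors.get)
print(errors, best)
```

Assessing the models on records that were not used to build them is what makes the comparison between techniques meaningful.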




Six-step model by Cios and colleagues 

– developed based on the CRISP-DM model by adapting it to academic research; main differences and extensions include:
• providing a more general, research-oriented description of the steps
• introducing the Data Mining step instead of the Modeling step
• introducing several new explicit feedback mechanisms. The CRISP-DM model has only three major feedback sources, while this model has more detailed feedback mechanisms
• modification of the last step; the knowledge discovered for a particular domain may be applied in other domains
– includes six steps




Six-step model 

[Figure: the six steps in order: Understanding of the Problem Domain, Understanding of the Data, Preparation of the Data, Data Mining, Evaluation of the Discovered Knowledge, Use of the Discovered Knowledge. Input data (database, images, video, semi-structured data, etc.) enters the process; knowledge (patterns, rules, clusters, classification, associations, etc.) is its output; a further link is labeled "Extend knowledge to other domains".]





– Understanding of the Problem Domain 

It involves working closely with domain experts to define the problem and determine the project goals, identifying key people, and learning about current solutions to the problem. It also involves learning domain-specific terminology. A description of the problem, including its restrictions, is prepared. Finally, project goals are translated into DM goals, and the initial selection of DM tools to be used later in the process is performed.

– Understanding of the Data

It includes collection of sample data and deciding which data, including its format and size, will be needed. Background knowledge can be used to guide these efforts. Data is checked for completeness, redundancy, missing values, plausibility of attribute values, etc. Finally, the step includes verification of the usefulness of the data with respect to the DM goals.




– Preparation of the Data 

It concerns deciding which data will be used as input for DM methods in the next step. It involves sampling, running correlation and significance tests, and data cleaning, which includes checking the completeness of data records, removing or correcting for noise and missing values, etc. The cleaned data may be further processed by feature selection and extraction algorithms (to reduce dimensionality), by derivation of new attributes (say, by discretization), and by summarization of data (data granularization). The end results are data that meet the specific input requirements of the DM tools selected in step 1.

– Data Mining 

It involves using various DM methods to derive knowledge from 

preprocessed data. 




– Evaluation of the Discovered Knowledge 

It includes understanding the results, checking whether the discovered knowledge is novel and interesting, interpretation of the results by domain experts, and checking the impact of the discovered knowledge. Only the approved models are retained, and the entire process is revisited to identify which alternative actions could have been taken to improve the results. A list of errors made in the process is prepared.

– Use of the Discovered Knowledge

It consists of planning where and how the discovered knowledge will be used. The application area in the current domain may be extended to other domains. A plan to monitor the implementation of the discovered knowledge is created, and the entire project is documented. Finally, the discovered knowledge is deployed.
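Two of the activities named in the Preparation of the Data step, correcting for missing values and deriving a new attribute by discretization, can be sketched as follows. The attribute (age), the cut points, and the labels are hypothetical, and median imputation is only one of several possible ways to handle missing values.

```python
from statistics import median


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def discretize(values, cut_points, labels):
    """Derive a categorical attribute: label of the first cut point not exceeded."""
    out = []
    for v in values:
        for cut, label in zip(cut_points, labels):
            if v <= cut:
                out.append(label)
                break
        else:
            out.append(labels[-1])  # above every cut point
    return out


ages = [23, None, 41, 67, 35]
cleaned = fill_missing(ages)
age_band = discretize(cleaned, cut_points=[30, 50], labels=["young", "middle", "senior"])
print(cleaned)
print(age_band)
```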





– this model identifies and describes explicit feedback loops
• from the Understanding of the Data to the Understanding of the Problem Domain step; the loop is caused by the need for additional domain knowledge to better understand the data
• from the Preparation of the Data to the Understanding of the Data step; the loop is caused by the need for additional or more specific information about the data to guide the choice of data preprocessing algorithms
• from the Data Mining to the Understanding of the Problem Domain step; the reason could be unsatisfactory results generated by the selected DM methods, requiring modification of the project’s goals
• from the Data Mining to the Understanding of the Data step; the most common reason is poor understanding of the data, which results in incorrect selection of a DM method and its subsequent failure
• from the Data Mining to the Preparation of the Data step; the loop is caused by the need to improve data preparation. This is often caused by the specific requirements of the DM method used, which may not have been known during the Preparation of the Data step
• from the Evaluation of the Discovered Knowledge to the Understanding of the Problem Domain step; the most common cause is invalidity of the discovered knowledge. Possible reasons include incorrect understanding or interpretation of the domain, and incorrect design or understanding of problem restrictions, requirements, or goals
• from the Evaluation of the Discovered Knowledge to the Data Mining step; this loop is executed when the discovered knowledge is not novel, interesting, or useful. The least expensive solution is to choose a different DM tool and repeat the DM step.
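The feedback loops listed above can be written down as a small data structure, which makes it easy to ask which earlier steps a given step may return to. The representation as index pairs and the helper name `can_return_to` are choices made for this sketch, not part of the model's definition.

```python
STEPS = [
    "Understanding of the Problem Domain",
    "Understanding of the Data",
    "Preparation of the Data",
    "Data Mining",
    "Evaluation of the Discovered Knowledge",
    "Use of the Discovered Knowledge",
]

# Feedback loops as (from step, back to step) index pairs into STEPS.
FEEDBACK = [(1, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 0), (4, 3)]


def can_return_to(step_name):
    """Earlier steps that the given step can loop back to."""
    i = STEPS.index(step_name)
    return [STEPS[dst] for src, dst in FEEDBACK if src == i]


print(can_return_to("Data Mining"))
```

In this listing the Data Mining step has the most outgoing loops, since problems with goals, data understanding, or data preparation often only become visible once a DM method is applied.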



Comparison of Knowledge Discovery Process Models

Model: Fayyad et al.
Domain of origin: academic
Number of steps: 9
Steps:
1. Developing an Understanding of the Application Domain
2. Creating a Target Data Set
3. Data Cleaning and Preprocessing
4. Data Reduction and Projection
5. Choosing the Data Mining Task
6. Choosing the Data Mining Algorithm
7. Data Mining
8. Interpreting Mined Patterns
9. Consolidating Discovered Knowledge
Notes: the most popular model; provides detailed technical description with respect to data analysis, but lacks business aspects
Supporting software: commercial system MineSet™
Reported application domains: medicine, engineering, production, e-business, software

Model: Cios et al.
Domain of origin: hybrid (academic/industry)
Number of steps: 6
Steps:
1. Understanding of the Problem Domain
2. Understanding of the Data
3. Preparation of the Data
4. Data Mining
5. Evaluation of the Discovered Knowledge
6. Use of the Discovered Knowledge
Notes: draws from both academic and industrial models; emphasizes iterative aspects; identifies and describes explicit feedback loops
Supporting software: N/A
Reported application domains: medicine, software

Model: CRISP-DM
Domain of origin: industry
Number of steps: 6
Steps:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
Notes: uses easy-to-understand vocabulary; has good documentation
Supporting software: commercial system Clementine®
Reported application domains: medicine, engineering, marketing, sales



Comparison of the Knowledge Discovery Process Models

A very important aspect of the KDP is the relative time spent to complete each of the steps
– it enables precise scheduling
– estimates proposed by both researchers and practitioners are shown below
• specific estimated values depend on many factors, such as existing knowledge about the considered project domain, skill level of human resources, complexity of the problem, etc.
• the data preparation step is by far the most time-consuming step

[Figure: chart of relative effort [%] (axis from 0 to 70) for each KDDM step: Understanding of Domain, Understanding of Data, Preparation of Data, Data Mining, Evaluation of Results, Deployment of Results. Three series: Cabena et al. estimates, Shearer estimates, Cios and Kurgan estimates.]

