17.07.2013 Views

Mining Association Rules from Empirical Data in the Domain of ...

Mining Association Rules from Empirical Data in the Domain of ...

Mining Association Rules from Empirical Data in the Domain of ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>M<strong>in</strong><strong>in</strong>g</strong> <strong>Association</strong> <strong>Rules</strong> <strong>from</strong> <strong>Empirical</strong> <strong>Data</strong> <strong>in</strong> <strong>the</strong> Doma<strong>in</strong> <strong>of</strong> Education 935<br />

for association and classification, KAON [15] for cluster<strong>in</strong>g and text m<strong>in</strong><strong>in</strong>g and CIECoF [16] for<br />

association rule m<strong>in</strong><strong>in</strong>g. The application <strong>of</strong> <strong>the</strong>se systems <strong>in</strong> <strong>the</strong> field <strong>of</strong> education <strong>in</strong>cludes <strong>the</strong><br />

selection <strong>of</strong> suitable DM techniques. It is not unusual that multiple DM techniques are applied<br />

to <strong>the</strong> same data sample.<br />

As <strong>in</strong> [2,7] <strong>the</strong> ma<strong>in</strong> steps <strong>of</strong> application <strong>of</strong> DM techniques are:<br />

1. Collection <strong>of</strong> <strong>the</strong> data. The CMS system is used and <strong>the</strong> collected data are stored <strong>in</strong><br />

database. This step can be executed by a questionnaire or some o<strong>the</strong>r data collection<br />

technique <strong>in</strong>stead <strong>of</strong> CMS usage.<br />

2. Preprocess<strong>in</strong>g <strong>the</strong> data. The data is "cleaned" and transformed <strong>in</strong>to an appropriate format<br />

to be m<strong>in</strong>ed.<br />

3. Application <strong>of</strong> suitable DM technique. The DM technique is applied to build <strong>the</strong> model<br />

that discovers new rules, patterns and knowledge. To execute this step, ei<strong>the</strong>r a general or<br />

a specific data m<strong>in</strong><strong>in</strong>g tool, or a commercial or a free data m<strong>in</strong><strong>in</strong>g tool can be used.<br />

4. The <strong>in</strong>terpretation, evaluation and deployment <strong>of</strong> <strong>the</strong> results.<br />

In particular, it is necessary to apply and elaborate <strong>in</strong> detail each <strong>of</strong> <strong>the</strong>se steps depend<strong>in</strong>g<br />

on <strong>the</strong> data to be analyzed. Some <strong>of</strong> <strong>the</strong> systems for data m<strong>in</strong><strong>in</strong>g that are used to analyze data<br />

<strong>from</strong> different doma<strong>in</strong>s are:<br />

• Rosetta system, developed by researchers <strong>from</strong> <strong>the</strong> University <strong>of</strong> Warsaw and <strong>the</strong> University<br />

<strong>of</strong> Trondheim [17,18]. Rosetta is capable <strong>of</strong> syn<strong>the</strong>sis <strong>of</strong> <strong>the</strong> IF ... THEN rules by<br />

usage <strong>of</strong> <strong>the</strong> Rough sets <strong>the</strong>ory. This system is based on classification, reduction <strong>of</strong> data<br />

and decision rules syn<strong>the</strong>sis.<br />

• Weka system was developed by researchers <strong>from</strong> <strong>the</strong> University <strong>of</strong> Waikato, New Zealand<br />

[19].<br />

Rosetta system allows data to be loaded <strong>from</strong> MS Excel table; <strong>the</strong> format <strong>of</strong> <strong>the</strong> loaded data<br />

can also be CSV (Comma Separated Values). Rosetta system performs <strong>the</strong> extraction <strong>of</strong> <strong>the</strong><br />

IF...THEN rules. The data can be collected by various methods; <strong>the</strong> format <strong>of</strong> <strong>the</strong> collected data<br />

does not have to be specially adapted to DM techniques implemented <strong>in</strong> Rosetta.<br />

On <strong>the</strong> o<strong>the</strong>r hand, <strong>the</strong> Weka GUI Chooser provides a start<strong>in</strong>g po<strong>in</strong>t for launch<strong>in</strong>g Wekas ma<strong>in</strong><br />

GUI applications and support<strong>in</strong>g tools. The Weka system can be used to start <strong>the</strong> particular<br />

DM applications:<br />

• Explorer - An environment for explor<strong>in</strong>g data with Weka.<br />

• Experimenter - An environment for perform<strong>in</strong>g experiments and conduct<strong>in</strong>g statistical tests<br />

between learn<strong>in</strong>g schemes.<br />

• KnowledgeFlow - This application supports essentially <strong>the</strong> same functions as <strong>the</strong> Explorer<br />

but with a drag-and-drop <strong>in</strong>terface. One advantage is that it supports <strong>in</strong>cremental learn<strong>in</strong>g.<br />

• SimpleCLI - Provides a simple command-l<strong>in</strong>e <strong>in</strong>terface that allows direct execution <strong>of</strong> Weka<br />

commands for operat<strong>in</strong>g systems that do not provide <strong>the</strong>ir own command l<strong>in</strong>e <strong>in</strong>terface.<br />

System Weka allows (see Figure 1):

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!