27.06.2013 Views

Volume Two - Academic Conferences

Cristina Wanzeller and Orlando Belo

used to accomplish transformation and analysis activities, as well as humans to provide complementary
descriptions. Data collection begins when a new problem arises during a problem-solving use of
the system. Nevertheless, a new case may also be integrated into the system independently of problem
solving. The analyst describes some dataset characteristics (e.g. the semantic category of each variable), but
most metadata is extracted automatically (e.g. each variable's data type and its number of distinct and null
values), as exemplified in Table 2. Data collection proceeds, with greater intensity, during the Conciliate
and Retain steps. Again, interaction with the analysts is required to gather information. However,
concerning mining activities, PMML documents are a very important source of knowledge for the
semi-automated learning approach of the MPS system.
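
The automatic part of the metadata extraction described above can be illustrated with a minimal sketch. It is not the MPS implementation: the function names, the null tokens and the output keys are assumptions made for illustration, and the actual fields of Table 2 may differ. The sketch computes, for each variable, a coarse data type and the number of distinct and null values.

```python
def infer_type(values):
    """Guess a coarse data type from a list of non-null raw strings."""
    def all_match(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_match(int):
        return "integer"
    if all_match(float):
        return "real"
    return "string"


def extract_variable_metadata(rows, null_tokens=("", "NA", "?")):
    """Summarise each variable of a dataset.

    rows: list of dicts mapping variable name -> raw string value.
    null_tokens: strings treated as missing (an assumption of this sketch).
    """
    metadata = {}
    if not rows:
        return metadata
    for name in rows[0]:
        column = [row.get(name) for row in rows]
        non_null = [v for v in column if v is not None and v not in null_tokens]
        metadata[name] = {
            "data_type": infer_type(non_null),
            "distinct_values": len(set(non_null)),
            "null_values": len(column) - len(non_null),
        }
    return metadata


if __name__ == "__main__":
    # Hypothetical rows, not taken from the msnbc.com dataset.
    sample_rows = [
        {"page": "frontpage", "visits": "3"},
        {"page": "news", "visits": ""},
        {"page": "frontpage", "visits": "7"},
    ]
    for variable, summary in extract_variable_metadata(sample_rows).items():
        print(variable, summary)
```

Characteristics such as the semantic category of a variable cannot be derived this way, which is why the text leaves them to the analyst.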

Figure 3 shows a PMML document excerpt. We focus on gathering mining activities from PMML
documents and present the main item types that may be extracted (Table 3). The dataset used
describes page visits to msnbc.com on September 28, 1999, and is available from
the UCI KDD Archive. The MPS repository holds a few cases based on this well-known dataset.
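
As a rough illustration of how items might be gathered from a PMML document, the sketch below reads the data dictionary, the model type, the mining function, the model parameters and the mining schema using Python's standard `xml.etree.ElementTree`. Several points are assumptions: the choice of items is not the list in Table 3, the function name `extract_pmml_items` is hypothetical, and the embedded document is a simplified, hand-written sample that is neither schema-valid nor the excerpt of Figure 3. Top-level elements carrying a `functionName` attribute are treated as models, which is a heuristic.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample for illustration only.
SAMPLE_PMML = """<PMML version="3.0" xmlns="http://www.dmg.org/PMML-3_0">
  <Header description="illustrative sample"/>
  <DataDictionary numberOfFields="2">
    <DataField name="session" optype="categorical" dataType="string"/>
    <DataField name="page" optype="categorical" dataType="string"/>
  </DataDictionary>
  <AssociationModel functionName="associationRules"
                    minimumSupport="0.01" minimumConfidence="0.5">
    <MiningSchema>
      <MiningField name="session" usageType="group"/>
      <MiningField name="page" usageType="active"/>
    </MiningSchema>
  </AssociationModel>
</PMML>"""


def local_name(tag):
    """Drop the '{namespace}' prefix, which varies across PMML versions."""
    return tag.rsplit("}", 1)[-1]


def extract_pmml_items(pmml_text):
    """Collect data fields and model descriptions from a PMML string."""
    root = ET.fromstring(pmml_text)
    items = {"pmml_version": root.get("version"), "data_fields": [], "models": []}
    for element in root:
        name = local_name(element.tag)
        if name == "DataDictionary":
            for field in element:
                if local_name(field.tag) == "DataField":
                    items["data_fields"].append({
                        "name": field.get("name"),
                        "optype": field.get("optype"),
                        "data_type": field.get("dataType"),
                    })
        elif element.get("functionName") is not None:
            # Heuristic: a top-level element with functionName is a model.
            mining_fields = []
            for child in element:
                if local_name(child.tag) == "MiningSchema":
                    for mining_field in child:
                        if local_name(mining_field.tag) == "MiningField":
                            mining_fields.append(
                                (mining_field.get("name"),
                                 mining_field.get("usageType")))
            items["models"].append({
                "model_type": name,
                "function": element.get("functionName"),
                "parameters": {key: value
                               for key, value in element.attrib.items()
                               if key != "functionName"},
                "mining_fields": mining_fields,
            })
    return items


if __name__ == "__main__":
    print(extract_pmml_items(SAMPLE_PMML))
```

Because the parameters are read generically from the model element's attributes, the same function works for other model types without changes, at the cost of not interpreting model-specific content such as rules or clusters.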

