14.04.2014

Hello Processing - Vula



An Introduction To Programming With Processing

small data set with a high ratio of inconsistent data types, manual data mining could be more effective and save you some time.

For example, let's take a hypothetical situation where the owner of a small café has asked you to determine which items a customer is likely to purchase together. The data set you have acquired consists of all the items customers have purchased in the store over the past year. From that data you could determine that goods can be divided into food, drinks, magazines, stationery, etc., and then into even smaller groups such as fruit, vegetables and soft drinks. Identifying these groups would be the first step of data mining, a process also known as clustering. If the café offers a large variety of items, the clusters making up the data could be numerous, yet the data set as a whole is actually quite small, consisting of only a single year of purchases.
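To make the grouping step concrete, here is a minimal sketch in plain Java (Processing sketches are written in a Java-based language). It assumes a hand-made lookup table from item to category. The item names, the categories and the `GroupPurchases` class are hypothetical and do not come from the text. Note that it assigns items to hand-picked categories rather than running an automatic clustering algorithm. It needs Java 9 or later for `Map.of` and `List.of`.

```java
import java.util.*;

public class GroupPurchases {
    // Hypothetical lookup table: each item is assigned to a hand-picked category.
    static final Map<String, String> CATEGORY = Map.of(
        "apple", "fruit",
        "banana", "fruit",
        "carrot", "vegetables",
        "cola", "soft-drinks",
        "comic", "magazines",
        "pencil", "stationery"
    );

    // Groups purchased items by category; items missing from the table go under "uncategorised".
    static Map<String, List<String>> group(List<String> purchases) {
        Map<String, List<String>> groups = new TreeMap<>(); // sorted by category name
        for (String item : purchases) {
            String category = CATEGORY.getOrDefault(item, "uncategorised");
            groups.computeIfAbsent(category, k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> purchases = List.of("apple", "cola", "pencil", "banana", "muffin");
        // prints {fruit=[apple, banana], soft-drinks=[cola], stationery=[pencil], uncategorised=[muffin]}
        System.out.println(group(purchases));
    }
}
```

With a large product range, the table and the number of resulting groups grow quickly. This mirrors the point above that the clusters can be numerous even when the data set itself is small.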

In contrast, suppose a more established café that has been operating for several years poses the same question to you. In the smaller café's data, any given grouping of purchased items has a lower probability of repeating, because the data covers a shorter period of time. The more established café, by contrast, has a better chance of the same groups of items being purchased together over its longer period of trading.

In the former case, manually mining this data set could be more effective because of the lower probability of the same items being purchased together in a relatively short space of time. In this scenario the majority of your time would be spent on clustering and on populating the resulting groups, after eliminating most of the items purchased in that year because they will not fall into any cluster.

However, in the case of the established café, although there may be just as many clusters, the values that populate these clusters have a higher probability of being repeated. It might therefore be more efficient to have a computer program count, cluster and mine the data.
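The counting part of that work can be sketched as follows: for each transaction, count every unordered pair of distinct items bought together, then keep only the pairs that repeat. This is plain pair counting, a simplified form of the frequent-itemset idea used in association-rule mining. It is not necessarily the approach the book goes on to use. The `PairCounter` class, the transactions and the threshold of 2 are hypothetical. The code also assumes item names never contain the `+` character used to build pair keys.

```java
import java.util.*;

public class PairCounter {
    // Counts how many transactions contain each unordered pair of distinct items.
    static Map<String, Integer> countPairs(List<List<String>> transactions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> basket : transactions) {
            // De-duplicate and sort so "tea+scone" and "scone+tea" map to the same key.
            List<String> items = new ArrayList<>(new TreeSet<>(basket));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    String key = items.get(i) + "+" + items.get(j);
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keeps only pairs seen in at least minCount transactions; one-off combinations are dropped.
    static Map<String, Integer> frequentPairs(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical till receipts, one list per customer transaction.
        List<List<String>> transactions = List.of(
            List.of("coffee", "muffin"),
            List.of("tea", "scone"),
            List.of("muffin", "coffee", "newspaper"),
            List.of("coffee", "muffin"),
            List.of("scone", "tea")
        );
        Map<String, Integer> counts = countPairs(transactions);
        // prints {coffee+muffin=3, scone+tea=2}
        System.out.println(frequentPairs(counts, 2));
    }
}
```

Raising `minCount` discards more one-off combinations. With only a short sales history, few pairs would be expected to reach the threshold. That is why the text suggests such a program pays off mainly for the established café.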

Regardless of whether you choose to mine your data manually or have a software program do the work for you, you should end the process with a set of data that can be manipulated programmatically.

A Scalable Software Development Model 30
