25.12.2013 Views

Wild Mushrooms Classification – Edible Or Poisonous



ECE539 Project Proposal

Yulin Shen

11/14/2013

Wild Mushrooms Classification – Edible Or Poisonous

Mushrooms are an unusual kind of food because not every species is safe to eat. Some countries, for example China, regard mushrooms as highly nutritious food. However, only a small portion of them are actually eaten by people. In addition, explorers who get lost in a forest without food could eat mushrooms, but they do not know whether a given mushroom is edible. Hence, I want to use classification algorithms to develop a model that predicts whether a newly encountered mushroom is edible, based on recorded data about mushrooms (features and edibility).

To establish a model for predicting the edibility of mushrooms, a dataset with enough features is necessary. The UCI Machine Learning Repository has a dataset with 8124 instances, each with 22 attributes and an edibility label. However, all 23 of these fields are descriptive (categorical), not numerical, so I need to preprocess the data. For example, “edible” can be encoded as 1 and “poisonous” as 0, and if an attribute has 5 possible values, I would replace each value with a 5-bit string (one bit per value). Because I have only one dataset, I intend to split it into 2 files, one for training data and the other for testing data. Ideally, I would obtain a separate testing dataset. After converting the data to a numerical form with the same meaning, I need to consider which classification algorithms to use. In class, I have already learned the K-Nearest Neighbor Classifier, the Multi-Layer Perceptron, and other algorithms. In the project, I want to build some models based on the algorithms I have already learned in class. Furthermore, I want to implement another algorithm that was not covered in class.
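The preprocessing described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy records, the attribute values, the helper names (`build_vocab`, `one_hot`, `encode_label`, `train_test_split`), and the 25% test fraction are all hypothetical and not part of the proposal. The real UCI file stores single-letter codes rather than full words, so the label mapping would need to be adapted.

```python
import random

def build_vocab(rows):
    """Collect the sorted list of distinct values seen in each attribute column."""
    n_cols = len(rows[0])
    return [sorted({row[i] for row in rows}) for i in range(n_cols)]

def one_hot(row, vocab):
    """Encode one categorical row as a flat list of 0/1 bits.

    An attribute with 5 possible values contributes 5 bits, exactly one of
    which is set. A value not present in vocab yields all zeros for that
    attribute.
    """
    bits = []
    for value, choices in zip(row, vocab):
        bits.extend(1 if value == choice else 0 for choice in choices)
    return bits

def encode_label(label):
    """Map 'edible' to 1 and anything else ('poisonous') to 0."""
    return 1 if label == "edible" else 0

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy records: (label, (cap shape, cap color, odor)).
records = [
    ("edible", ("convex", "brown", "almond")),
    ("poisonous", ("flat", "white", "foul")),
    ("edible", ("bell", "yellow", "anise")),
    ("poisonous", ("convex", "white", "pungent")),
]

vocab = build_vocab([attrs for _, attrs in records])
data = [(one_hot(attrs, vocab), encode_label(label)) for label, attrs in records]
train, test = train_test_split(data, test_fraction=0.25)
```

Building the vocabulary from the whole dataset before splitting keeps the bit layout identical for the training and testing files.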

One reference surveys the top 10 algorithms for data mining. Although some of them are covered in class, I can implement one of them for comparison. There is also an efficient clustering algorithm for categorical datasets, K-Histograms. My aim in the project is to build an accurate model for predicting the edibility of mushrooms. Hence, my plan is to better understand the classification algorithms and to look for more data for testing. Classification and prediction are really useful, even in other fields.
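As a rough sketch of how the accuracy of one candidate model might be measured, the following implements a small K-Nearest Neighbor classifier on bit-string data and scores it on held-out samples. The Hamming distance, the choice of k = 3, the function names, and the toy bit vectors are assumptions made for illustration only. They are not results or design decisions from the proposal.

```python
from collections import Counter

def hamming(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training samples."""
    neighbors = sorted(train, key=lambda sample: hamming(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, bits, k) == label for bits, label in test)
    return correct / len(test)

# Hypothetical encoded samples: (bit list, label), with 1 = edible, 0 = poisonous.
train = [
    ([1, 0, 0, 1], 1),
    ([1, 0, 1, 0], 1),
    ([1, 1, 0, 0], 1),
    ([0, 1, 1, 0], 0),
    ([0, 1, 0, 1], 0),
    ([0, 0, 1, 1], 0),
]
test = [([1, 0, 0, 0], 1), ([0, 1, 1, 1], 0)]

score = accuracy(train, test, k=3)
```

The same `accuracy` function could be reused with any other classifier that exposes a predict step, which would make the comparison between algorithms direct.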

References:

1. http://arxiv.org/pdf/cs/0509033.pdf (K-Histograms)

2. http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf (Top 10 algorithms for data mining)

3. http://archive.ics.uci.edu/ml/datasets/Mushroom (Dataset)
