Wild Mushrooms Classification – Edible Or Poisonous
ECE539 Project Proposal
Yulin Shen
11/14/2013
Mushrooms are an unusual kind of food because their edibility varies: some are nutritious, while others are poisonous. Some countries, such as China, regard mushrooms as a highly nutritious food. However, only a small portion of mushroom species is actually eaten by people. In addition, explorers who get lost in a forest without food could eat wild mushrooms, but they do not know which ones are edible. Hence, I want to use classification algorithms to develop the best possible model for predicting whether a newly found mushroom is edible, based on observed mushroom data (features and edibility).
In order to build a model that predicts the edibility of mushrooms, a dataset with enough features is necessary. The UCI Machine Learning Repository [3] provides a dataset with 8124 instances, each described by 22 attributes plus its edibility. However, all 23 of these columns are descriptive (categorical), not numerical, so the data must be preprocessed. For example, "edible" can be encoded as 1 and "poisonous" as 0, and an attribute with 5 possible values can be replaced by a 5-bit string with one bit per value. Because I have only one dataset, I intend to split it into two files: one for training and one for testing. Ideally, I would also obtain a separate testing dataset. Once the data has been converted into numbers with the same meaning, I need to choose the classification algorithms. In class, I have already learned the K-Nearest Neighbor classifier, the Multi-Layer Perceptron, and other algorithms, and in this project I want to build models based on them. Furthermore, I want to implement an algorithm that was not covered in class.
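The preprocessing, train/test split, and K-Nearest Neighbor steps described above can be sketched as follows. This is a minimal, hypothetical illustration, not the project's actual implementation: the helper names (`parse_line`, `encode_label`, `fit_categories`, `one_hot`, `split_train_test`, `knn_predict`), the default 30% test fraction, the use of Hamming distance, and the toy records are all assumptions. It also assumes each record is a comma-separated line with the class letter (`e` or `p`) first, followed by one-letter attribute codes; the toy lines are made up and are not rows from the real dataset.

```python
import random
from collections import Counter

# Hypothetical helpers for illustration; names and defaults are assumptions.

def parse_line(line):
    """Split one comma-separated record into (class letter, attribute codes).
    Assumes the class letter ('e' or 'p') is the first field."""
    fields = line.strip().split(",")
    return fields[0], fields[1:]

def encode_label(label):
    # "edible" -> 1, "poisonous" -> 0, as proposed in the text.
    return 1 if label == "e" else 0

def fit_categories(rows):
    # For every attribute column, record the sorted list of values it takes.
    return [sorted({row[j] for row in rows}) for j in range(len(rows[0]))]

def one_hot(row, categories):
    # An attribute with k possible values becomes k bits with exactly one 1.
    bits = []
    for value, values in zip(row, categories):
        bits.extend(1 if value == v else 0 for v in values)
    return bits

def split_train_test(samples, test_fraction=0.3, seed=0):
    # Shuffle a copy, then hold out the first test_fraction of it for testing.
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, x, k=3):
    # Hamming distance between bit vectors; majority vote among the k nearest.
    dists = sorted((sum(a != b for a, b in zip(x, xt)), y) for xt, y in train)
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy usage with made-up records (NOT real rows from the UCI file).
lines = ["e,x,a,b", "e,f,l,b", "e,x,n,b", "p,x,f,n", "p,f,f,n", "p,k,p,n"]
records = [parse_line(line) for line in lines]
categories = fit_categories([attrs for _, attrs in records])
samples = [(one_hot(attrs, categories), encode_label(c)) for c, attrs in records]
train, test = split_train_test(samples, test_fraction=0.34, seed=1)
print(knn_predict(samples, one_hot(["f", "n", "b"], categories)))  # prints 1
correct = sum(knn_predict(train, x) == y for x, y in test)
print(f"test accuracy: {correct}/{len(test)}")
```

With one-hot vectors, the Hamming distance between two mushrooms is exactly twice the number of attributes on which they differ, so this k-NN variant effectively counts mismatched attributes. The same encoded vectors could also be fed to a Multi-Layer Perceptron.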
One reference [2] presents the top 10 algorithms in data mining. Although some of them were covered in class, I can implement one of the others for comparison. In addition, K-Histograms [1] is an efficient clustering algorithm designed for categorical datasets. My aim in this project is to build an accurate model for predicting the edibility of mushrooms.
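As I understand it, K-Histograms [1] extends k-means to categorical data by representing each cluster with per-attribute histograms of values instead of a mean. The sketch below is a simplified, hypothetical illustration of that idea, not the paper's algorithm: the dissimilarity measure (the sum over attributes of 1 minus the relative frequency of the object's value in the cluster histogram), the random seeding, and the batch histogram update are all assumptions, and the exact definitions and dynamic update rule in [1] may differ. The toy rows are made up.

```python
import random
from collections import Counter

def dissimilarity(row, histograms, size):
    # ASSUMED measure: for each attribute, 1 minus the fraction of the cluster's
    # members that share this row's value (0 when every member matches).
    return sum(1 - hist[value] / size for value, hist in zip(row, histograms))

def build_histograms(members):
    # One Counter per attribute: value -> number of members with that value.
    return [Counter(row[j] for row in members) for j in range(len(members[0]))]

def k_histograms(rows, k, max_iter=20, seed=0):
    """Simplified batch variant (an assumption, not the paper's exact rule):
    assign every row to its least dissimilar cluster, then rebuild each
    cluster's histograms; stop when the assignments no longer change."""
    rng = random.Random(seed)
    # Start each cluster from the histogram of one randomly chosen row.
    hists = [build_histograms([rows[i]]) for i in rng.sample(range(len(rows)), k)]
    sizes = [1] * k
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(row, hists[c], sizes[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous histograms
                hists[c] = build_histograms(members)
                sizes[c] = len(members)
    return labels

# Toy categorical rows (made up, not from the mushroom dataset).
rows = [["x", "a", "b"], ["x", "a", "b"], ["x", "l", "b"],
        ["k", "f", "n"], ["k", "f", "n"], ["k", "p", "n"]]
# The first three rows end up sharing one label and the last three the other.
print(k_histograms(rows, k=2))
```

Because the mushroom data also carries edibility labels, the resulting clusters could, for example, be compared against the edible/poisonous classes.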
Hence, my plan is to study these classification algorithms in more depth and to look for additional data for testing. Classification and prediction techniques are also useful in many other fields.
References:
1. http://arxiv.org/pdf/cs/0509033.pdf (K-Histograms)
2. http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf (Top 10 algorithms for data mining)
3. http://archive.ics.uci.edu/ml/datasets/Mushroom (Dataset)