Wild Mushrooms Classification – Edible Or Poisonous

ECE539 Project Proposal 

Yulin Shen 

11/14/2013 

Mushrooms are an unusual kind of food because not all of them are edible. Some countries,
China for example, regard mushrooms as a highly nutritious food, yet only a small portion of
mushroom species are actually eaten by people. Explorers who get lost in a forest without
food could eat wild mushrooms, but they usually do not know whether a given mushroom is
edible. Hence, I want to use classification algorithms to develop the best model I can for
predicting whether a newly encountered mushroom is edible, based on recorded data about
mushrooms (features and edibility).

To build a model for predicting the edibility of mushrooms, a dataset with enough features
is necessary. The UCI Machine Learning Repository provides a dataset with 8124 instances,
each with 22 attributes plus the edibility label. However, all 23 of these fields are
descriptive rather than numerical, so I need to preprocess the data. For example, “edible”
can be coded as 1 and “poisonous” as 0, and if an attribute has 5 choices, I would replace
each choice with a 5-bit string. Because I have only one dataset, I intend to split it into
2 files, one for training and the other for testing. Ideally, I would obtain a separate
dataset for testing. Once the data are in numerical form with the same meaning, I need to
consider which classification algorithms to use. In class, I have already learned the
K-Nearest Neighbor classifier, the Multi-Layer Perceptron, and other algorithms. In the
project, I want to build models based on these algorithms. Furthermore, I want to implement
one more algorithm that was not covered in class.
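
The preprocessing and split described above could look roughly like the following Python sketch. It rests on assumptions that the proposal does not state: the bit strings are read as a one-hot code (one bit per choice), the label is stored as "e" or "p" in the first field of each record, and the 30% test fraction is an arbitrary placeholder. The helper names and the toy records are hypothetical.

```python
import random

def build_vocab(feature_rows):
    """List the distinct values seen in each attribute column, in sorted order."""
    n_cols = len(feature_rows[0])
    return [sorted({row[i] for row in feature_rows}) for i in range(n_cols)]

def one_hot(row, vocab):
    """Encode one record as a flat list of bits, one bit per possible attribute value."""
    bits = []
    for value, choices in zip(row, vocab):
        bits.extend(1 if value == choice else 0 for choice in choices)
    return bits

def split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy records (label first, then two attributes); not real dataset rows.
raw = [("e", "x", "s"), ("p", "x", "y"), ("e", "b", "s"), ("p", "f", "y")]

labels = [1 if r[0] == "e" else 0 for r in raw]   # edible -> 1, poisonous -> 0
features = [r[1:] for r in raw]
vocab = build_vocab(features)
encoded = [one_hot(row, vocab) for row in features]

train, test = split(list(zip(encoded, labels)), test_fraction=0.25)
print(vocab)                  # distinct values per attribute
print(encoded[0])             # bit string for the first record
print(len(train), len(test))  # sizes of the two parts
```

Fixing the shuffle seed keeps the training and testing files reproducible across runs, which matters when several models are compared on the same split.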

One reference surveys the top 10 algorithms for data mining. Although some of them were
covered in class, I can implement one of them for comparison. There is also an efficient
clustering algorithm for categorical datasets, K-Histograms. My aim in the project is to
build an accurate model for predicting the edibility of mushrooms. Hence, my plan is to
understand the classification algorithms better and to look for more data for testing.
Classification and prediction are broadly useful, in other fields as well.
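
As an illustration of how candidate models might be compared for accuracy, here is a minimal Python sketch that scores a K-Nearest Neighbor classifier against a majority-class baseline on held-out records. Several parts are assumptions rather than anything the proposal specifies: the Hamming distance on raw categorical values, k = 3, the baseline itself, and the toy records, which are hypothetical and not taken from the dataset.

```python
from collections import Counter

def hamming(a, b):
    """Count the attribute positions where two records differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to the query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def majority_predict(train, query):
    """Baseline: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(predict, train, test):
    """Fraction of test records whose predicted label matches the true label."""
    correct = sum(predict(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical toy records: (attribute tuple, label), with 1 = edible, 0 = poisonous.
train = [(("x", "s"), 1), (("b", "s"), 1), (("x", "y"), 0),
         (("f", "y"), 0), (("f", "s"), 1)]
test = [(("b", "y"), 0), (("x", "s"), 1)]

for name, model in [("3-NN", knn_predict), ("majority", majority_predict)]:
    print(name, accuracy(model, train, test))
```

Any further classifier can be added to the comparison as long as it accepts the same two arguments, the training records and a query record.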

References:

1. http://arxiv.org/pdf/cs/0509033.pdf (K-Histograms) 

2. http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf (Top 10 algorithms for data mining)

3. http://archive.ics.uci.edu/ml/datasets/Mushroom (Dataset)
