Sample classification using the GA/KNN algorithm with varying ...

Similarly, if the p parameter equals 5, the formula becomes:

$D = \left[ (g_{1k} - g_{1j})^{5} + \cdots + (g_{nk} - g_{nj})^{5} \right]^{1/5}$

Methods:

Data: The Golub data set was obtained from the BINF 733 website. The data were filtered and transformed to the log scale as described below (following Li et al. [1]). Genes whose expression levels were below 50 in more than 57 of the 72 samples were excluded. The 5455 genes that remained after filtering were transformed to the log scale, and the resulting data set was formatted as input for the GA/KNN algorithm.
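The filtering and transformation steps can be sketched as follows. This is a minimal illustration, not the preprocessing script actually used: the function name `filter_and_log`, the dictionary layout, the base-10 logarithm, and the `floor` applied before taking the log are all assumptions, since the text does not specify them.

```python
import math


def filter_and_log(expr, min_level=50, max_low_samples=57, floor=1.0):
    """Drop weakly expressed genes and log-transform the rest.

    expr maps a gene name to its list of expression levels, one per sample.
    A gene is dropped when its level is below min_level in more than
    max_low_samples samples. The log base (10) and the floor that keeps
    the logarithm defined are assumptions made for this sketch.
    """
    kept = {}
    for gene, levels in expr.items():
        n_low = sum(1 for value in levels if value < min_level)
        if n_low > max_low_samples:
            continue  # below the cutoff in too many samples
        kept[gene] = [math.log10(max(value, floor)) for value in levels]
    return kept
```

With the defaults, a gene measured on 72 samples is kept only if at least 15 of them reach an expression level of 50.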

Usage of the GA/KNN algorithm:

The GA/KNN source code was kindly provided by Dr. Leping Li. First, it was run on the data set without any modification; in that case the p parameter equals 2. The code was then modified by changing the distance calculation used to find the K-nearest neighbors, and the modified source code was run for p values of 3, 4, 5, 6 and 7. Higher p values increased the run time of the program, and for these the program did not execute successfully because of limited computing resources.
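To make the modification concrete, the sketch below shows a distance function with an adjustable p and a K-nearest-neighbor vote built on it. It is written in Python for illustration only and is not the GA/KNN source code; the names `minkowski` and `knn_predict` are hypothetical. Two assumptions are worth noting: the sketch takes absolute differences, which is the standard Minkowski form and matters when p is odd, and it classifies by majority vote, which may differ from the voting rule in the original program.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Distance of order p between two expression profiles.

    p = 2 gives the Euclidean distance. Absolute differences are used
    (an assumption of this sketch) so the result is well defined for odd p.
    """
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train, labels, query, k=3, p=2):
    """Label a query sample by majority vote among its k nearest neighbors."""
    order = sorted(range(len(train)),
                   key=lambda i: minkowski(train[i], query, p))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Changing p only changes how the per-gene differences are weighted: larger p values let the genes with the largest differences dominate the distance.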

The program was run several times on a Unix server (mason.gmu.edu) to find the optimal parameters, taking the run time into account. The following parameters were found to be best suited to the Golub data set:

Chromosome length: 10 (higher values did not change the output)

Number of near-optimal solutions: 5000 (higher values did not change the output)

Termination fitness cutoff: 36

K (in KNN): 3

Number of training samples: 38

Number of test samples: 34

For a detailed description of each of these parameters, please refer to Li et al. [1].
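As a rough illustration of how the chromosome length, K, and the termination cutoff interact, the sketch below scores one chromosome (a set of gene indices) by leave-one-out K-nearest-neighbor classification of the training samples. It assumes that fitness is the number of training samples classified correctly, so a cutoff of 36 would mean 36 of the 38 training samples. That reading, the majority vote, and the names `loo_fitness` and `is_near_optimal` are assumptions of this sketch; Li et al. [1] give the authoritative definition.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Distance of order p, using absolute differences (an assumption)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def loo_fitness(samples, labels, genes, k=3, p=2):
    """Count training samples whose label is recovered by their k nearest
    neighbors when only the genes in the chromosome are used."""
    reduced = [[sample[g] for g in genes] for sample in samples]
    correct = 0
    for i, query in enumerate(reduced):
        # Leave sample i out and rank the remaining samples by distance.
        others = [j for j in range(len(reduced)) if j != i]
        others.sort(key=lambda j: minkowski(reduced[j], query, p))
        votes = Counter(labels[j] for j in others[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct


def is_near_optimal(fitness, cutoff=36):
    """A chromosome is retained once its fitness reaches the cutoff."""
    return fitness >= cutoff
```

Under these assumptions, the genetic algorithm would keep evolving chromosomes of 10 genes until one satisfies `is_near_optimal`, and repeat until 5000 such chromosomes have been collected.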

The results from the algorithm were analyzed, and the number of wrong classifications was counted for each run (that is, for each value of the p parameter). Each run of the program yielded a gene ranking. The 50 top-ranked genes were taken in each case, reducing the data set from 5455 genes across 72 samples to 50 genes across 72 samples. The reduced data were then plotted on their first two principal components using MATLAB.
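The reduction to the top-ranked genes can be sketched as below. The sketch assumes that the ranking comes from counting how often each gene appears among the near-optimal chromosomes; the text does not state how the program derives its ranking, so treat that as an assumption. The function names and the dictionary layout are hypothetical, and the principal component plot itself is not reproduced here.

```python
from collections import Counter


def rank_genes(chromosomes):
    """Rank genes by how often they occur in the selected chromosomes.

    Ties are broken by gene name so the ranking is reproducible.
    """
    counts = Counter(gene for chrom in chromosomes for gene in chrom)
    return sorted(counts, key=lambda gene: (-counts[gene], gene))


def keep_top_genes(expr, ranking, n_top=50):
    """Return the expression data restricted to the n_top best-ranked genes.

    expr maps a gene name to its list of expression levels, one per sample.
    """
    return {gene: expr[gene] for gene in ranking[:n_top]}
```

Applied to the filtered data with `n_top=50`, this would give the 50-gene by 72-sample matrix that is passed on to the principal component analysis.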
