28.04.2015 Views

Identification of Pharmacogenomic Biomarker Classifiers in Cancer ...

Identification of Pharmacogenomic Biomarker Classifiers in Cancer ...

Identification of Pharmacogenomic Biomarker Classifiers in Cancer ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Support vector mach<strong>in</strong>es are very popular <strong>in</strong> the mach<strong>in</strong>e learn<strong>in</strong>g literature. Although<br />

they sound very exotic, l<strong>in</strong>ear kernel support vector mach<strong>in</strong>es do class prediction us<strong>in</strong>g a<br />

predictor <strong>of</strong> the form <strong>of</strong> equation (1). The weights are determ<strong>in</strong>ed by optimiz<strong>in</strong>g a misclassification<br />

rate criterion, however, <strong>in</strong>stead <strong>of</strong> a least-squares criterion as <strong>in</strong> l<strong>in</strong>ear<br />

discrim<strong>in</strong>ant analysis (Ramaswamy et al. {Ramaswamy, 2001 #92}). Although there are<br />

more complex forms <strong>of</strong> support vector mach<strong>in</strong>es, they appear to be <strong>in</strong>ferior to l<strong>in</strong>ear<br />

kernel SVM’s for class prediction with large numbers <strong>of</strong> genes {Ben-Dor, 2000 #91}.<br />

In the study <strong>of</strong> Dudoit et al. {Dudoit, 2002 #42}, the simplest methods, diagonal l<strong>in</strong>ear<br />

discrim<strong>in</strong>ant analysis, and nearest neighbor classification, performed as well or better<br />

than the more complex methods. Nearest neighbor classification is def<strong>in</strong>ed as follows. It<br />

depends on a feature set F <strong>of</strong> genes selected to be useful for discrim<strong>in</strong>at<strong>in</strong>g the classes. It<br />

also depends upon a distance function d( x, y)<br />

which measures the distance between the<br />

expression pr<strong>of</strong>iles x and y <strong>of</strong> two samples. The distance function utilizes only the genes<br />

<strong>in</strong> the selected set <strong>of</strong> features F. To classify a sample with expression pr<strong>of</strong>ile y , compute<br />

d( x, y)<br />

for each sample x <strong>in</strong> the tra<strong>in</strong><strong>in</strong>g set. The predicted class <strong>of</strong> y is the class <strong>of</strong><br />

the sample <strong>in</strong> the tra<strong>in</strong><strong>in</strong>g set which is closest to y with regard to the distance function d.<br />

A variant <strong>of</strong> nearest neighbor classification is k-nearest neighbor classification. For<br />

example with 3-nearest neighbor classification, you f<strong>in</strong>d the three samples <strong>in</strong> the tra<strong>in</strong><strong>in</strong>g<br />

set which are closest to the sample y . The class which is most represented among these<br />

three samples is the predicted class for y . Tibshirani et al. ( ) developed a variant called<br />

8

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!