New Approaches to in silico Design of Epitope-Based Vaccines


4.3. MHC BINDING PREDICTION FOR ALL MHC-I ALLELES

Figure 4.6: Correlation between UniTope performance and distance to nearest neighbor. For each MHC allele, the PCC achieved by UniTope when that allele is treated as unseen is plotted against the allele's distance to its nearest neighbor. The PCC between UniTope performance and distance to nearest neighbor is −0.59. The regression line is plotted as a solid red line. One and two standard deviations are plotted as gray dashed and dash-dotted lines, respectively. MHC alleles discussed in the text are shown in blue.
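The statistics in the caption can be reproduced in outline as follows. This is a minimal sketch that assumes the per-allele PCC values and nearest-neighbor distances are already available as two equal-length lists. It also assumes that the gray bands are the least-squares line shifted by one and two residual standard deviations, which the caption does not state explicitly. The function names `pearson` and `regression_with_bands` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("PCC is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def regression_with_bands(
    dist: Sequence[float], pcc: Sequence[float]
) -> Tuple[float, float, float]:
    """Fit pcc ~ slope * dist + intercept by least squares.

    Returns (slope, intercept, residual_sd). The +/-1 and +/-2 SD bands are
    assumed to be the fitted line shifted by k * residual_sd for k = 1, 2;
    this is an interpretation of the figure, not the thesis' own code.
    """
    n = len(dist)
    if n != len(pcc) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mx, my = sum(dist) / n, sum(pcc) / n
    sxx = sum((d - mx) ** 2 for d in dist)
    slope = sum((d - mx) * (p - my) for d, p in zip(dist, pcc)) / sxx
    intercept = my - slope * mx
    residuals = [p - (slope * d + intercept) for d, p in zip(dist, pcc)]
    # Residual SD with n - 2 degrees of freedom (two fitted parameters).
    residual_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, residual_sd
```

A negative PCC such as the reported −0.59 means that alleles far from every other allele in the data set tend to receive worse predictions.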

The average PCC yielded by the ANN on the IEDB h9 data set is 0.54, with a minimum of 0.19 (HLA-B*40:01) and a maximum of 0.83 (HLA-B*18:01). The individual performances are listed in Table B.5 in the appendix.
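Average, minimum and maximum figures like these come from scoring each allele separately and then aggregating. The sketch below assumes that the measured and predicted binding values are grouped per allele in a dictionary and that the reported average is the plain mean of the per-allele PCCs. The data layout and the name `summarize_per_allele` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pcc(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize_per_allele(
    data: Dict[str, Tuple[List[float], List[float]]],
) -> Tuple[float, Tuple[str, float], Tuple[str, float]]:
    """Return (mean PCC, (worst allele, PCC), (best allele, PCC)).

    `data` maps an allele name such as "HLA-B*40:01" to a pair of lists:
    (measured binding values, predicted binding values).
    """
    # One PCC per allele; the average is assumed to be the mean of these.
    scores = {allele: pcc(meas, pred) for allele, (meas, pred) in data.items()}
    mean = sum(scores.values()) / len(scores)
    worst = min(scores.items(), key=lambda kv: kv[1])
    best = max(scores.items(), key=lambda kv: kv[1])
    return mean, worst, best
```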

Performance on Unseen Alleles<br />

Performance on unseen alleles is measured via leave-one-out validation. In leave-one-out validation, all data for one allele, the unseen allele, are omitted from training. Model selection is performed on the remaining data, and the final model is used to make predictions for the unseen allele. This setting simulates the performance of UniTope on MHC specificities for which no binding data are available.
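The leave-one-out scheme can be written as a generic loop over alleles. In this hypothetical sketch, `fit` stands in for the complete training pipeline, including model selection on the remaining data, and `evaluate` scores the final model on the held-out allele; neither callable is taken from the thesis.

```python
from typing import Any, Callable, Dict, List


def leave_one_allele_out(
    data: Dict[str, List[Any]],
    fit: Callable[[List[Any]], Any],
    evaluate: Callable[[Any, List[Any]], float],
) -> Dict[str, float]:
    """Hold out each allele in turn, train on the rest, score the held-out allele.

    `data` maps allele names to their binding records. `fit` is a hypothetical
    stand-in for training plus model selection; `evaluate` returns a score
    (for example a PCC) for the model on the unseen allele's records.
    """
    results: Dict[str, float] = {}
    for unseen in data:
        # Pool the records of every allele except the unseen one.
        train = [rec for allele, recs in data.items() if allele != unseen for rec in recs]
        model = fit(train)
        results[unseen] = evaluate(model, data[unseen])
    return results
```

Splitting by allele rather than by individual peptide is what makes the held-out allele truly unseen: none of its measurements can leak into training or model selection.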

Leave-one-out validation is performed for each allele represented in the IEDB h9 data set. In each round, we determine the best parameter combination via five-fold cross-validation on the training set. On average, UniTope yields a PCC of 0.55, with a minimum of 0.01 (HLA-B*27:05) and a maximum of 0.82 (HLA-A*02:06). While this leaves room for improvement, it is on par with the performance of the allele-specific ANN (average PCC of 0.54), even though UniTope has seen no data for the allele being predicted. The individual performances of UniTope are listed in Table B.6 in the appendix.
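Within each leave-one-out round, the parameter combination is chosen by five-fold cross-validation on the training set. The sketch below shows one way to do this. The parameter grid, the `fit` and `score` callables, and the shuffled, unstratified fold assignment over individual training records (so a fold may mix alleles) are all assumptions, since this section does not say how folds are formed or which parameters are tuned.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Sequence


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_parameters(
    train: Sequence[Any],
    grid: Sequence[Dict[str, Any]],
    fit: Callable[[List[Any], Dict[str, Any]], Any],
    score: Callable[[Any, List[Any]], float],
    k: int = 5,
) -> Optional[Dict[str, Any]]:
    """Return the grid entry with the highest mean validation score over k folds."""
    folds = k_fold_indices(len(train), k)
    best_params, best_score = None, float("-inf")
    for params in grid:
        fold_scores = []
        for fold in folds:
            held = set(fold)
            # Train on the other k - 1 folds, validate on this one.
            tr = [train[i] for i in range(len(train)) if i not in held]
            va = [train[i] for i in fold]
            fold_scores.append(score(fit(tr, params), va))
        mean = sum(fold_scores) / k
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params
```

The selected parameters would then be used to train the final model on the full training set before predicting the unseen allele.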

UniTope performs particularly well on HLA-A*02:06 but produces essentially random predictions for HLA-B*27:05. What determines whether UniTope will perform well on an allele? To analyze this, we consider each allele's distance to its nearest neighbor in the IEDB h9 data set. The distance between two alleles a and b is defined as the Euclidean
