05.08.2013 Views

New Approaches to in silico Design of Epitope-Based Vaccines


CHAPTER 4. EPITOPE DISCOVERY

distance between their pocket encodings:

\[
d(a, b) = \left\lVert \left( \Phi^{P}_{\mathrm{pca}}(P^{1}_{a}), \ldots, \Phi^{P}_{\mathrm{pca}}(P^{9}_{a}) \right) - \left( \Phi^{P}_{\mathrm{pca}}(P^{1}_{b}), \ldots, \Phi^{P}_{\mathrm{pca}}(P^{9}_{b}) \right) \right\rVert
\]

where $P^{i}_{x}$ is the i-th pocket of allele x. Figure 4.6 plots these distances against the respective UniTope performance. The PCC between the distance to the nearest neighbor and the performance of UniTope is −0.59. In the following we will examine some of the alleles that deviate strongly from the regression line in Figure 4.6. HLA-B*27:05 (PCC = 0.01) belongs to the B27 supertype [28]. Its binding motif differs from those of the other alleles in the N-terminal anchor position, where it prefers basic residues [28]. Hence, in the leave-one-out validation none of the alleles in the training set is suited as a representative.
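The nearest-neighbor analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pocket encodings are taken as given numeric vectors (the encoding function itself is not reproduced), the norm is assumed to be Euclidean, and all function names, allele labels, and toy values are hypothetical rather than taken from the thesis.

```python
import numpy as np


def pocket_distance(enc_a, enc_b):
    """Distance between two alleles, each given as a list of pocket encodings.

    Assumption: the norm in d(a, b) is Euclidean and is taken over the
    concatenation of all pocket encodings of an allele.
    """
    a = np.concatenate([np.asarray(p, dtype=float) for p in enc_a])
    b = np.concatenate([np.asarray(p, dtype=float) for p in enc_b])
    return float(np.linalg.norm(a - b))


def nearest_neighbor_distances(encodings):
    """Map each allele to (nearest other allele, distance to it)."""
    result = {}
    for a in encodings:
        candidates = [
            (pocket_distance(encodings[a], encodings[b]), b)
            for b in encodings
            if b != a
        ]
        dist, nearest = min(candidates)
        result[a] = (nearest, dist)
    return result


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equally long sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def standardized_residuals(x, y):
    """Residuals of a least-squares line y ~ x, in units of their standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return residuals / residuals.std(ddof=1)


if __name__ == "__main__":
    # Toy data only: four hypothetical alleles, nine pockets each,
    # every pocket encoded as a random 3-dimensional vector.
    rng = np.random.default_rng(0)
    toy = {f"allele_{i}": [rng.normal(size=3) for _ in range(9)] for i in range(4)}
    for allele, (nearest, dist) in nearest_neighbor_distances(toy).items():
        print(allele, "->", nearest, round(dist, 3))
```

Given one nearest-neighbor distance and one leave-one-out PCC per allele, `pearson` would yield the correlation between the two quantities, and `standardized_residuals` would express each allele's deviation from the regression line in standard deviations.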

HLA-B*08:01 (PCC = 0.04), the only B08 allele in the data set, likewise has no suitable representative in the training set. According to [28], B08 alleles display a unique mode of peptide binding. The remaining alleles' binding information is thus of no use for predicting HLA-B*08:01 binding affinities.

A typical phenomenon of pan-specific predictions is displayed by HLA-B*58:01 (PCC = 0.22) and its nearest neighbor HLA-B*57:01 (PCC = 0.58), both from the same supertype: while the sparsely populated HLA-B*57:01 (59 data points) benefits from the highly populated HLA-B*58:01 (988 data points), the few HLA-B*57:01 data points do not suffice to adequately represent HLA-B*58:01 in the leave-one-out validation.

HLA-A*24:03 (PCC = 0.23) is located very close to the related HLA-A*24:02 (PCC = 0.51) in feature space. Both alleles have only a few data points in the IEDB h9 data set, 254 and 197, respectively. This suggests a similar UniTope performance for both alleles in the leave-one-out validation. However, this is not the case: we observe a deviation of more than two standard deviations from the regression line for HLA-A*24:03 and of about one standard deviation for HLA-A*24:02. This discrepancy can be explained by the presence of another allele from the same supertype: HLA-A*23:01 (PCC = 0.67), which is also located nearby. Analysis of the three allele-specific data sets reveals a strong overlap between the data sets of HLA-A*23:01 and HLA-A*24:02: 78 peptides are contained in both. The binding affinities of these peptides with respect to HLA-A*23:01 and HLA-A*24:02 correlate very well (PCC = 0.71). In contrast, the HLA-A*24:03 data set does not overlap with the other data sets.
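The overlap analysis between allele-specific data sets can be illustrated as follows. This is a minimal sketch, assuming each data set is available as a mapping from peptide sequence to a measured binding affinity; the function name and the toy entries are hypothetical and do not reproduce the benchmark data.

```python
import numpy as np


def shared_peptide_correlation(data_a, data_b):
    """Count peptides present in both data sets and correlate their affinities.

    data_a, data_b: dict mapping peptide sequence -> binding affinity.
    Returns (number of shared peptides, PCC of their affinities).
    The PCC is NaN when fewer than two peptides are shared.
    """
    shared = sorted(set(data_a) & set(data_b))
    if len(shared) < 2:
        return len(shared), float("nan")
    x = np.array([data_a[p] for p in shared], dtype=float)
    y = np.array([data_b[p] for p in shared], dtype=float)
    return len(shared), float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    # Made-up peptides and affinities, for illustration only.
    set_1 = {"PEPTIDEAA": 0.81, "PEPTIDEBB": 0.40, "PEPTIDECC": 0.12, "PEPTIDEDD": 0.66}
    set_2 = {"PEPTIDEAA": 0.77, "PEPTIDEBB": 0.52, "PEPTIDECC": 0.20, "PEPTIDEEE": 0.05}
    n_shared, pcc = shared_peptide_correlation(set_1, set_2)
    print(n_shared, pcc)
```

A large shared set with a high PCC means that one allele's measurements effectively stand in for the other's during leave-one-out validation, whereas an allele with no shared peptides gets no such support.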

Performance on Seen Alleles<br />

The performance on seen alleles is measured via a two-times nested five-fold cross-validation using the splits specified in the IEDB benchmark data set. UniTope clearly outperforms the allele-specific ANN. It yields an average PCC of 0.67 with a minimum of 0.39 (HLA-B*08:01) and a maximum of 0.84 (HLA-A*02:06, HLA-A*11:01). The individual performances are listed in Table B.5 in the appendix. In this setting, the performance on the special cases HLA-B*27:05 and HLA-B*08:01 is significantly better than in the leave-one-out validation: 0.47 vs. 0.01 and 0.39 vs. 0.04, respectively.
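For readers unfamiliar with nested cross-validation on predefined splits, the following sketch shows the general pattern. It rests on several assumptions: "nested" is read here as an outer evaluation loop with an inner model-selection loop, a plain ridge regression stands in for the actual predictor (it is not UniTope), and the fold assignment array, candidate parameters, and function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression weights (stand-in model, not UniTope)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def pcc(a, b):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])


def nested_cv(X, y, fold_ids, lambdas):
    """Nested cross-validation over predefined folds.

    fold_ids assigns every sample to one of the predefined splits.
    For each outer test fold, the regularization strength is selected by
    an inner cross-validation over the remaining folds; the selected model
    is then refit on all remaining folds and scored on the test fold.
    Returns one PCC per outer fold.
    """
    folds = np.unique(fold_ids)
    scores = []
    for test_fold in folds:
        test = fold_ids == test_fold
        inner_folds = folds[folds != test_fold]
        best_lam, best_score = None, -np.inf
        for lam in lambdas:
            inner_scores = []
            for val_fold in inner_folds:
                val = fold_ids == val_fold
                train = ~test & ~val
                w = ridge_fit(X[train], y[train], lam)
                inner_scores.append(pcc(X[val] @ w, y[val]))
            mean_score = float(np.mean(inner_scores))
            if mean_score > best_score:
                best_lam, best_score = lam, mean_score
        w = ridge_fit(X[~test], y[~test], best_lam)
        scores.append(pcc(X[test] @ w, y[test]))
    return scores


if __name__ == "__main__":
    # Synthetic data for illustration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) + 0.1 * rng.normal(size=100)
    fold_ids = np.arange(100) % 5  # five predefined splits
    print(np.mean(nested_cv(X, y, fold_ids, lambdas=[0.01, 1.0, 100.0])))
```

Because the test fold never takes part in parameter selection, the per-fold PCCs are not biased by tuning, which is the main reason to prefer the nested scheme over a single cross-validation loop.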
