New Approaches to in silico Design of Epitope-Based Vaccines
New Approaches to in silico Design of Epitope-Based Vaccines
New Approaches to in silico Design of Epitope-Based Vaccines
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
46 CHAPTER 4. EPITOPE DISCOVERY<br />
regard pMHC b<strong>in</strong>d<strong>in</strong>g as an estimate <strong>of</strong> pMHC:TCR aff<strong>in</strong>ity and <strong>in</strong>corporate it explicitly<br />
<strong>in</strong><strong>to</strong> our model <strong>of</strong> self-<strong>to</strong>lerance. The reported correlation between pMHC aff<strong>in</strong>ity and<br />
peptide immunogenicity [9] additionally justifies this approach.<br />
Measur<strong>in</strong>g Distance <strong>to</strong> Self<br />
We def<strong>in</strong>e the distance <strong>of</strong> a peptide <strong>to</strong> a set <strong>of</strong> peptides <strong>to</strong> correspond <strong>to</strong> the smallest pairwise<br />
distance <strong>to</strong> one <strong>of</strong> the peptides <strong>in</strong> the set. The distance between two peptides <strong>of</strong> equal<br />
length can be calculated <strong>in</strong> various ways, rang<strong>in</strong>g from a simple Hamm<strong>in</strong>g distance-based<br />
approach <strong>to</strong> substitution matrix-based distance measures. With<strong>in</strong> this work, we employ a<br />
distance measure that has been derived from the BLOSUM45 substitution matrix [88]. It<br />
corresponds <strong>to</strong> a sum <strong>of</strong> AA distances and can thus be represented by a symmetric 20×20matrix.<br />
The matrix is generated as follows: The substitution matrix A is turned <strong>in</strong><strong>to</strong> a<br />
symmetric matrix A ′ by replac<strong>in</strong>g each entry aij by aij+aji<br />
2<br />
. The entries are shifted by<br />
add<strong>in</strong>g the absolute value <strong>of</strong> the smallest entry, such that the result<strong>in</strong>g matrix A ′′ conta<strong>in</strong>s<br />
no negative entries: a ′′<br />
ij = a′ ij + |m<strong>in</strong>(A′ )|. Subsequently, the matrix entries are normalized<br />
via division by the maximum entry <strong>in</strong> A ′′ yield<strong>in</strong>g the matrix A ′′′ : a ′′′<br />
ij =<br />
a ′′<br />
ij<br />
max(A ′′ ) . In<br />
a last step, the distance matrix M is obta<strong>in</strong>ed by subtract<strong>in</strong>g the entries <strong>of</strong> A ′′′ from 1:<br />
mij = 1 − a ′′′<br />
ij .<br />
Given this distance matrix, the distance <strong>of</strong> a target peptide <strong>to</strong> the reference set can<br />
be determ<strong>in</strong>ed. However, l<strong>in</strong>early compar<strong>in</strong>g a peptide <strong>to</strong> a set <strong>of</strong> several hundreds <strong>of</strong><br />
thousands <strong>of</strong> peptides is computationally expensive. We therefore use a memory-efficient<br />
trie-based approach [98]. Each self-peptide is represented by a leaf <strong>of</strong> the trie. Peptides<br />
descend<strong>in</strong>g from the same node with<strong>in</strong> a trie have a common prefix. This representation<br />
<strong>of</strong> the reference peptide set allows us <strong>to</strong> prune branches conta<strong>in</strong><strong>in</strong>g peptides that are <strong>to</strong>o<br />
distant <strong>to</strong> be <strong>of</strong> <strong>in</strong>terest. Determ<strong>in</strong><strong>in</strong>g the self-peptide <strong>in</strong> the IPI-based peptide reference<br />
set (861,352 HLA-B*35:01-b<strong>in</strong>d<strong>in</strong>g n<strong>in</strong>emers) that is closest <strong>to</strong> a given target peptide takes<br />
less than a second.<br />
Feature Encod<strong>in</strong>g<br />
Each peptide is encoded based on a set <strong>of</strong> the k >> 0 most similar self-peptides from the<br />
respective reference proteome. The rationale beh<strong>in</strong>d consider<strong>in</strong>g several similar peptides<br />
<strong>in</strong>stead <strong>of</strong> the s<strong>in</strong>gle most similar peptide is that it allows <strong>to</strong> consider (a) high-aff<strong>in</strong>ity<br />
self-peptides that are sufficiently similar but not the most similar, and (b) similarity distributions<br />
among the closest self-peptides.<br />
For each target peptide p ∗ , the distances <strong>to</strong> the k nearest self-peptides p1, . . . , pk are<br />
determ<strong>in</strong>ed. Distances are <strong>in</strong> the range [0, 1] with 0 mean<strong>in</strong>g that the target peptide is<br />
identical <strong>to</strong> a self-peptide and 1 be<strong>in</strong>g the maximum possible distance with respect <strong>to</strong> the<br />
distance measure. Additionally, we determ<strong>in</strong>e the b<strong>in</strong>d<strong>in</strong>g aff<strong>in</strong>ities <strong>of</strong> the target and the k<br />
self-peptides us<strong>in</strong>g NetMHC. B<strong>in</strong>d<strong>in</strong>g aff<strong>in</strong>ities are also <strong>in</strong> the range [0, 1] with 0 mean<strong>in</strong>g<br />
low aff<strong>in</strong>ity, i.e., non-b<strong>in</strong>der, and 1 mean<strong>in</strong>g very high aff<strong>in</strong>ity, i.e., strong b<strong>in</strong>der. Let d(p)<br />
be the distance <strong>of</strong> peptide p <strong>to</strong> p ∗ and b(p) the b<strong>in</strong>d<strong>in</strong>g aff<strong>in</strong>ity <strong>of</strong> peptide p <strong>to</strong> the MHC<br />
molecule correspond<strong>in</strong>g <strong>to</strong> the allele under consideration. The self-<strong>to</strong>lerance feature vec<strong>to</strong>r