05.08.2013 Views

New Approaches to in silico Design of Epitope-Based Vaccines

New Approaches to in silico Design of Epitope-Based Vaccines

New Approaches to in silico Design of Epitope-Based Vaccines

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

46 CHAPTER 4. EPITOPE DISCOVERY<br />

regard pMHC b<strong>in</strong>d<strong>in</strong>g as an estimate <strong>of</strong> pMHC:TCR aff<strong>in</strong>ity and <strong>in</strong>corporate it explicitly<br />

<strong>in</strong><strong>to</strong> our model <strong>of</strong> self-<strong>to</strong>lerance. The reported correlation between pMHC aff<strong>in</strong>ity and<br />

peptide immunogenicity [9] additionally justifies this approach.<br />

Measur<strong>in</strong>g Distance <strong>to</strong> Self<br />

We def<strong>in</strong>e the distance <strong>of</strong> a peptide <strong>to</strong> a set <strong>of</strong> peptides <strong>to</strong> correspond <strong>to</strong> the smallest pairwise<br />

distance <strong>to</strong> one <strong>of</strong> the peptides <strong>in</strong> the set. The distance between two peptides <strong>of</strong> equal<br />

length can be calculated <strong>in</strong> various ways, rang<strong>in</strong>g from a simple Hamm<strong>in</strong>g distance-based<br />

approach <strong>to</strong> substitution matrix-based distance measures. With<strong>in</strong> this work, we employ a<br />

distance measure that has been derived from the BLOSUM45 substitution matrix [88]. It<br />

corresponds <strong>to</strong> a sum <strong>of</strong> AA distances and can thus be represented by a symmetric 20×20matrix.<br />

The matrix is generated as follows: The substitution matrix A is turned <strong>in</strong><strong>to</strong> a<br />

symmetric matrix A ′ by replac<strong>in</strong>g each entry aij by aij+aji<br />

2<br />

. The entries are shifted by<br />

add<strong>in</strong>g the absolute value <strong>of</strong> the smallest entry, such that the result<strong>in</strong>g matrix A ′′ conta<strong>in</strong>s<br />

no negative entries: a ′′<br />

ij = a′ ij + |m<strong>in</strong>(A′ )|. Subsequently, the matrix entries are normalized<br />

via division by the maximum entry <strong>in</strong> A ′′ yield<strong>in</strong>g the matrix A ′′′ : a ′′′<br />

ij =<br />

a ′′<br />

ij<br />

max(A ′′ ) . In<br />

a last step, the distance matrix M is obta<strong>in</strong>ed by subtract<strong>in</strong>g the entries <strong>of</strong> A ′′′ from 1:<br />

mij = 1 − a ′′′<br />

ij .<br />

Given this distance matrix, the distance <strong>of</strong> a target peptide <strong>to</strong> the reference set can<br />

be determ<strong>in</strong>ed. However, l<strong>in</strong>early compar<strong>in</strong>g a peptide <strong>to</strong> a set <strong>of</strong> several hundreds <strong>of</strong><br />

thousands <strong>of</strong> peptides is computationally expensive. We therefore use a memory-efficient<br />

trie-based approach [98]. Each self-peptide is represented by a leaf <strong>of</strong> the trie. Peptides<br />

descend<strong>in</strong>g from the same node with<strong>in</strong> a trie have a common prefix. This representation<br />

<strong>of</strong> the reference peptide set allows us <strong>to</strong> prune branches conta<strong>in</strong><strong>in</strong>g peptides that are <strong>to</strong>o<br />

distant <strong>to</strong> be <strong>of</strong> <strong>in</strong>terest. Determ<strong>in</strong><strong>in</strong>g the self-peptide <strong>in</strong> the IPI-based peptide reference<br />

set (861,352 HLA-B*35:01-b<strong>in</strong>d<strong>in</strong>g n<strong>in</strong>emers) that is closest <strong>to</strong> a given target peptide takes<br />

less than a second.<br />

Feature Encod<strong>in</strong>g<br />

Each peptide is encoded based on a set <strong>of</strong> the k >> 0 most similar self-peptides from the<br />

respective reference proteome. The rationale beh<strong>in</strong>d consider<strong>in</strong>g several similar peptides<br />

<strong>in</strong>stead <strong>of</strong> the s<strong>in</strong>gle most similar peptide is that it allows <strong>to</strong> consider (a) high-aff<strong>in</strong>ity<br />

self-peptides that are sufficiently similar but not the most similar, and (b) similarity distributions<br />

among the closest self-peptides.<br />

For each target peptide p ∗ , the distances <strong>to</strong> the k nearest self-peptides p1, . . . , pk are<br />

determ<strong>in</strong>ed. Distances are <strong>in</strong> the range [0, 1] with 0 mean<strong>in</strong>g that the target peptide is<br />

identical <strong>to</strong> a self-peptide and 1 be<strong>in</strong>g the maximum possible distance with respect <strong>to</strong> the<br />

distance measure. Additionally, we determ<strong>in</strong>e the b<strong>in</strong>d<strong>in</strong>g aff<strong>in</strong>ities <strong>of</strong> the target and the k<br />

self-peptides us<strong>in</strong>g NetMHC. B<strong>in</strong>d<strong>in</strong>g aff<strong>in</strong>ities are also <strong>in</strong> the range [0, 1] with 0 mean<strong>in</strong>g<br />

low aff<strong>in</strong>ity, i.e., non-b<strong>in</strong>der, and 1 mean<strong>in</strong>g very high aff<strong>in</strong>ity, i.e., strong b<strong>in</strong>der. Let d(p)<br />

be the distance <strong>of</strong> peptide p <strong>to</strong> p ∗ and b(p) the b<strong>in</strong>d<strong>in</strong>g aff<strong>in</strong>ity <strong>of</strong> peptide p <strong>to</strong> the MHC<br />

molecule correspond<strong>in</strong>g <strong>to</strong> the allele under consideration. The self-<strong>to</strong>lerance feature vec<strong>to</strong>r

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!