New Approaches to in silico Design of Epitope-Based Vaccines
4.3. MHC BINDING PREDICTION FOR ALL MHC-I ALLELES
Performance tests with our approach yield promising figures: UniTope performs better
than existing allele-specific methods on alleles included in the training set. On alleles not
included in the training set, which correspond to alleles without experimental binding data,
it achieves remarkable results, comparable to those of allele-specific methods.
4.3.2 Methods
Pocket Definition
A pocket of the MHC-I binding groove is composed of all residues in contact with the
corresponding residue of a bound ninemer: MHC residues in contact with the first residue
of the peptide belong to the first pocket, those in contact with the second residue belong to
the second, and so forth. The MHC-I sequence indices of residues found to contribute to a
specific pocket are recorded in the pocket profile. To determine the pocket profiles,
3D structures of nonameric peptides bound to MHC-I molecules had to be analyzed. A total of 75
crystal structures of such pMHC-I complexes were retrieved from the Protein Data Bank
(PDB) [90] and analyzed using the BALL framework [91]. A list of these 75 structures is
given in Table B.4 in the appendix. We applied the SS contact criterion [78] to determine
contacts between MHC and peptide residues: an MHC residue and a peptide residue are
defined to be in contact if they are at most 4 Å apart. Interactions with the MHC backbone
as well as with the peptide backbone are omitted. To ensure consistent indexing,
MHC-I position indices were determined as follows: We retrieved AA sequences derived
from all known human MHC-I alleles from the IMGT/HLA database [92] (release 2.16).
Sequences derived from alleles which have been shown not to be expressed were discarded.
Furthermore, all sequences with an incomplete binding groove were removed. A multiple
sequence alignment (MSA) of the remaining sequences using ClustalW [93] showed
a conserved sequence (GSHSMRYF) at the beginning of the α-chain. All MHC sequences
were truncated to begin with this conserved sequence. The resulting pocket profiles are
displayed in Figure 4.5.
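The pocket definition above can be illustrated with a minimal Python sketch. It is not the BALL-based implementation used for the analysis. It assumes that atoms are given as hypothetical (residue index, atom name, coordinates) tuples, that the atoms named N, CA, C and O constitute the backbone, and that two residues count as being in contact when any pair of their side-chain atoms lies within 4 Å. The function names and the toy coordinates are illustrative only.

```python
import math
from collections import defaultdict

# Assumption: these PDB atom names are treated as backbone and ignored.
BACKBONE_ATOMS = {"N", "CA", "C", "O"}
CONTACT_CUTOFF = 4.0        # distance cutoff in angstroms, as stated in the text
ANCHOR_MOTIF = "GSHSMRYF"   # conserved start of the alpha chain, as stated in the text


def side_chain_atoms(atoms):
    """Drop backbone atoms; atoms are (residue_index, atom_name, (x, y, z))."""
    return [atom for atom in atoms if atom[1] not in BACKBONE_ATOMS]


def pocket_profile(mhc_atoms, peptide_atoms, cutoff=CONTACT_CUTOFF):
    """Map each peptide position to the sorted MHC residue indices in contact."""
    profile = defaultdict(set)
    mhc_side = side_chain_atoms(mhc_atoms)
    for pep_idx, _, pep_xyz in side_chain_atoms(peptide_atoms):
        for mhc_idx, _, mhc_xyz in mhc_side:
            if math.dist(pep_xyz, mhc_xyz) <= cutoff:
                profile[pep_idx].add(mhc_idx)
    return {pos: sorted(residues) for pos, residues in profile.items()}


def truncate_to_motif(sequence, motif=ANCHOR_MOTIF):
    """Cut a sequence so that it begins with the conserved motif."""
    start = sequence.find(motif)
    if start < 0:
        raise ValueError("conserved motif not found in sequence")
    return sequence[start:]


# Toy data (made-up coordinates, not taken from any real structure).
toy_mhc = [
    (7, "CB", (0.0, 0.0, 0.0)),
    (7, "CA", (0.0, 0.0, 1.0)),    # backbone, ignored
    (63, "CG", (10.0, 0.0, 0.0)),
    (99, "O", (1.0, 0.0, 0.0)),    # backbone, ignored
]
toy_peptide = [
    (1, "CB", (3.0, 0.0, 0.0)),    # 3 A from the CB of MHC residue 7
    (2, "CG", (12.0, 0.0, 0.0)),   # 2 A from the CG of MHC residue 63
    (2, "N", (0.5, 0.0, 0.0)),     # backbone, ignored
]

toy_profile = pocket_profile(toy_mhc, toy_peptide)
toy_sequence = truncate_to_motif("MAVMAPRTLLGSHSMRYFYT")
```

In this toy case, peptide position 1 is assigned MHC residue 7 and position 2 is assigned MHC residue 63, while residue 99 is skipped because only its backbone oxygen is close to the peptide. Indices taken from truncated sequences then start counting at the G of the conserved motif.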
Prediction Model
Our aim is to develop a single prediction model for all known allelic variants of MHC-I
molecules. Hence, our instances are MHC-peptide pairs and the feature vectors comprise
an encoding of the peptide and an encoding of the MHC allele, more precisely, of the
corresponding gene product. We use the five-dimensional pca encoding [94], which was
described in Section 4.2, to encode individual AAs. The nonameric peptides are encoded
AA-wise, yielding 45 features. The MHC alleles are encoded pocket-wise. In order to
model the physicochemical environment within the pockets, a pocket $P = \{p_1, p_2, \ldots, p_n\}$
is encoded by averaging over the pca encodings of the pocket’s residues, i.e.,
\[
\Phi^{P}_{\mathrm{pca}}(P) = \frac{1}{n} \sum_{i=1}^{n} \Phi^{AA}_{\mathrm{pca}}(p_i) \tag{4.6}
\]
where $\Phi^{AA}_{\mathrm{pca}}(p)$ is the pca encoding of AA $p$. This yields feature vectors of length 90: five
features per peptide AA and five features per pocket. Based on these feature vectors and