05.08.2013 Views

New Approaches to in silico Design of Epitope-Based Vaccines


4.3. MHC BINDING PREDICTION FOR ALL MHC-I ALLELES

Performance tests with our approach yield promising figures: UniTope performs better than existing allele-specific methods on alleles included in the training set. On alleles not included in the training set, which correspond to alleles without experimental binding data, it achieves remarkable results, comparable to those of allele-specific methods.

4.3.2 Methods

Pocket Definition

A pocket of the MHC-I binding groove is composed of all residues in contact with the corresponding residue of a bound nonamer: MHC residues in contact with the first residue of the peptide belong to the first pocket, those in contact with the second residue belong to the second, and so forth. The MHC-I sequence indices of the residues found to contribute to a specific pocket are recorded in the pocket profile. To determine the pocket profiles, 3D structures of nonameric peptides bound to MHC-I molecules had to be analyzed. 75 crystal structures of such pMHC-I complexes were retrieved from the Protein Data Bank (PDB) [90] and analyzed using the BALL framework [91]. A list of these 75 structures is given in Table B.4 in the appendix. We applied the SS contact criterion [78] to determine contacts between MHC and peptide residues: an MHC residue and a peptide residue are defined to be in contact if they are at most 4 Å apart. Interactions with the MHC backbone as well as with the peptide backbone are omitted.

To ensure consistent indexing, MHC-I position indices were determined as follows: we retrieved AA sequences derived from all known human MHC-I alleles from the IMGT/HLA database [92] (release 2.16). Sequences derived from alleles that have been shown not to be expressed were discarded. Furthermore, all sequences with an incomplete binding groove were removed. A multiple sequence alignment (MSA) of the remaining sequences using ClustalW [93] showed a conserved sequence (GSHSMRYF) at the beginning of the α-chain. All MHC sequences were truncated to begin with this conserved sequence. The resulting pocket profiles are displayed in Figure 4.5.
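The pocket definition and the index convention can be illustrated with a small sketch for a single pMHC-I complex. It is not the BALL-based analysis used here. The `Atom` record, the function names, and the backbone atom set (N, CA, C, O) are hypothetical choices made for illustration. The sketch also assumes that "at most 4 Å apart" refers to the smallest distance between any two side-chain atoms, and that indices are counted from 1 at the first residue of GSHSMRYF; the text above does not spell out either detail.

```python
import math
from typing import List, NamedTuple, Set

BACKBONE_ATOMS = {"N", "CA", "C", "O"}  # assumption: atoms treated as backbone
CONSERVED_START = "GSHSMRYF"            # conserved motif named in the text
CUTOFF = 4.0                            # contact distance in Angstrom


class Atom(NamedTuple):
    """Hypothetical minimal atom record."""
    residue_index: int  # 0-based position of the residue in its chain
    name: str           # PDB-style atom name, e.g. "CB"
    x: float
    y: float
    z: float


def truncation_offset(sequence: str) -> int:
    """Return the 0-based position where the conserved motif starts."""
    offset = sequence.find(CONSERVED_START)
    if offset < 0:
        raise ValueError("conserved start motif not found")
    return offset


def pocket_profile(mhc_sequence: str,
                   mhc_atoms: List[Atom],
                   peptide_atoms: List[Atom],
                   peptide_length: int = 9) -> List[Set[int]]:
    """Collect, per peptide position, the MHC indices in side-chain contact.

    Indices are 1-based and relative to the truncated MHC sequence
    (assumption: the G of GSHSMRYF is position 1).
    """
    offset = truncation_offset(mhc_sequence)
    pockets: List[Set[int]] = [set() for _ in range(peptide_length)]
    # Backbone atoms on both sides are ignored, as stated in the text.
    mhc_side = [a for a in mhc_atoms if a.name not in BACKBONE_ATOMS]
    pep_side = [a for a in peptide_atoms if a.name not in BACKBONE_ATOMS]
    for p in pep_side:
        for m in mhc_side:
            distance = math.dist((p.x, p.y, p.z), (m.x, m.y, m.z))
            if distance <= CUTOFF:
                index = m.residue_index - offset + 1
                if index >= 1:  # skip residues before the conserved motif
                    pockets[p.residue_index].add(index)
    return pockets
```

The function returns one set of MHC indices per peptide position for one structure. How the profiles from the 75 structures are combined is not specified in this passage, so the sketch leaves that step out.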

Prediction Model

Our aim is to develop a single prediction model for all known allelic variants of MHC-I molecules. Hence, our instances are MHC-peptide pairs, and the feature vectors comprise an encoding of the peptide and an encoding of the MHC allele or, more precisely, of the corresponding gene product. We use the five-dimensional pca encoding [94], which was described in Section 4.2, to encode individual AAs. The nonameric peptides are encoded AA-wise, yielding 45 features. The MHC alleles are encoded pocket-wise. To model the physicochemical environment within the pockets, a pocket $P = \{p_1, p_2, \ldots, p_n\}$ is encoded by averaging over the pca encodings of the pocket's residues, i.e.,

$$
\Phi^{P}_{pca}(P) = \frac{1}{n} \sum_{i=1}^{n} \Phi^{AA}_{pca}(p_i) \qquad (4.6)
$$

where $\Phi^{AA}_{pca}(p)$ is the pca encoding of AA $p$. This yields feature vectors of length 90: five features per peptide AA and five features per pocket. Based on these feature vectors and
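As an illustration of the encoding described in this section, the following sketch builds the 90-dimensional feature vector for one MHC-peptide pair. The descriptor table `PLACEHOLDER_PCA` is a stand-in made up for this example and does not contain the published pca values [94]. The function names are also hypothetical. Each pocket is passed as the string of amino acids found at its profile positions in the allele's sequence.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIM = 5  # dimensionality of the per-residue encoding

# Placeholder descriptors: NOT the pca values from [94], only a
# deterministic stand-in so that the example runs.
PLACEHOLDER_PCA: Dict[str, List[float]] = {
    aa: [float((i * (k + 1)) % 7) for k in range(DIM)]
    for i, aa in enumerate(AMINO_ACIDS)
}


def encode_peptide(peptide: str,
                   table: Dict[str, List[float]] = PLACEHOLDER_PCA) -> List[float]:
    """Encode a nonamer AA-wise: 9 residues x 5 values = 45 features."""
    if len(peptide) != 9:
        raise ValueError("expected a nonameric peptide")
    return [value for aa in peptide for value in table[aa]]


def encode_pocket(residues: str,
                  table: Dict[str, List[float]] = PLACEHOLDER_PCA) -> List[float]:
    """Average the residue encodings of one pocket, as in Eq. (4.6)."""
    n = len(residues)
    if n == 0:
        raise ValueError("pocket has no residues")
    return [sum(table[aa][k] for aa in residues) / n for k in range(DIM)]


def encode_pair(peptide: str,
                pockets: Sequence[str],
                table: Dict[str, List[float]] = PLACEHOLDER_PCA) -> List[float]:
    """Concatenate the peptide encoding and the nine pocket encodings."""
    if len(pockets) != 9:
        raise ValueError("expected one pocket per peptide position")
    features = encode_peptide(peptide, table)
    for pocket in pockets:
        features.extend(encode_pocket(pocket, table))
    return features
```

Averaging gives every pocket a fixed-length representation regardless of how many residues it contains, which is why the allele contributes exactly 45 features.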
