New Approaches to in silico Design of Epitope-Based Vaccines

4.2 Improved Kernels for MHC Binding Prediction

For the classification, peptides with IC50 values greater than 500 nM were considered nonbinders, all others binders.

We use three sets of physicochemical descriptors for AAs: (1) five descriptors derived from a principal component analysis of 237 physicochemical properties (pca), (2) three descriptors representing hydrophobicity, size, and electronic properties taken from the AAIndex (zscale), and (3) 20 descriptors corresponding to the respective entries of the BLOSUM50 substitution matrix [88] (blosum50).
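As a concrete illustration of this setup, the following Python sketch binarizes IC50 affinities at the 500 nM threshold and encodes a peptide by concatenating per-residue descriptor vectors. The two-dimensional descriptor table is made up for illustration; it stands in for the actual pca, zscale, or blosum50 values.

```python
# Sketch of the two preprocessing steps described above: binarizing IC50
# affinities at 500 nM and encoding peptides with per-residue descriptors.

THRESHOLD_NM = 500.0

# Hypothetical 2-dimensional descriptors for a few amino acids
# (illustrative numbers only, not real pca/zscale/BLOSUM50 entries).
DESCRIPTORS = {
    "A": [0.1, -0.5],
    "L": [0.9, -0.2],
    "K": [-0.8, 0.7],
    "Y": [0.4, 0.3],
}

def label(ic50_nm):
    """1 = binder (IC50 <= 500 nM), 0 = nonbinder."""
    return 1 if ic50_nm <= THRESHOLD_NM else 0

def encode(peptide):
    """Concatenate per-residue descriptor vectors into one feature vector."""
    vec = []
    for aa in peptide:
        vec.extend(DESCRIPTORS[aa])
    return vec

print(label(120.0))    # prints 1 (binder)
print(encode("ALKY"))  # 8-dimensional vector for a 4-mer
```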

The main goal of the work presented in this section is the methodological improvement of existing string kernels by incorporating prior knowledge of AA properties. To analyze the benefits of the proposed modifications, we conducted performance comparisons between the original and the modified string kernels as well as standard kernels.

Preliminary Performance Analysis

Preliminary classification experiments on three human MHC alleles (HLA-A*23:01, HLA-B*58:01, HLA-A*02:01) were carried out to analyze the performance of the different kernels: WD (3.23), RBF (3.22), poly (3.21), WD-RBF (4.5), and WD-poly (as WD-RBF, but with polynomial-AASK), combined with different encodings (pca, zscale, blosum50). The alleles were chosen to comprise a small data set (HLA-A*23:01, 104 examples) as well as a medium (HLA-B*58:01, 988 examples) and a large (HLA-A*02:01, 3089 examples) data set. The respective cross-validation results are given in Table 4.1. For each of the alleles a different kernel type performs best: poly (pca) for HLA-A*23:01, RBF (blosum50) for HLA-B*58:01, and WD-RBF (blosum50) for HLA-A*02:01. The latter performs second-best on HLA-A*23:01 and HLA-B*58:01. As for the benefits of the modification of the WD kernel, the WD-poly and WD-RBF kernels outperform the WD kernel in 17 out of 18 cases.
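The idea behind the modified WD kernels can be sketched as follows: in a degree-1 weighted-degree kernel, the exact-match indicator between two residues is replaced by an RBF (or polynomial) similarity of their descriptor vectors, so that chemically similar mismatches still contribute to the kernel value. The descriptor numbers and the gamma parameter below are illustrative assumptions, not the settings used in the experiments.

```python
import math

# Toy 2-D descriptors (hypothetical values standing in for pca/zscale/blosum50).
DESC = {"A": [0.1, -0.5], "L": [0.9, -0.2], "K": [-0.8, 0.7], "Y": [0.4, 0.3]}

def rbf(u, v, gamma=0.5):
    """RBF similarity of two descriptor vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)

def wd_like(x, y):
    """Degree-1 WD-style kernel: counts exact per-position matches."""
    return sum(1.0 for a, b in zip(x, y) if a == b)

def wd_rbf_like(x, y, gamma=0.5):
    """Same positional structure, but the hard match delta(a, b) is replaced
    by an RBF similarity of the residues' descriptor vectors."""
    return sum(rbf(DESC[a], DESC[b], gamma) for a, b in zip(x, y))

print(wd_like("ALKY", "ALKA"))      # 3.0: the position-4 mismatch scores 0
print(wd_rbf_like("ALKY", "ALKA"))  # > 3.0: the mismatch gets partial credit
```

Substituting a polynomial similarity for `rbf` gives the analogous WD-poly variant.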

Learning Curve Analysis

From Table 4.1 the trend can be observed that the kernels that use AA properties benefit more on smaller data sets. To validate this hypothesis, we performed learning curve analyses for WD and WD-RBF (blosum50) in a classification and a regression setting on the largest data set, i.e., HLA-A*02:01. Performance is measured by averaging the auROC and the PCC, respectively. To average over different data splits and thereby reduce random fluctuations of the performance, we performed 100 runs of two-times nested five-fold cross-validation. In each run, thirty percent of the available data was used for testing. From the remaining data, training sets of different sizes (20, 31, 50, 80, 128, 204, 324, 516, 822, 1308) were selected randomly. Figure 4.3 shows the mean performances with standard errors. Both for classification and regression, it can clearly be seen that the fewer examples are available for learning, the stronger the improvement of the WD-RBF kernel over the WD kernel. Intuitively this makes sense: the more data is available, the easier it is to infer the relations between the AAs from the sequences in the training data alone.
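The evaluation protocol described above (a held-out test portion per run, random training subsets of increasing size, and averaged auROC) can be sketched on synthetic data as follows. The centroid classifier and the Gaussian toy data are placeholders for the actual kernel machines and peptide data; only the protocol itself is the point.

```python
import random

def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroids(xs, ys):
    """Trivial 1-D classifier: score = distance to the negative centroid
    minus distance to the positive one (stand-in for a kernel SVM)."""
    c1 = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, ys.count(1))
    c0 = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, ys.count(0))
    return lambda x: abs(x - c0) - abs(x - c1)

random.seed(0)
# Synthetic stand-in data: positives centred at +1, negatives at -1.
data = [(random.gauss(1, 1), 1) for _ in range(600)] + \
       [(random.gauss(-1, 1), 0) for _ in range(600)]
random.shuffle(data)
test, pool = data[:360], data[360:]   # ~30% held out, as in the text

for size in (20, 80, 324):            # a subset of the sizes used above
    aucs = []
    for _ in range(20):               # repeated random training subsets
        train = random.sample(pool, size)
        clf = fit_centroids([x for x, _ in train], [y for _, y in train])
        aucs.append(auroc([clf(x) for x, _ in test], [y for _, y in test]))
    print(size, round(sum(aucs) / len(aucs), 3))
```

Plotting the mean auROC (with standard errors) against the training-set size yields a learning curve of the kind shown in Figure 4.3.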
