New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
76 CHAPTER 4. (BIO-)MEDICAL APPLICATIONS
Bootstrap (Efron, 1979): The bootstrap is a method for estimating the variance and the distribution of a statistic $T_n = g(X_1, \ldots, X_n)$ (note that $T_n$ needs to be Hadamard differentiable, see e.g. (Shao and Tu, 1995)). In principle it can also be used to estimate some parameter $\theta$. This method first creates an infinitely large mega data set by copying the original data set many times. Then a large number of different samples are drawn from this mega set, the analysis is performed separately for each sample, and the results are averaged. Thus, a lot of configurations (including configurations in which an item may be represented several times or not at all) are considered, and conclusions about the generalization of the results can be drawn. The bootstrap is a robust alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible. In contrast to the jackknife, the bootstrap gives slightly different results when repeated on the same data.
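The resampling step described above can be illustrated with a minimal Python sketch; the data values below are purely hypothetical:

```python
import random

# Hypothetical observed data set (X1, ..., Xn); values are illustrative only.
data = [2.3, 1.7, 3.1, 2.9, 1.2, 2.4]

random.seed(0)  # fixed seed so this sketch is reproducible

# One bootstrap sample: n draws *with replacement* from the original data,
# so any single value may appear several times or not at all.
sample = random.choices(data, k=len(data))
print(sample)
```

Repeating this draw B times and re-running the analysis on each sample is all the bootstrap requires mechanically; the subtlety lies in what the spread of the results tells us.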
In the real world we would sample n data points $(X_1, \ldots, X_n)$ from some CDF $F$ and calculate a statistic $T_n = g(X_1, \ldots, X_n)$. Transferred to the bootstrap world, we sample n data points $(X^*_1, \ldots, X^*_n)$ from $\hat{F}_n$ and estimate a statistic $T^*_n = g(X^*_1, \ldots, X^*_n)$. Drawing n points at random from $\hat{F}_n$ is the same as drawing a sample of size n with replacement from $(X_1, \ldots, X_n)$ (the original data). By the law of large numbers we know that $v_{boot} \xrightarrow{a.s.} V_{\hat{F}_n}(T_n)$ as $B \to \infty$. It follows

$$ V_F(T_n) \underbrace{\approx}_{O(1/\sqrt{n})} V_{\hat{F}_n}(T_n) \underbrace{\approx}_{O(1/\sqrt{B})} v_{boot} $$
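The approximation chain above can be checked numerically. The following sketch (all parameter values are illustrative assumptions) takes the sample mean as $T_n$, for which the true variance $V_F(T_n) = \sigma^2/n$ is known in closed form, and compares it with the bootstrap estimate $v_{boot}$:

```python
import random
import statistics

random.seed(1)
n, B = 100, 2000      # illustrative sample size and number of bootstrap samples
sigma = 2.0           # assumed standard deviation of the underlying normal F

# "Real world": n points drawn from N(0, sigma^2), so the true variance of
# the sample mean T_n is V_F(T_n) = sigma^2 / n.
data = [random.gauss(0.0, sigma) for _ in range(n)]

# "Bootstrap world": B resamples with replacement from the empirical F_hat_n,
# each yielding one replicate T*_n of the sample mean.
boot_means = [statistics.fmean(random.choices(data, k=n)) for _ in range(B)]
v_boot = statistics.pvariance(boot_means)

print(f"true variance of the mean:  {sigma**2 / n:.4f}")
print(f"bootstrap estimate v_boot:  {v_boot:.4f}")
```

The two printed numbers agree up to the $O(1/\sqrt{n})$ and $O(1/\sqrt{B})$ errors in the display above; increasing B shrinks only the second source of error.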
For the parameter estimation, the number of the bootstrap samples B is usually chosen to be around 200. The algorithm for estimating the variance of some statistic $T_n$ is as follows:

- Given data: $X = (X_1, \ldots, X_n)$
- Repeat the following two steps $i = 1, \ldots, B$ times:
  1. Draw $X^* = (X^*_1, \ldots, X^*_n)$ with replacement from $X$
  2. Calculate $T^*_{n,i} = g(X^*_1, \ldots, X^*_n)$
- This results in B estimators $(T^*_{n,1}, \ldots, T^*_{n,B})$, which can be used for various purposes (variance estimation, interval estimation, hypothesis testing, and so on).
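The steps above translate directly into a small generic routine; this is a minimal sketch, and the function name `bootstrap_estimates` is our own choice, not notation from the text:

```python
import random
import statistics

def bootstrap_estimates(data, g, B=200, rng=None):
    """Return the B bootstrap replicates (T*_{n,1}, ..., T*_{n,B}) of statistic g."""
    rng = rng or random.Random()
    n = len(data)
    # Steps 1 and 2, repeated B times: resample with replacement, evaluate g.
    return [g(rng.choices(data, k=n)) for _ in range(B)]

# Usage: bootstrap variance of the median (any statistic g works the same way).
# The data here are simulated normal draws, purely for illustration.
rng = random.Random(42)
data = [rng.gauss(10.0, 3.0) for _ in range(50)]
replicates = bootstrap_estimates(data, statistics.median, B=200, rng=rng)
v_boot = statistics.pvariance(replicates)
print(f"bootstrap variance of the median: {v_boot:.3f}")
```

Passing the statistic $g$ as a function argument is what makes the method general: the same B-fold resampling loop serves the mean, the median, a regression coefficient, or any other Hadamard-differentiable statistic.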
For example the variance estimator is computed by:

$$ v_{boot} = \frac{1}{B} \sum_{b=1}^{B} \left( T^*_{n,b} - \frac{1}{B} \sum_{i=1}^{B} T^*_{n,i} \right)^2 $$

and the estimator for the standard error by:

$$ \widehat{se}_{boot} = \sqrt{v_{boot}} $$

Jackknife vs. Bootstrap

Since the jackknife only needs n computations, it is usually cheaper to compute than the roughly 200-300 replications needed for the bootstrap. However, using only the n jackknife samples, the jackknife exploits only limited information about the statistic $\hat{\theta}$. It can be shown that asymptotically the estimators of