

Bootstrap (Efron, 1979): The bootstrap is a method for estimating the variance and the distribution of a statistic $T_n = g(X_1, \ldots, X_n)$ (note that $T_n$ needs to be Hadamard differentiable, see e.g. (Shao and Tu, 1995)). In principle it can also be used to estimate some parameter $\theta$. Conceptually, the method first creates an infinitely large mega data set by copying the original data set many times. Then a large number of different samples is drawn from this mega set, the analysis is performed separately for each sample, and the results are averaged. Thus, many configurations are considered (including configurations in which an item is represented several times or not at all), and conclusions about the generalization of the results can be drawn. The bootstrap is a robust alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible. As opposed to the jackknife, the bootstrap gives slightly different results when repeated on the same data.

In the real world we would sample $n$ data points $(X_1, \ldots, X_n)$ from some CDF $F$ and calculate a statistic $T_n = g(X_1, \ldots, X_n)$. Transferred to the bootstrap world, we sample $n$ data points $(X^*_1, \ldots, X^*_n)$ from $\hat{F}_n$ and estimate a statistic $T^*_n = g(X^*_1, \ldots, X^*_n)$. Drawing $n$ points at random from $\hat{F}_n$ is the same as drawing a sample of size $n$ with replacement from $(X_1, \ldots, X_n)$ (the original data). By the law of large numbers we know that $v_{boot} \xrightarrow{a.s.} V_{\hat{F}_n}(T_n)$ as $B \to \infty$. It follows that
$$V_F(T_n) \underbrace{\approx}_{O(1/\sqrt{n})} V_{\hat{F}_n}(T_n) \underbrace{\approx}_{O(1/\sqrt{B})} v_{boot}.$$

For the parameter estimation, the number of bootstrap samples $B$ is usually chosen to be around 200. The algorithm for estimating the variance of some statistic $T_n$ is as follows:

- Given data: $X = (X_1, \ldots, X_n)$
- Repeat the following two steps $i = 1, \ldots, B$ times:
  1. Draw $X^* = (X^*_1, \ldots, X^*_n)$ with replacement from $X$
  2. Calculate $T^*_{n,i} = g(X^*_1, \ldots, X^*_n)$
- This results in $B$ estimators $(T^*_{n,1}, \ldots, T^*_{n,B})$, which can be used for various purposes (variance estimation, interval estimation, hypothesis testing, and so on).

For example, the variance estimator is computed by
$$v_{boot} = \frac{1}{B} \sum_{b=1}^{B} \left( T^*_{n,b} - \frac{1}{B} \sum_{i=1}^{B} T^*_{n,i} \right)^2$$

and the estimator for the standard error by
$$\hat{se}_{boot} = \sqrt{v_{boot}}.$$
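As a minimal illustration of this procedure, consider the following Python sketch using NumPy. The function name `bootstrap_variance`, the `seed` parameter, and the choice of the median as example statistic are illustrative assumptions, not taken from the text:

```python
import numpy as np

def bootstrap_variance(x, statistic, B=200, seed=None):
    """Bootstrap estimate of the variance of T_n = g(X_1, ..., X_n).

    Draws B samples of size n with replacement from x (i.e. samples
    from F_hat_n) and evaluates the statistic on each of them.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # T*_{n,i} = g(X*_1, ..., X*_n) for each bootstrap sample i = 1, ..., B
    t_star = np.array([statistic(rng.choice(x, size=n, replace=True))
                       for _ in range(B)])
    # v_boot = (1/B) * sum_b (T*_{n,b} - (1/B) * sum_i T*_{n,i})^2
    v_boot = np.mean((t_star - t_star.mean()) ** 2)
    se_boot = np.sqrt(v_boot)  # se_hat_boot = sqrt(v_boot)
    return v_boot, se_boot

# Example usage: standard error of the median of 50 exponential draws
x = np.random.default_rng(0).exponential(size=50)
v_boot, se_boot = bootstrap_variance(x, np.median, B=200)
print(v_boot, se_boot)
```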

Jackknife vs. Bootstrap

Since the jackknife only needs $n$ computations, it is usually cheaper to compute than the roughly 200-300 replications needed for the bootstrap. However, by using only the $n$ jackknife samples, the jackknife exploits only limited information about the statistic $\hat{\theta}$. It can be shown that asymptotically the estimators of the two methods agree: the jackknife can be viewed as a linear approximation to the bootstrap (Efron, 1979).

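For comparison, a minimal sketch of the jackknife variance estimator, which recomputes the statistic on the $n$ leave-one-out samples instead of on $B$ resamples (the function name `jackknife_variance` is again an illustrative assumption):

```python
import numpy as np

def jackknife_variance(x, statistic):
    """Jackknife estimate of the variance of a statistic.

    Needs only n re-computations, one per leave-one-out sample, as
    opposed to the ~200-300 resamples typically used by the bootstrap.
    """
    n = len(x)
    # T_(i) = g(X_1, ..., X_{i-1}, X_{i+1}, ..., X_n)
    t_loo = np.array([statistic(np.delete(x, i)) for i in range(n)])
    # Standard jackknife scaling factor (n - 1)/n
    return (n - 1) / n * np.sum((t_loo - t_loo.mean()) ** 2)
```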
