01.03.2013 Views

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


Appendix A - Short Survey on Probability Theory

Therefore, in order to obtain a certainty 1 – α (confidence level) that a relative frequency deviates from the probability of an event by less than ε (tolerance or error), one would need a sequence of n trials, with:

$$n \ge \frac{pq}{\varepsilon^{2}\alpha}. \tag{A.36}$$
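As a rough illustration of A.36, the following Python sketch computes the smallest integer n that satisfies the bound. The function name `required_trials` and the sample values of ε and α are hypothetical choices for this example; the default p = 1/2 is used because the product pq is largest there.

```python
import math

def required_trials(eps, alpha, p=0.5):
    """Smallest integer n with n >= p*q / (eps**2 * alpha), as in A.36."""
    q = 1.0 - p
    return math.ceil(p * q / (eps ** 2 * alpha))

# Hypothetical requirement: tolerance 0.05 at confidence level 95% (alpha = 0.05).
n_needed = required_trials(eps=0.05, alpha=0.05)
print(n_needed)
```

Halving the tolerance multiplies the required n by four, since ε enters the bound squared.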

Note that $\lim_{n\to\infty} P\left(\left|\frac{k}{n} - p\right| \ge \varepsilon\right) = 0$.

A stronger result is provided by the Strong Law of Large Numbers, which states that k/n converges to p with probability one.

These results justify the assumption made in section A.1 that the relative frequency of an event converges to its probability in a long sequence of trials.
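The convergence of k/n to p can be observed in a small simulation. The sketch below is only an illustration, not a proof: the event probability p = 0.3, the seed and the sample sizes are arbitrary assumptions, and a single simulated run need not improve at every step.

```python
import random

def relative_frequency(p, n, rng):
    """Relative frequency k/n of an event of probability p in n independent trials."""
    k = sum(1 for _ in range(n) if rng.random() < p)
    return k / n

rng = random.Random(12345)  # fixed seed (arbitrary), only for reproducibility
p = 0.3                     # hypothetical event probability

# Absolute deviation |k/n - p| for increasingly long sequences of trials.
deviations = {n: abs(relative_frequency(p, n, rng) - p)
              for n in (100, 10_000, 100_000)}
for n, dev in deviations.items():
    print(f"n = {n:>7}: |k/n - p| = {dev:.5f}")
```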

Example A.17

Q: What is the tolerance of the percentage, p, of favourable votes on a certain market product, based on a sample enquiry of 2500 persons, with a confidence level of at least 95%?

A: As we do not know the exact value of p, we assume the worst-case situation for A.36, occurring at p = q = ½. With n = 2500 and α = 0.05, we then have:

$$\varepsilon = \sqrt{\frac{pq}{n\alpha}} = \sqrt{\frac{0.25}{2500 \times 0.05}} \approx 0.045.$$
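The arithmetic of Example A.17 can be checked with a few lines of Python. This is a minimal sketch that solves A.36 for ε with equality; the function name `tolerance` and the alternative value p = 0.2 are hypothetical and serve only to show that p = ½ is the worst case.

```python
import math

def tolerance(n, alpha, p=0.5):
    """Tolerance eps obtained by solving A.36 with equality: sqrt(p*q / (n*alpha))."""
    q = 1.0 - p
    return math.sqrt(p * q / (n * alpha))

eps = tolerance(n=2500, alpha=0.05)             # worst case p = q = 1/2
eps_other = tolerance(n=2500, alpha=0.05, p=0.2)  # hypothetical other value of p

print(round(eps, 3))    # 0.045
print(eps_other < eps)  # True: any p other than 1/2 gives a smaller tolerance
```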

A.7.3 The Normal Distribution<br />

For increasing values of n and with fixed p, the probability function of the binomial distribution becomes flatter and the position of its maximum also grows (see Figure A.5). Consider the following random variable, which is obtained from the random variable with a binomial distribution by subtracting its mean and dividing by its standard deviation (the so-called standardised random variable or z-score):

$$Z = \frac{X_n - np}{\sqrt{npq}}. \tag{A.37}$$
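A quick numerical check of the standardisation in A.37: the sketch below computes the exact mean and variance of Z from the binomial probability function, which should come out as 0 and 1 up to rounding error. The values n = 50 and p = 0.3 and the helper names are assumptions made for this illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def standardise(k, n, p):
    """z-score of the count k as in A.37: (k - np) / sqrt(npq)."""
    return (k - n * p) / math.sqrt(n * p * (1 - p))

n, p = 50, 0.3  # hypothetical values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
z = [standardise(k, n, p) for k in range(n + 1)]

# Exact moments of Z, weighting each z value by its binomial probability.
mean_z = sum(w * zk for w, zk in zip(pmf, z))
var_z = sum(w * (zk - mean_z) ** 2 for w, zk in zip(pmf, z))
print(f"mean = {mean_z:.6f}, variance = {var_z:.6f}")
```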

It can be proved that for large n and not too small p and q (say, with np and nq greater than 5), the standardised discrete variable is well approximated by a continuous random variable having density function f(z), with the following asymptotic result:

$$P(Z) \xrightarrow[n\to\infty]{} f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}. \tag{A.38}$$
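The asymptotic result A.38 can be explored numerically. The sketch below assumes the usual local reading of the approximation: the binomial probability of each count k is multiplied by √(npq), the reciprocal of the spacing between neighbouring z values, before being compared with f(z). The helper names, p = 0.3 and the chosen values of n are hypothetical; the probability function is evaluated through logarithms only to avoid overflow for large n.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial probability of k successes, computed via log-gamma to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def std_normal_density(z):
    """f(z) = exp(-z^2 / 2) / sqrt(2*pi), as in A.38."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def max_abs_gap(n, p):
    """Largest |sqrt(npq) * P(X = k) - f(z_k)| over all counts k = 0..n."""
    sigma = math.sqrt(n * p * (1 - p))
    return max(
        abs(sigma * binomial_pmf(k, n, p) - std_normal_density((k - n * p) / sigma))
        for k in range(n + 1)
    )

p = 0.3  # hypothetical value
gaps = {n: max_abs_gap(n, p) for n in (20, 200, 2000)}
for n, gap in gaps.items():
    print(f"n = {n:>4}: max gap = {gap:.5f}")
```

Under these assumptions the largest gap shrinks as n grows, which is the behaviour the approximation describes.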

This result, known as De Moivre’s Theorem, can be proved using the above Stirling formula A.34. The density function f(z) is called the standard normal (or
