
Chapter 3:
THE NORMAL DISTRIBUTION

Upon completion of this chapter, you should be able to:

explain what the normal distribution is
assess normality using graphical techniques - histogram
assess normality using graphical techniques - box plots
assess normality using graphical techniques - normality plots
assess normality using statistical techniques

CHAPTER OVERVIEW

What is the normal distribution?
Why is the normal distribution important?
What are the parameters of the normal distribution?
Assessing normality
Assessing normality using graphical methods
o Histogram
o Boxplots
o Normal probability plots
Assessing normality using statistical methods
o Kolmogorov-Smirnov
o Shapiro-Wilk

Chapter 1: Introduction
Chapter 2: Descriptive Statistics
Chapter 3: The Normal Distribution
Chapter 4: Hypothesis Testing
Chapter 5: T-test
Chapter 6: Oneway Analysis of Variance
Chapter 7: Correlation
Chapter 8: Chi-Square

This chapter discusses the issue of normal distribution. Oftentimes researchers tend to ignore the importance of the normal distribution. The normal distribution is important because several of the statistical tests you will be using assume that your sample is normally distributed. Whether a distribution is normal can be determined by both graphical and statistical methods. The graphical methods are the histogram, the boxplot and the normal probability plot. The statistical methods are the Kolmogorov-Smirnov and the Shapiro-Wilk tests.



WHAT IS THE NORMAL DISTRIBUTION

Now that you know what the mean and standard deviation of a set of scores are, we can examine the concept of a normal distribution. The normal curve was developed mathematically in 1733 by DeMoivre as an approximation to the binomial distribution. Laplace used the normal curve in 1783 to describe the distribution of errors. However, it was Gauss who popularised the normal curve when he used it to analyse astronomical data in 1809, and it became known as the Gaussian distribution.

The term normal distribution refers to a particular way in which scores or observations tend to pile up or distribute around a particular value rather than be scattered all over. The normal distribution, which is bell-shaped, is based on a mathematical equation (which we will not get into!).

While some argue that in the real world scores or observations are seldom normally distributed, others argue that in the general population many variables such as height, weight, IQ scores, reading ability, job satisfaction and blood pressure turn out to have distributions that are bell-shaped or normal.

WHY IS THE NORMAL DISTRIBUTION IMPORTANT

The normal distribution is important for the following reasons:

Many physical, biological and social phenomena or variables are normally distributed. However, some variables are only approximately normally distributed.

Many kinds of statistical tests (such as the t-test and ANOVA) are derived from a normal distribution. In other words, these statistical tests work best when the sample tested is normally distributed.

FORTUNATELY, these statistical tests work very well even if the distribution is only approximately normally distributed. Some tests work well even with very wide deviations from normality. They are described as 'robust' tests that are able to tolerate the lack of a normal distribution.

WHAT ARE THE PARAMETERS OF THE NORMAL CURVE

A normal distribution (or normal curve) is completely determined by the mean and standard deviation, i.e. two normally distributed variables having the same mean and standard deviation must have the same distribution. We often identify a normal curve by stating the corresponding mean and standard deviation and calling those the parameters of the normal curve.

A normal distribution is symmetric and centred at the mean of the variable, and its spread depends on the standard deviation of the variable. The larger the standard deviation, the flatter and more spread out the distribution.
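The two parameters can be explored directly with a few lines of code. The following sketch (Python with scipy, offered only as an illustration alongside the SPSS material, with example values of our own choosing) evaluates the height of two normal curves that share a mean but have different standard deviations.

```python
# Illustrative sketch (not part of the SPSS workflow): a normal curve is
# completely determined by its mean and standard deviation.
# Example values (assumed): mean 100, standard deviations 15 and 30.
from scipy.stats import norm

mean = 100
for sd in (15, 30):
    peak = norm.pdf(mean, loc=mean, scale=sd)  # height of the curve at its mean
    print(f"sd = {sd}: height of the curve at the mean = {peak:.4f}")
# The curve with the larger standard deviation has the lower peak,
# i.e. it is flatter and more spread out around the same centre.
```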



A) ILLUSTRATION OF THE NORMAL DISTRIBUTION OR THE NORMAL CURVE

Figure 3.1 Graph showing a normal distribution of IQ scores (55 to 145 on the horizontal axis) among adolescents

See Figure 3.1, which is a graph showing a normal distribution of IQ scores among a sample of adolescents.

Mean is 100
Standard Deviation is 15

As you can see, the distribution is symmetric. If you folded the graph in the centre, the two sides would match, i.e. they are identical.

B) MEAN, MODE AND MEDIAN AND THE NORMAL CURVE

The centre of the distribution is the mean. The mean of a normal distribution is also the most frequently occurring value (i.e. the mode), and it is also the value that divides the distribution of scores into two equal parts (i.e. the median). In any normal distribution, the mean, median and the mode all have the same value (i.e. 100 in the example above).



C) THE THREE-STANDARD-DEVIATIONS RULE

The normal distribution shows the area under the curve. The three-standard-deviations rule, when applied to a variable, states that almost all the possible observations or scores of the variable lie within three standard deviations to either side of the mean. The normal curve is close to (but does not touch) the horizontal axis outside the range of three standard deviations to either side of the mean. Based on the graph above, you will notice that with a mean of 100 and a standard deviation of 15:

68% of all IQ scores fall between 85 (i.e. one standard deviation less than the mean, which is 100 - 15 = 85) and 115 (i.e. one standard deviation more than the mean, which is 100 + 15 = 115).

95% of all IQ scores fall between 70 (i.e. two standard deviations less than the mean, which is 100 - 30 = 70) and 130 (i.e. two standard deviations more than the mean, which is 100 + 30 = 130).

99.7% of all IQ scores fall between 55 (i.e. three standard deviations less than the mean, which is 100 - 45 = 55) and 145 (i.e. three standard deviations more than the mean, which is 100 + 45 = 145).

A normal distribution can have any mean and standard deviation, but the percentage of cases or individuals falling within one, two or three standard deviations of the mean is always the same. The shape of a normal distribution does not change. Means and standard deviations will differ from variable to variable, but the percentage of cases or individuals falling within specific intervals is always the same in a true normal distribution.
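As a quick arithmetic check of these percentages, the short sketch below (Python with scipy; purely illustrative and not part of the SPSS procedures in this module) computes the area under a normal curve with mean 100 and standard deviation 15 within one, two and three standard deviations of the mean.

```python
# Illustrative check of the one/two/three-standard-deviations percentages
# for a normal distribution with mean 100 and standard deviation 15.
from scipy.stats import norm

mean, sd = 100, 15
for k in (1, 2, 3):
    lower, upper = mean - k * sd, mean + k * sd
    proportion = norm.cdf(upper, mean, sd) - norm.cdf(lower, mean, sd)
    print(f"within {k} standard deviation(s) ({lower} to {upper}): {proportion:.1%}")
# Prints approximately 68.3%, 95.4% and 99.7%.
```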

LEARNING ACTIVITY

1. Precisely what is meant by the statement that a population is normally distributed?

2. Two normally distributed variables have the same means and the same standard deviations. What can you say about their distributions? Explain your answer.

3. Which normal distribution has a wider spread: the one with mean 1 and standard deviation 2 or the one with mean 2 and standard deviation 1? Explain your answer.

4. The mean of a normal distribution has no effect on its shape. Explain.

5. What are the parameters of a normal curve?



D) INFERENTIAL STATISTICS AND NORMALITY

Often in statistics one would like to assume that the sample under investigation has a normal distribution or an approximately normal distribution. However, such an assumption should be supported in some way by some technique. As mentioned earlier, the use of several inferential statistics such as the t-test and ANOVA requires that the distributions of the variables analysed are normally distributed, or at least approximately normally distributed. However, as discussed in Chapter 1, if a simple random sample is taken from a population, the distribution of the observed values of a variable in the sample will approximate the distribution of the population. Generally, the larger the sample, the better the approximation tends to be. In other words, if the population is normally distributed, the sample of observed values will also be approximately normally distributed, provided the sample is randomly selected and large enough.

ASSESSING NORMALITY

Assessing normality means determining whether the sample of students, teachers, parents or principals you are studying is normally distributed. When you draw a sample from a population that is normally distributed, it does not mean that your sample will necessarily have a distribution that is exactly normal. Samples vary, so the distribution of each sample may also vary. However, if a sample is reasonably large and it comes from a normal population, its distribution should look more or less normal.

For example, when you administer a questionnaire to a group of school principals, you want to be sure that your sample of 250 principals is normally distributed. WHY? The assumption of normality is a prerequisite for many inferential statistical techniques, and there are two main ways of determining the normality of a distribution:

a) Using graphical methods (such as histograms, stem-and-leaf plots and boxplots)

b) Using statistical procedures (such as the Kolmogorov-Smirnov statistic and the Shapiro-Wilk statistic)



SPSS Procedures for Assessing Normality:

There are several procedures to obtain the different graphs and statistics to assess normality, but the EXPLORE procedure is the most convenient when both graphs and statistics are required.

Select the Analyse menu
Click on Descriptive Statistics and then Explore... to open the Explore dialogue box
Select the variable you require and click on the button to move this variable into the Dependent List: box
Click on the Plots... command pushbutton to obtain the Explore: Plots sub-dialogue box
Click on the Histogram check box and the Normality plots with tests check box, and ensure that the Factor levels together radio button is selected in the Boxplots display
Click on Continue
In the Display box, ensure that Both is activated
Click on the Options... command pushbutton to open the Explore: Options sub-dialogue box
In the Missing Values box, click on Exclude cases pairwise (if not selected by default)
Click on Continue and then OK.
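If you are working outside SPSS, roughly the same battery of graphs can be produced with a short script. The sketch below (Python with matplotlib and scipy, on simulated data; the variable name `scores` is purely illustrative) draws the histogram, boxplot and normal Q-Q plot that the EXPLORE procedure gives you. The corresponding statistical tests are sketched later in the chapter.

```python
# Illustrative, non-SPSS sketch of the graphs produced by EXPLORE
# (simulated data; 'scores' is a made-up example variable).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(loc=60, scale=12, size=200)    # hypothetical test scores

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].hist(scores, bins=10)                      # histogram
axes[0].set_title("Histogram")
axes[1].boxplot(scores)                            # boxplot
axes[1].set_title("Boxplot")
stats.probplot(scores, dist="norm", plot=axes[2])  # normal probability (Q-Q) plot
axes[2].set_title("Normal Q-Q plot")
plt.tight_layout()
plt.show()
```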

ASSESSING NORMALITY USING GRAPHICAL METHODS

A) The HISTOGRAM

See Figure 3.2, which is a histogram showing the distribution of scores obtained on a Scientific Literacy Test administered to a sample of students.

The values on the vertical axis indicate the frequency or number of cases.

The values on the horizontal axis are midpoints of value ranges. For example, the first bar is 20 and the second bar is 30, indicating that each bar covers a range of 10.

Superimposed on the histogram is the normal curve. Simply looking at the bars indicates that the distribution has the rough shape of a normal distribution. The superimposed curve, however, shows that there are some deviations. The question is whether this deviation is small enough to say that the distribution is approximately normal (a sketch of how such a plot can be drawn is given after Figure 3.2).



Figure 3.2 Histogram (frequency on the vertical axis; score midpoints from 20 to 100 on the horizontal axis) showing the distribution of scores on Scientific Literacy among a group of students
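The next sketch (Python, again on simulated data and only for illustration) shows one way to draw a histogram with a superimposed normal curve, the kind of plot described above.

```python
# Illustrative sketch: histogram with a superimposed (fitted) normal curve.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(1)
scores = rng.normal(loc=60, scale=15, size=300)    # hypothetical literacy scores

plt.hist(scores, bins=range(15, 106, 10), density=True, edgecolor="black")
x = np.linspace(15, 105, 200)
plt.plot(x, norm.pdf(x, scores.mean(), scores.std()))  # normal curve with the sample mean and sd
plt.xlabel("Score")
plt.ylabel("Relative frequency")
plt.title("Histogram with superimposed normal curve")
plt.show()
```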

a) Some Key Characteristics of a Distribution Using the Histogram

(i) Skewness

Skewness is the degree of departure from symmetry of a distribution. A normal distribution is symmetrical. A non-symmetrical distribution is described as being either negatively or positively skewed. A distribution is skewed if one of its tails is longer than the other, or the tail is pulled to either the left or the right.

Figure 3.3 Graph showing a positive skew (skewness = 1.5)



Refer to Figure 3.3, which shows the distribution of the scores obtained by students on a test. There is a positive skew because the distribution has a longer tail in the positive direction, or the long tail is on the right side (towards the high values on the horizontal axis).

What does it mean? It means that more students were getting low scores on the test, which indicates that the test was too difficult. Alternatively, it could mean that the questions were not clear or the teaching methods and materials did not bring about the desired learning outcomes.

Figure 3.4 Graph showing a negative skew (skewness = -1.5)

Refer to Figure 3.4, which shows the distribution of the scores obtained by students on a test. There is a negative skew because the distribution has a longer tail in the negative direction, or to the left (towards the lower values on the horizontal axis).

What does it mean? It means that more students were getting high scores on the test, which may indicate either that the test was too easy or that the teaching methods and materials were successful in bringing about the desired learning outcomes.

(ii) Interpreting the Statistics for Skewness

Besides graphical methods, you can also determine skewness by examining the statistics reported. A normal distribution has a skewness of 0. See Table 3.1, which reports the skewness statistics for three independent groups. A positive value indicates a positive skew while a negative value reflects a negative skew.

Among the three groups, Group 3 is not as normally distributed as the other two groups. Its skewness value of -1.200 is greater than 1 in absolute value, which indicates that the distribution is non-symmetrical [rule of thumb: an absolute value greater than 1 indicates a non-symmetrical distribution].

The distribution of Group 2, with a skewness value of .235, is closest to being normal (i.e. closest to 0), followed by Group 1 with a skewness value of .973.



SPSS Output:

GROUP 1    Skewness    .973
GROUP 2    Skewness    .235
GROUP 3    Skewness    -1.200

Table 3.1 Skewness reported for three groups of students
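If you want to reproduce such skewness figures outside SPSS, they can be computed directly; the sketch below (Python with scipy, simulated data of our own choosing) contrasts a roughly symmetric sample with a clearly right-skewed one.

```python
# Illustrative sketch: computing sample skewness (0 for a symmetric distribution).
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
symmetric = rng.normal(50, 10, size=500)        # roughly symmetric scores
right_skewed = rng.exponential(10, size=500)    # long tail on the right

# bias=False gives the adjusted estimator, closer to the value SPSS reports.
print(f"symmetric sample:    skewness = {skew(symmetric, bias=False):.3f}")
print(f"right-skewed sample: skewness = {skew(right_skewed, bias=False):.3f}")
# Expect a value near 0 for the first sample and a clearly positive value for the second.
```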

(iii) Kurtosis

Kurtosis indicates the degree of "flatness" or "peakedness" of a distribution relative to the shape of a normal distribution.

Figure 3.5 Graphs showing high and low kurtosis

Refer to Figure 3.5, which shows:

Low kurtosis: Data with low kurtosis tend to have a flat top near the mean rather than a sharp peak.

High kurtosis: Data with high kurtosis tend to have a distinct peak near the mean and then decline rather sharply, with heavy tails.



Figure 3.6 Names assigned to different types of kurtosis

See Figure 3.6, which shows the names assigned to different levels of kurtosis:

A normal distribution has a kurtosis of 0 and is called mesokurtic (Graph A). [Strictly speaking, a mesokurtic distribution has a kurtosis value of 3, but in line with the practice used in SPSS, the adjusted (excess) kurtosis value is 0.]

If a distribution is peaked (tall and skinny), its kurtosis value is greater than 0 and it is said to be leptokurtic (Graph B); it has a positive kurtosis.

If, on the other hand, the distribution is flat, its kurtosis value is less than 0 and it is said to be platykurtic (Graph C); it has a negative kurtosis.

(iv) Interpreting the Statistics for Kurtosis

Besides graphical methods, you can also determine kurtosis by examining the statistics reported. A normal distribution has a kurtosis of 0. See Table 3.2, which reports the kurtosis statistics for three independent groups.



SPSS Output:

GROUP 1    Kurtosis    .500
GROUP 2    Kurtosis    -1.58
GROUP 3    Kurtosis    1.65

Table 3.2 Kurtosis reported for three groups of students

Group 1, with a kurtosis value of .500 (a positive value), is more normally distributed than the other two groups because its value is closest to 0.

Group 2, with a kurtosis value of -1.58, has a distribution that is more flattened and not as normally distributed compared to Group 1.

Group 3, with a kurtosis value of +1.65, has a distribution that is more peaked and not as normally distributed compared to Group 1.
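Kurtosis values can likewise be computed outside SPSS; the sketch below (Python with scipy, simulated data) uses the excess-kurtosis convention mentioned above, under which a normal distribution scores 0.

```python
# Illustrative sketch: excess kurtosis (0 for normal, negative for flat,
# positive for peaked, heavy-tailed distributions).
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(3)
samples = {
    "normal (mesokurtic)": rng.normal(0, 1, size=1000),
    "flat (platykurtic)": rng.uniform(-1, 1, size=1000),
    "peaked (leptokurtic)": rng.laplace(0, 1, size=1000),
}

for name, data in samples.items():
    print(f"{name}: excess kurtosis = {kurtosis(data, fisher=True):.2f}")
```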

B) The BOXPLOT

The boxplot also provides information about the distribution of scores. Unlike the histogram, which plots actual values, the boxplot summarises the distribution using the median, the 25th and 75th percentiles, and extreme scores in the distribution. See Figure 3.7, which shows a boxplot for the same set of data on scientific literacy discussed earlier. Note that the lower boundary of the box is the 25th percentile and the upper boundary is the 75th percentile.

The BOX

The box has hinges that form its outer boundaries. The hinges are the scores that cut off the top and bottom 25% of the data. Thus, 50% of the scores fall within the hinges. The thick horizontal line through the box represents the median; in the case of a normal distribution the line runs through the centre of the box.

If the median is closer to the top of the box, then the distribution is negatively skewed. If it is closer to the bottom of the box, then it is positively skewed.

Whiskers

The smallest and largest observed values within the distribution are represented by the horizontal lines at either end of the box, commonly referred to as whiskers. The two whiskers indicate the spread of the scores.



Figure 3.7 Boxplot showing the distribution of scores on Scientific Literacy among a group of students. The figure labels the upper whisker (the largest observed value), the 75th percentile (upper hinge), the median (the horizontal line through the centre of the box), the 25th percentile (lower hinge) and the lower whisker (the smallest observed value).

Scores that fall outside the upper and lower whiskers are classed as extreme scores or outliers. If the distribution has any extreme scores, i.e. scores 3 or more box lengths from the upper or lower hinge, these will be represented by a circle (o).

Outliers tell us that we should check why a score is so extreme. Could it be that you have made an error in data entry?

Why is it important to identify outliers? Because many of the statistical techniques used involve calculation of means. The mean is sensitive to extreme scores, and it is important to be aware of whether your data contain such extreme scores if you are to draw conclusions from the statistical analysis conducted.
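The quantities SPSS draws in a boxplot (hinges, whiskers and the outlier cut-offs) can also be computed by hand. The sketch below (Python, with a small made-up data set; note that SPSS computes its hinges with a slightly different rule than the plain percentiles used here) is only an illustration of the idea.

```python
# Illustrative sketch: hinges, box length and outlier cut-offs behind a boxplot.
import numpy as np

scores = np.array([34, 41, 45, 47, 50, 52, 55, 58, 60, 63, 67, 71, 98])  # made-up scores

q1, median, q3 = np.percentile(scores, [25, 50, 75])
box_length = q3 - q1   # the length of the box (interquartile range)

# Convention: outliers lie more than 1.5 box lengths from a hinge,
# extreme scores lie 3 or more box lengths from a hinge.
outlier_low, outlier_high = q1 - 1.5 * box_length, q3 + 1.5 * box_length
extreme_low, extreme_high = q1 - 3.0 * box_length, q3 + 3.0 * box_length

print(f"hinges: {q1} and {q3}, median: {median}")
print("outliers:", scores[(scores < outlier_low) | (scores > outlier_high)])
print("extreme scores:", scores[(scores < extreme_low) | (scores > extreme_high)])
```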



C) The NORMAL PROBABILITY PLOT

Besides the histogram and the box plot, another frequently used graphical technique for determining normality is the "Normal Probability Plot" or "Normal Q-Q Plot". The idea behind a normal probability plot is simple. It compares the observed values of the variable to the observations expected for a normally distributed variable. More precisely, a normal probability plot is a plot of the observed values of the variable versus the normal scores (the observations expected for a variable having the standard normal distribution).

In a normal probability plot, each observed value (score) is paired with its expected normal score, and the pairs should form a roughly linear pattern. If the sample is from a normal distribution, then the observed values or scores fall more or less on a straight line. The normal probability plot is formed by:

Vertical axis: Expected normal values
Horizontal axis: Observed values
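To make the pairing of observed values with expected normal scores concrete, the sketch below (Python, simulated data, purely illustrative; SPSS builds the same kind of plot for you through the Explore dialogue) constructs a normal probability plot by hand.

```python
# Illustrative sketch: building a normal probability (Q-Q) plot from scratch.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(4)
observed = np.sort(rng.normal(60, 15, size=40))        # observed scores, sorted
n = len(observed)
plotting_positions = (np.arange(1, n + 1) - 0.5) / n   # one common plotting convention
expected = norm.ppf(plotting_positions)                # expected standard normal scores

plt.scatter(observed, expected)  # observed values (horizontal) vs expected normal values (vertical)
plt.xlabel("Observed values")
plt.ylabel("Expected normal values")
plt.title("Normal probability plot")
plt.show()
# If the sample comes from a normal distribution, the points fall close to a straight line.
```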

SPSS Procedures

1. Select the Analyze menu.
2. Click on Descriptive Statistics and then Explore... to open the Explore dialogue box.
3. Select the variable you require (i.e. mathematics score) and click on the button to move this variable to the Dependent List: box.
4. Click on the Plots... command pushbutton to obtain the Explore: Plots sub-dialogue box.
5. Click on the Histogram check box and the Normality plots with tests check box, and ensure that the Factor levels together radio button is selected in the Boxplots display.
6. Click on Continue.
7. In the Display box, ensure that Both is activated.
8. Click on the Options... command pushbutton to open the Explore: Options sub-dialogue box.
9. In the Missing Values box, click on the Exclude cases pairwise radio button. If this option is not selected then, by default, cases with missing data will be excluded from the analysis. That is, plots and statistics will be generated only for cases with complete data.
10. Click on Continue and then OK.

Note that these commands will give you the Histogram, Stem-and-leaf plots, Boxplots and Normality Plots.
'Boxplots' and Normality Plots.



Figure 3.8 Normal Probability Plot showing the distribution of scores on Scientific Literacy among a group of students (outliers are marked on the plot)

When you use a normal probability plot to assess the normality of a variable, you must remember that the decision of whether the distribution is roughly linear, and hence normal, is a subjective one. Figure 3.8 is an example of a normal probability plot. Though none of the values fall exactly on the line, most of the points are very close to the line.

Values that are above the line represent units for which the observation is larger than its normal score.

Values that are below the line represent units for which the observation is smaller than its normal score.



Note that there is one value that falls well outside the overall pattern of the plot. It is called an outlier, and you will have to remove the outlier from the sample data and redraw the normal probability plot.

Even with the outlier, the values are close to the line and you can conclude that the distribution will look like a bell-shaped curve. If the normal scores plot departs only slightly from having all of its dots on the line, then the distribution of the data departs only slightly from a bell-shaped curve. If one or more of the dots departs substantially from the line, then the distribution of the data is substantially different from a bell shape.

Outliers:

Refer to the normal probability plot above. Note that there are possible outliers, which are values lying off the hypothetical straight line.

Outliers are anomalous values in the data. They may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population.



Skewness to the left:

Refer to the normal probability plot above. If both ends of the normality plot fall below the straight line passing through the main body of the values of the probability plot, then the population distribution from which the data were sampled may be skewed to the left.

Skewness to the right:

If both ends of the normality plot bend above the straight line passing through the values of the probability plot, then the population distribution from which the data were sampled may be skewed to the right.



LEARNING ACTIVITY

Refer to the output of a Normal Probability Plot above for the distribution of mathematics scores of eight students:

a) Comment on the distribution of scores.
b) Would you consider the distribution normal?
c) Are there outliers?

ASSESSING NORMALITY USING STATISTICAL TECHNIQUES

The graphical methods discussed present qualitative information about the distribution of data that may not be apparent from statistical tests. Histograms, box plots and normal probability plots are graphical methods useful for determining whether data follow a normal curve. Extreme deviations from normality are often readily identified from graphical methods. However, in many instances the decision is not straightforward. Using graphical methods to decide whether a data set is normally distributed involves making a subjective decision; formal test procedures are usually necessary to test the assumption of normality.

In general, both statistical tests and graphical plots should be used to determine normality. However, the assumption of normality should not be rejected on the basis of a statistical test alone. In particular, when the sample is large, statistical tests for normality can be sensitive to very small (i.e. negligible) deviations from normality. Therefore, if the sample is very large, a statistical test may reject the assumption of normality when the data set, as shown using graphical methods, is essentially normal and the deviation from normality is too small to be of practical significance.

a) KOLMOGOROV-SMIRNOV TEST

The Kolmogorov-Smirnov Z test evaluates statistically whether the difference between the observed distribution and a theoretical normal distribution is small enough to be just due to chance. If it could be due to chance, you would treat the distribution as being normal. If the difference between the actual distribution and the theoretical normal distribution is larger than is likely to be due to chance (sampling error), then you would treat the actual distribution as not being normal.

In terms of hypothesis testing, the Kolmogorov-Smirnov test is based on Ho: the data are normally distributed, against Ha: the data are not normally distributed. The test is typically used for samples which have more than 50 subjects.

If the Kolmogorov-Smirnov Z test yields a significance level of less than (<) 0.05, reject the assumption of normality; that is, treat the distribution as not normal.

Kolmogorov-Smirnov (a)
          Statistic    df      Sig.
SCORE     .21          1598    .000*

* This is a lower bound of the true significance.
(a) Lilliefors Significance Correction

Here the reported significance (.000) is less than 0.05, so the assumption of normality would be rejected for this (large) sample.
(a) Lilliefors Significance Correction



B) SHAPIRO-WILK TEST

Another powerful and commonly employed test for normality is the W test by Shapiro and Wilk, also called the Shapiro-Wilk test. It is an effective method for testing whether a data set has been drawn from a normal distribution.

If the normal probability plot is approximately linear (the data follow a normal curve), the test statistic will be relatively high.

If the normal probability plot has curvature that is evidence of non-normality in the tails of the distribution, the test statistic will be relatively low.

In terms of hypothesis testing, the Shapiro-Wilk test is based on the hypothesis Ho: the data are normally distributed, against Ha: the data are not normally distributed. The test is typically used for samples which have fewer than 50 subjects.

Reject the assumption of normality if the test of significance reports a p-value of less than (<) 0.05.

SPSS Output

Tests of Normality
                               Shapiro-Wilk
Independent variable group     Statistic    df    Sig.
Group 1                        .912         22    .055
Group 2                        .166         14    .442
Group 3                        .900         16    .084

Table 3.3 Shapiro-Wilk statistic for assessing normality

See Table 3.3. The Shapiro-Wilk normality tests indicate that the scores are normally distributed in each of the three groups. All the p-values reported are more than 0.05, and hence you DO NOT REJECT the null hypothesis.
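A minimal sketch of the same test outside SPSS (Python with scipy, three made-up groups of roughly the sizes shown in Table 3.3) is given below.

```python
# Illustrative sketch: Shapiro-Wilk test applied to three made-up groups.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(6)
groups = {
    "Group 1": rng.normal(50, 10, size=22),
    "Group 2": rng.normal(48, 9, size=14),
    "Group 3": rng.normal(52, 11, size=16),
}

for name, data in groups.items():
    statistic, p_value = shapiro(data)
    decision = "do not reject normality" if p_value > 0.05 else "reject normality"
    print(f"{name}: W = {statistic:.3f}, p = {p_value:.3f} -> {decision}")
```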



NOTE:

It should be noted that with large samples even a very small deviation from normality can yield low significance levels, so a judgement still has to be made as to whether the departure from normality is large enough to matter.

WHAT TO DO IF THE DISTRIBUTION IS NOT NORMAL

You have TWO choices if the distribution is not normal:

Use a Nonparametric Statistic Instead
Transform the Variable to Make it Normal

a) Use a Nonparametric Statistic

In many cases, if the distribution is not normal, an alternative statistic will be available, especially for bivariate analyses such as correlation or comparisons of means. These alternatives, which do not require normal distributions, are called nonparametric or distribution-free statistics. Some of these alternatives are shown below:

Purpose of Analysis                                Parametric Statistics     Non-Parametric Statistics
-------------------------------------------------------------------------------------------------------
Differences between two independent means          t-test                    Mann-Whitney U test; Kolmogorov-Smirnov two-sample Z test
Differences between two dependent means            t-test                    Wilcoxon's matched pairs test
Differences between more than two means            Oneway ANOVA              Kruskal-Wallis analysis of ranks
Differences between more than two repeated means   Repeated measures ANOVA   Friedman's two-way analysis of variance; Cochran Q test
Relationship between variables                     Pearson r                 Spearman Rho; Kendall's tau; Chi-square
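As one concrete example of swapping a parametric test for its nonparametric counterpart, the sketch below (Python with scipy, simulated skewed data, purely illustrative) compares two independent groups with the Mann-Whitney U test instead of the independent t-test.

```python
# Illustrative sketch: a nonparametric alternative to the independent t-test.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)
group_a = rng.exponential(10, size=30)   # skewed, clearly non-normal scores
group_b = rng.exponential(14, size=30)

statistic, p_value = mannwhitneyu(group_a, group_b)
print(f"Mann-Whitney U = {statistic:.1f}, p = {p_value:.3f}")
# A small p-value suggests the two groups differ, without assuming normality.
```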



b) Transform the Variable to Make it Normal

The shape of a distribution can be changed by expressing it in a different way statistically. This is referred to as transforming the distribution. Different types of transformations can be applied to "normalise" the distribution. The type of transformation selected depends on the manner in which the distribution departs from normality. [We will not discuss transformation in this course.]

Kolmogorov-Smirnov (a)
          Statistic    df     Sig.
SCORE     0.57         999    .200*

* This is a lower bound of the true significance.
(a) Lilliefors Significance Correction

LEARNING ACTIVITY

Examine the SPSS output above and determine whether the sample is normally distributed.

-----0000------
