An Introduction to Statistical Methods in GenStat - VSN International
An Introduction to<br />
Statistical Methods<br />
in GenStat<br />
Alex Glaser and Carey Biggs<br />
VSN International, 5 The Waterhouse,<br />
Waterhouse Street, Hemel Hempstead, UK<br />
email: alex@vsni.co.uk carey@vsni.co.uk<br />
Many thanks to Roger Payne for the original slides<br />
South Africa, November 2010
Programme<br />
• Day 1<br />
• Introduction to GenStat<br />
• From t-test to one-way anova<br />
• Basic principles of design and blocking<br />
• Treatment structure − factorials & interactions<br />
and checking the assumptions<br />
• Day 2<br />
• Simple linear regression<br />
• Multiple linear regression<br />
• GLM – counts and binomial data<br />
• GLM – further models and extensions
Exercise 1.1<br />
• What happens when you select input log in<br />
the window navigator?<br />
• Can you see yourself using this feature in your<br />
work? If so, how?<br />
• What happens to the status bar when you click the<br />
button?<br />
• Resize the input log and output window so<br />
that you can see both simultaneously<br />
• What happens when you click the button?<br />
• Use the tools|customize toolbar menu to<br />
add or remove buttons from the toolbar to suit<br />
your needs.
GenStat Client<br />
Menus Commands<br />
GenStat Server
Exercise 1.2<br />
• What happens to the text in the right-hand corner<br />
of the status bar if you press the insert key?<br />
• What do you think this part of the status bar<br />
means?<br />
• Open a new text window using the button.<br />
In this window, type the following GenStat<br />
command<br />
PRINT 'This is my first time using GenStat'<br />
• Execute the command using the Run|Submit<br />
Line menu option. Now select the Window|<br />
Event Log entry for this action. Is there an<br />
Event log for this action?
Exercise 2<br />
• Find help for what’s new in the 13th edition of<br />
GenStat<br />
• Find help on the GenStat spreadsheet<br />
• Open the Tools|Options menu and find help<br />
about the ECHO COMMANDS setting on the<br />
AUDIT TRAIL tab.<br />
• Open a new text window and type in the word<br />
FIT. Place the cursor in the word and press the<br />
F1 key. What is FIT? Type in a statistical term<br />
and press the F1 key.<br />
• View the Introduction to GenStat guide (pdf<br />
format)<br />
• View an example program for a two-sample t-test.
Exercise 3.1<br />
• Clear all the data from GenStat and use the<br />
file|open menu to read the data from the<br />
file sulphur.xls from installsets\Data<br />
• Clear all the data from GenStat. Go to the<br />
tools|spreadsheet options|file menu and<br />
uncheck the use excel import wizard on<br />
file open option. Repeat part 1 using the<br />
file|open menu. Which approach best suits<br />
your way of working?<br />
• The file bacteria.xls, which you met earlier,<br />
contains data from a second experiment in<br />
the worksheet called Bacteria Counts. The<br />
data are not stored in standard format; the<br />
data can be found in the range of cells<br />
D3:E13. Clear the data core. Read the<br />
data into GenStat using the Excel import<br />
wizard button.
Exercise 3.2 & 3.3<br />
• Using the data in the iris.gsh file:<br />
• Produce a scatter plot of Sepal Width versus Petal<br />
Width. There is one point in this plot that stands<br />
alone. What are the coordinates of this point? Can<br />
you suggest a method of easily identifying to which<br />
species of iris this unusual point belongs?<br />
• Produce a scatter plot of Sepal Length versus Petal<br />
Length. Give each factor level a different symbol and<br />
colour. Experiment with labelling.<br />
• Produce a histogram of Petal lengths versus Petal<br />
widths.<br />
• Using your own data, experiment with the<br />
different aspects of the graphics window.<br />
That is, explore the different menus and<br />
toolbars. If you have not brought your own data<br />
sets, experiment with any of the course data<br />
files.
Exercise 4.1<br />
• Using the Excel Import Wizard,<br />
load in the file Traffic.xls<br />
• On the second screen enter B3:D43 in<br />
the Specified Range box.<br />
• Click OK on the Select Columns to<br />
Convert to Factors menu<br />
• Convert Day and Month to factors<br />
using the methods of your choice.
Exercise 4.2<br />
• Continue using the file Traffic.xls<br />
• Select a cell in the Day column.<br />
Delete the value, type ‘F’ and then<br />
press return. Repeat the process but<br />
with the value ‘G’. What property of<br />
the GenStat spreadsheet do you think<br />
this illustrates?<br />
• Select the Tools|Spreadsheet<br />
Options|Conversions menu. Check<br />
the Allow new factor levels in Edit<br />
box. Now repeat the above question.<br />
What happens now?
Exercise 4.3<br />
• Continue using the file Traffic.xls<br />
• Create a new variate which contains<br />
the log of the Counts.<br />
• Sort the columns in descending order<br />
of the Counts.<br />
• Use the Spread|Manipulate|<br />
Unstack menu to create separate variables<br />
for each day of the week.<br />
• Experiment with the Calculate menu<br />
with your own data.
1 From t-test to one-way anova<br />
• In this session you will learn<br />
• how to use the t-test to compare two treatments<br />
• the T-Test menu<br />
• how to use one-way ANOVA to compare several treatments<br />
• the model fitted in one-way anova<br />
• the statistical philosophy behind one-way anova<br />
• the relationship between one-way anova and the t-test for<br />
two treatments<br />
• how to use the One- and two-way ANOVA menu for one-way<br />
anova<br />
• how to plot the means from one-way anova<br />
• how to do multiple comparisons ★<br />
• Note: topics marked ★ are optional
t-test<br />
• suppose we have 2 sets of units, that have received 2<br />
different treatments:<br />
• animals that have been fed two different diets<br />
• plots that have been given different fertilisers<br />
• subjects with different drugs<br />
• plants with different fungicides<br />
• assume the units do not have any special structure e.g.<br />
• the animals are all of the same breed<br />
• the plots are in a fairly uniform field<br />
• the subjects are of similar ages, weights and heights<br />
• with 2 treatments we may then do a t-test<br />
• assume each group comes from a Normal distribution<br />
• usually assume distributions have the same variance (can<br />
check)<br />
• but may have different means
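The assumptions above lead to the pooled two-sample t statistic. The following is a minimal Python sketch of that calculation, not GenStat's own implementation; the function name `pooled_t` and the data values are invented for illustration.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Two-sample t statistic assuming both groups share one variance."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: the two sample variances weighted by their d.f.
    s2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    sed = math.sqrt(s2 * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / sed
    return t, n1 + n2 - 2

# Hypothetical data (not from any course file).
diet_a = [10.0, 12.0, 11.0, 13.0]
diet_b = [14.0, 15.0, 13.0, 16.0]
t_stat, df = pooled_t(diet_a, diet_b)
print(round(t_stat, 3), df)
```

The statistic would then be referred to a t distribution on `df` degrees of freedom, which is what the T-Test menu reports.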
Data sets<br />
• data sets for the examples<br />
and practicals can be<br />
accessed using the<br />
Example Data Sets menu<br />
• filter by the course<br />
Guide to Anova<br />
and Design<br />
• select the file<br />
• click on Open data
t-test<br />
• experiment to study yields from 2<br />
manufacturing methods<br />
• data in Manufacture.gsh<br />
• do yields differ more than we<br />
would expect from the random<br />
variation?<br />
• can we estimate mean yields from<br />
each method?
t-test menu<br />
• Use GenStat menus for simplicity
Output
Practical 1.2<br />
• spreadsheet Pots.gsh stores<br />
data from a fertilizer<br />
experiment<br />
• 7 plants grown in pots with no<br />
fertilizer<br />
• 8 plants grown in similar<br />
conditions with fertilizer<br />
• do a two-sample t-test to<br />
assess whether fertilizer has<br />
an effect
One-way analysis of variance<br />
• linear model $y_{ij} = \mu + a_i + \varepsilon_{ij}$<br />
• represent each mean by<br />
• grand mean $\mu$<br />
• + effect $a_i$<br />
• observations described by<br />
• fitted value $\mu + a_i$<br />
• + residual $\varepsilon_{ij}$
Residual variation<br />
• may arise from many different causes:<br />
• the units may not be absolutely identical (discuss<br />
later how to allocate units to treatments to take<br />
account of this)<br />
• they may experience slightly different conditions<br />
during the experiment<br />
• there may be measurement errors<br />
• they may be dealt with by different people<br />
during the experiment<br />
• and you can no doubt think of others!<br />
• so estimation is not exact<br />
• analysis must estimate the amount of variation<br />
• and take account of it in drawing conclusions
One-way anova<br />
• linear model $y_{ij} = \mu + a_i + \varepsilon_{ij}$<br />
• if treatments have no effect<br />
• $a_1 = a_2 = 0$<br />
• $y_{ij} = \mu + \varepsilon_{ij}$<br />
• estimate grand mean by average of all data values<br />
• assess lack of fit of model by sum of squared residuals ($RSS_0$)<br />
• degrees of freedom (d.f.) is $n_1 + n_2 - 1$ (fitted 1 parameter $\mu$)<br />
• fit full model<br />
• estimate $a_i$ by average for group $i$ minus grand mean<br />
• assess lack of fit of model by sum of squared residuals ($RSS_1$)<br />
• this has $n_1 + n_2 - 2$ d.f. (2 parameters as $(n_1 a_1 + n_2 a_2)/(n_1 + n_2) = 0$)<br />
• assess treatments<br />
• sum of squares due to treatments is $TSS = RSS_0 - RSS_1$ on 1 d.f.<br />
• assess underlying variation by residual from full model $RSS_1$<br />
• variance ratio is treatment mean square / residual mean square<br />
• $VR = \{TSS / 1\} / \{RSS_1 / (n_1 + n_2 - 2)\}$ on 1 and $(n_1 + n_2 - 2)$ d.f.
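The steps on this slide can be traced numerically. The sketch below is an illustrative Python version for two treatments, with a hypothetical function name and invented data. It also computes the pooled t statistic from the same residual mean square, so the relationship between one-way anova and the t-test (VR equals t squared) can be checked directly.

```python
import math
from statistics import mean

def one_way_anova_two_groups(group1, group2):
    """Follow the RSS steps for two treatments; return (rss0, rss1, tss, vr, t)."""
    n1, n2 = len(group1), len(group2)
    data = list(group1) + list(group2)
    grand_mean = mean(data)
    # Null model (no treatment effect): fitted value is the grand mean.
    rss0 = sum((y - grand_mean) ** 2 for y in data)
    # Full model: fitted value is the mean of the observation's own group.
    m1, m2 = mean(group1), mean(group2)
    rss1 = sum((y - m1) ** 2 for y in group1) + sum((y - m2) ** 2 for y in group2)
    tss = rss0 - rss1                       # treatment sum of squares on 1 d.f.
    residual_ms = rss1 / (n1 + n2 - 2)
    vr = (tss / 1) / residual_ms
    # Pooled two-sample t statistic using the same residual mean square.
    t = (m1 - m2) / math.sqrt(residual_ms * (1 / n1 + 1 / n2))
    return rss0, rss1, tss, vr, t

# Hypothetical yields, invented for illustration only.
rss0, rss1, tss, vr, t = one_way_anova_two_groups([10.0, 12.0, 11.0, 13.0],
                                                  [14.0, 15.0, 13.0, 16.0])
print(rss0, rss1, tss, round(vr, 3), round(t * t, 3))
```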
One and two-way ANOVA<br />
menu
Output<br />
• aov table<br />
• tables of means<br />
• s.e.'s for<br />
differences<br />
between means<br />
(m1 – m2)/sed = t
ANOVA Options menu<br />
• Options menu controls the output
ANOVA Further Output menu<br />
• Further Output menu provides more output<br />
(without redo<strong>in</strong>g the analysis)
ANOVA Means Plots menu<br />
• Means Plots menu plots means<br />
• as points<br />
• or joined by lines<br />
• or with original data points too<br />
• or in a bar chart
Practical 1.4<br />
• spreadsheet Pots.gsh stores<br />
data from a fertilizer<br />
experiment used in Practical<br />
1.2<br />
• 7 plants grown in pots with no<br />
fertilizer<br />
• 8 plants grown in similar<br />
conditions with fertilizer<br />
• do a one-way analysis of<br />
variance to assess if fertilizer<br />
has an effect<br />
• compare results with t-test<br />
from Practical 1.2
One-way anova with >2<br />
treatments<br />
• spreadsheet Rat.gsh has data<br />
from an experiment to study<br />
effect of dietary supplements<br />
on gain in weight of rats<br />
• 5 diet treatments (a-e)<br />
• 20 rats allocated at random, 4<br />
per treatment<br />
• can use One- and two-way<br />
ANOVA menu, and plot means,<br />
as before
Output<br />
• aov table<br />
• means<br />
• s.e.d.
Plot of means<br />
• suppose a-e<br />
represent<br />
amounts 0-4<br />
of supplement<br />
• might want to<br />
assess linear<br />
(& quadratic?)<br />
effects of<br />
supplement
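Linear and quadratic effects of equally spaced amounts are usually estimated with orthogonal polynomial contrasts. The sketch below uses the standard coefficients for 5 equally spaced levels; the treatment means are hypothetical and are not the Rat.gsh results.

```python
# Orthogonal polynomial coefficients for 5 equally spaced levels (amounts 0-4).
LINEAR = [-2, -1, 0, 1, 2]
QUADRATIC = [2, -1, -2, -1, 2]

def contrast(coeffs, means):
    """Estimate of a contrast: sum of coefficient times treatment mean."""
    return sum(c * m for c, m in zip(coeffs, means))

# Hypothetical treatment means for levels a-e.
means = [10.0, 14.0, 17.0, 19.0, 20.0]
linear_est = contrast(LINEAR, means)
quadratic_est = contrast(QUADRATIC, means)
print(linear_est, quadratic_est)
```

A positive linear estimate with a negative quadratic estimate would suggest a response that rises with the amount of supplement but levels off.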
Multiple comparison tests<br />
• in favour<br />
• there may be many possible comparisons between pairs of<br />
treatment means (with t treatments there are t×(t–1)/2)<br />
• so some researchers feel their significance levels should be<br />
adjusted to take account of all the tests that they might make<br />
• against<br />
• multiple comparisons are unnecessary if you have only a small<br />
number of comparisons to make – either because there are few<br />
treatments, or because you should have identified beforehand the<br />
comparisons that you feel are likely to be of interest<br />
• they are also inappropriate if the treatments have any sort of<br />
structure e.g. if levels represent different amounts of a<br />
substance like a fertiliser or a drug, it is illogical to assume that<br />
only some of the amounts might have an effect<br />
• see on-line help for the menu
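To see how quickly the number of pairwise comparisons grows, the count t×(t–1)/2 can be computed directly. The sketch also shows one common adjustment, the Bonferroni correction, which divides the significance level by the number of tests; the function names are illustrative only.

```python
def n_pairs(t):
    """Number of pairwise comparisons among t treatment means."""
    return t * (t - 1) // 2

def bonferroni_level(alpha, t):
    """Per-comparison significance level under a Bonferroni adjustment."""
    return alpha / n_pairs(t)

for t in (2, 5, 10):
    print(t, n_pairs(t), bonferroni_level(0.05, t))
```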
Multiple comparisons<br />
• check that they<br />
are enabled on<br />
the Menus tab<br />
of the Options<br />
menu
Multiple comparisons<br />
• the Multiple Comparisons button will then be available to<br />
click on the ANOVA Further Output menu<br />
• check Multiple Comparisons<br />
• select Treatment and type of Test<br />
• click OK (and then Run on the Further Output menu)
Practical 1.9<br />
• spreadsheet Octane.gsh stores<br />
data from an experiment to study<br />
the effect of different additives A-E<br />
on the octane level of gasoline,<br />
used in Practical 1.7<br />
• do a one-way analysis of variance<br />
to assess if Gasoline has an effect<br />
• A-E represent 0-4 cc/gallon of<br />
additive − estimate linear and<br />
quadratic effects of additive<br />
• do a Bonferroni multiple-comparison<br />
test to compare the<br />
types of gasoline
2 Blocking structures<br />
• In this session you will learn<br />
• how to improve the precision of an experiment by grouping<br />
the units into similar sets called "blocks"<br />
• how randomization can avoid bias by guarding against<br />
unforeseen differences amongst the units<br />
• how to design and analyse a complete randomized block<br />
design<br />
• how to recognise situations that may require more than<br />
one type of blocking<br />
• how to design and analyse a Latin square design ★<br />
• Note: topics marked ★ are optional
Completely-randomized design<br />
• design used for all examples so far<br />
• no formal structure is imposed on the units<br />
• assumes units effectively identical e.g.<br />
• in a field experiment, no systematic differences in<br />
underlying fertility, drainage etc of the plots<br />
• in a glasshouse, assumes that light and temperature are the<br />
same for each row of pots<br />
• in a factory, that workforce behaves in essentially the same<br />
way at different times of day, days of the week etc<br />
• in educational studies, that children in different schools are<br />
approximately the same, or students studying different<br />
subjects at Universities, or in different year groups etc<br />
• treatments allocated to units at random
Non-uniform units<br />
• for example field experiment on a slope<br />
• best plots may be at top of slope<br />
• random allocation of treatments to plots may not seem "fair"<br />
• e.g. replicates of treatment A mainly on "good" plots & replicates of<br />
treatment B mainly on "bad" plots − if no actual difference between A &<br />
B, could lead to A appearing to be much better than B<br />
• systematic differences between plots increase the residual sum of<br />
squares, & hence the estimate of random variability<br />
• treatment differences must be larger to give a significant F-test<br />
• standard errors of differences between treatments will be larger i.e.<br />
experiment will give less precise results<br />
• if you know there are differences between units<br />
• avoid bias & improve precision by grouping (blocking) units into<br />
homogeneous groups (i.e. groups that are effectively identical)
Randomized block design<br />
• single grouping factor usually known as blocks<br />
• within each block<br />
• same number of units for each treatment (one per<br />
treatment in a randomized-complete-block design)<br />
• treatments are allocated randomly to the units<br />
• in analysis block-effects are estimated and<br />
removed, leading to more-precise estimates<br />
• e.g.
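One way to produce such a layout is to shuffle the treatment list independently within each block. This is a minimal Python sketch rather than GenStat's design facilities; `randomized_blocks` and the seed are hypothetical.

```python
import random

def randomized_blocks(treatments, n_blocks, seed=None):
    """Return one random ordering of all treatments for each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)          # independent randomization within the block
        layout.append(order)
    return layout

# 5 treatments in 8 blocks, one unit per treatment in each block.
layout = randomized_blocks(["A", "B", "C", "D", "E"], n_blocks=8, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(block_number, " ".join(order))
```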
One-way anova with blocks<br />
• another experiment to<br />
study effect of dietary<br />
supplements on gain in<br />
weight of rats<br />
• 8 litters of 5 rats<br />
• assume rats from same<br />
litter more similar than<br />
those from different litters<br />
• 5 Diet treatments (A-E),<br />
allocated at random to<br />
rats within each litter
No block<strong>in</strong>g<br />
• residual m.s. 206.8<br />
• variance ratio 0.42<br />
• s.e.d. 7.19
With litters as blocks<br />
• Differences between litters<br />
• residual m.s.<br />
40.63 (c.f. 206.8)<br />
• variance ratio<br />
2.13 (c.f. 0.42)<br />
• s.e.d. 3.19 (c.f. 7.19)
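The two s.e.d. values follow from the residual mean squares. With equal replication r, the standard error of a difference between two means is the square root of 2 × residual m.s. / r, and here each diet has r = 8 replicates (one rat per litter). A short Python check, with a hypothetical function name:

```python
import math

def sed(residual_ms, replication):
    """Standard error of a difference between two equally replicated means."""
    return math.sqrt(2 * residual_ms / replication)

sed_unblocked = sed(206.8, 8)   # residual m.s. without blocking
sed_blocked = sed(40.63, 8)     # residual m.s. with litters as blocks
print(round(sed_unblocked, 2), round(sed_blocked, 2))
```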
Practical 2.3<br />
• spreadsheet Wheatstrains.gsh<br />
contains the results from a<br />
randomized block design to<br />
assess 4 strains of wheat<br />
• analyse the experiment<br />
• give your assessment of<br />
whether the blocking was<br />
worthwhile
Blocking in 2 directions<br />
• e.g. experiment on pot plants in a glasshouse<br />
• door in east wall which may cause temperature differences<br />
• sunlight mainly from the south<br />
• other e.g.<br />
• weekday × time-of-day<br />
• school × year-group<br />
• fac<strong>to</strong>ry × weekday<br />
• time × location
Latin square design<br />
• a design for t treatments<br />
• arranged in t rows and t columns (i.e. $t^2$ units)<br />
• each treatment occurs exactly once in each row<br />
and once in each column<br />
• randomized by randomly permuting rows &<br />
columns<br />
• e.g.
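A randomized Latin square can be sketched by starting from a cyclic square and then permuting its rows and columns at random, as the slide describes. The Python below is illustrative only; the function names and seed are hypothetical, and this procedure does not sample uniformly from all possible Latin squares.

```python
import random

def latin_square(treatments, seed=None):
    """Cyclic Latin square with randomly permuted rows and columns."""
    rng = random.Random(seed)
    t = len(treatments)
    # Cyclic starting square: row i is the treatment list shifted by i places.
    square = [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]
    rng.shuffle(square)                     # permute the rows
    cols = list(range(t))
    rng.shuffle(cols)                       # permute the columns
    return [[row[c] for c in cols] for row in square]

def is_latin(square):
    """Check each treatment occurs exactly once in every row and column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(t))
    return len(symbols) == t and rows_ok and cols_ok

design = latin_square(list("ABCDEF"), seed=3)
for row in design:
    print(" ".join(row))
```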
Latin square example<br />
• experiment to assess the<br />
(in?)consistency of 6<br />
samplers in assessing the<br />
heights of wheat plants<br />
• 6 areas of wheat to assess<br />
• may also be ordering<br />
effects (accuracy of<br />
samplers may vary during<br />
experiment)<br />
• so 6×6 Latin square used<br />
with blocking factors Areas<br />
and Orders
Analysis of Variance menu<br />
• select Design to be Latin Square
Output<br />
• between Areas<br />
• between Orders<br />
• Samplers more<br />
precisely<br />
estimated<br />
(residual m.s.<br />
3.328 c.f. 5.801)
Practical 2.5<br />
• spreadsheet Fabric.gsh<br />
contains the results from<br />
a Latin square design to<br />
assess wear resistance of<br />
rubber-covered fabrics<br />
• column factor is 4<br />
different runs<br />
• row factor is 4<br />
positions on testing<br />
machine used to<br />
generate wear under<br />
simulated natural<br />
conditions<br />
• analyse the results
3 Treatment structure<br />
• In this session you will learn how to<br />
• recognise the need for more than one treatment factor<br />
• analyse designs with two treatment factors using the One- and<br />
two-way ANOVA menu<br />
• define and interpret interactions between factors<br />
• analyse designs with two treatment factors using the<br />
general Analysis of Variance menu ★<br />
• use the Anova Contrasts menu ★<br />
• estimate comparisons between levels of a treatment factor<br />
★<br />
• interpret interactions between treatment contrasts ★<br />
• use model formulae to define the treatment terms to be<br />
fitted<br />
• include control treatments in a factorial experiment ★<br />
• use covariates to improve precision by using additional<br />
background information about the experimental units (not<br />
used for blocking) ★<br />
• Note: topics marked ★ are optional
Types of treatment<br />
• experiments may study different types of treatment e.g.<br />
• several different drugs at a range of different doses<br />
• several different types of fertiliser<br />
• varieties of wheat and types of fungicide<br />
• represent each type of treatment by a different treatment<br />
factor, with levels to represent the various possibilities<br />
e.g.<br />
• Drug − levels Morphine, Amidone, Phenadoxone, Pethidine;<br />
• Dose − levels 2.5, 5, 10, 15;<br />
• Nitrogen − levels 0, 50, 100, 150;<br />
• Phosphate − levels 50, 100;<br />
• Fungicide − levels Carbendazim, Prochloraz;<br />
• Amount − levels 2, 3, 4.
Two treatment factors<br />
• experiment on canola<br />
(oil-seed rape)<br />
• 2 treatment factors<br />
• N (nitrogen) 0, 180, 230<br />
• S (sulphur) 0, 10, 20, 40<br />
• randomized-block<br />
design<br />
• with 3 blocks (factor<br />
block)<br />
• and 12 plots per block
One and two-way ANOVA<br />
menu<br />
• Two-way analysis (Treatment factors N & S)<br />
• with Blocks (factor block)
Output<br />
• line for each term: N<br />
& S main effects,<br />
and N.S interaction<br />
• table of means for<br />
each treatment term<br />
• s.e.d. for each table<br />
of means
Linear model<br />
• $y_{ijk} = \mu + \beta_i + n_j + s_k + ns_{jk} + \varepsilon_{ijk}$<br />
• $\beta_i$ represent the block effects (block stratum in the aov)<br />
• $\varepsilon_{ijk}$ are the residuals<br />
• $n_j$ represent the main effect of nitrogen (N)<br />
• $s_k$ represent the main effect of sulphur (S)<br />
• $ns_{jk}$ represent the interaction between nitrogen & sulphur<br />
(N.S)<br />
• analysis fits each term in turn, so you can<br />
decide how complicated a model is required<br />
• analysis-of-variance table has a line for each term, so you<br />
can assess whether its parameters are needed in the model<br />
• conclusions will be much clearer if there is no interaction
With interaction
Without interaction<br />
• lines are parallel<br />
• can decide on best level of<br />
S without considering N<br />
• or best level of N without<br />
considering S<br />
• need present only one-way<br />
tables of means
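The "parallel lines" condition can be checked numerically: in a two-way table of means, the interaction effect for each cell is the cell mean minus its row mean minus its column mean plus the grand mean, and all of these are zero when the factors act additively. The sketch below uses a hypothetical N × S table constructed to be additive; it is not the canola data.

```python
def interaction_effects(table):
    """Cell mean - row mean - column mean + grand mean, for every cell."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (rows * cols)
    row_means = [sum(row) / cols for row in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(cols)] for i in range(rows)]

# Hypothetical table: each cell is an N effect plus an S effect (so additive).
additive = [[n + s for s in (0.0, 0.5, 0.75, 1.0)] for n in (1.0, 2.0, 3.0)]
effects = interaction_effects(additive)
print(effects)

# Changing a single cell breaks additivity and gives non-zero effects.
non_additive = [row[:] for row in additive]
non_additive[2][3] += 1.2
print(interaction_effects(non_additive)[2][3])
```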
General Analysis of Variance<br />
menu<br />
• Design: Two-way ANOVA (in Randomized Blocks)<br />
• click on Contrasts button to fit comparisons (or<br />
other contrasts)
Model formula<br />
• define a model to be fitted in an analysis<br />
• formed automatically by the menus – or can define your own<br />
• list of model terms, linked by operator "+"<br />
• e.g. A + B<br />
• 2 terms representing main effects of factors A & B<br />
• Higher-order terms specified as series of<br />
factors separated by dots (e.g. interactions):<br />
meaning depends on contents of formula<br />
• e.g. N + S + N.S: N.S is an interaction<br />
• e.g. Block + Block.Plot: Block.Plot represents plot-within-block<br />
effects: differences between individual plots after<br />
removing the overall similarity between plots in same block
Operators for formulae<br />
• crossing operator * specifies factorial<br />
structures<br />
e.g. N * S<br />
is expanded automatically to become N + S + N.S<br />
• nesting operator / occurs most often in block<br />
formulae<br />
e.g. Block / Plot<br />
is expanded to become Block + Block.Plot
Several operators<br />
• 3-factor factorial model<br />
A * B * C<br />
becomes A + B + C + A.B + A.C + B.C + A.B.C<br />
• 3 nested factors (e.g. block model of split-plot)<br />
block / wplot / subplot<br />
becomes block + block.wplot + block.wplot.subplot<br />
• factorial-plus-added-control<br />
treatment structure Control / (Drug * Dose)<br />
expands to Control + Control.Drug + Control.Dose +<br />
Control.Drug.Dose<br />
• NB: many commands and menus have a FACTORIAL<br />
option to control the number of factors/variates in the<br />
terms to fit
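The expansion rules above can be mimicked in a few lines of Python. This is only an illustration of how the crossing and nesting operators generate terms, not GenStat's formula parser; `cross` and `nest` are hypothetical helper names.

```python
from itertools import combinations

def cross(*factors):
    """Expand A * B * ... into main effects followed by all interactions."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(".".join(combo))
    return terms

def nest(*factors):
    """Expand A / B / ... into A, A.B, A.B.C, ..."""
    return [".".join(factors[: i + 1]) for i in range(len(factors))]

print(" + ".join(cross("A", "B", "C")))
print(" + ".join(nest("block", "wplot", "subplot")))

# Control / (Drug * Dose): every term of Drug * Dose is nested within Control.
added_control = ["Control"] + ["Control." + term for term in cross("Drug", "Dose")]
print(" + ".join(added_control))
```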
Factorial plus added control<br />
• 4 different fumigants to<br />
control nematodes<br />
• CN, CS, CM and CK<br />
• 2 levels of dose<br />
• single and double<br />
• also include a control<br />
treatment<br />
• none (no fumigant at<br />
any dose)<br />
• randomized-block design<br />
• 4 blocks<br />
• 12 plots per block<br />
• (4 replicates of control<br />
treatment in each block)<br />
• effects proportional<br />
• analyse log counts
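The slide says only that effects are proportional, so the following is one reading of that remark rather than the authors' own derivation. If a treatment multiplies the expected count by a factor, taking logs turns the multiplicative model into the additive form that anova assumes. The symbols $\tau_j$ (treatment multiplier) and $e$ (multiplicative error) are introduced here for illustration.

```latex
% Assumed multiplicative model for a count y under treatment j
y_{j} = \mu \times \tau_j \times e
\quad\Longrightarrow\quad
\log y_{j} = \log \mu + \log \tau_j + \log e
```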
Analysis of Variance menu<br />
• select Design to be General Treatment<br />
Structure (in Randomized Blocks)
Factorial plus added<br />
control<br />
• treatment structure Fumigant / ( Level * Type )<br />
• Fumigant represents the overall effect of any<br />
fumigant at any (non-zero) dose<br />
• Fumigant.Level represents comparison between single and<br />
double doses (averaged over different types)<br />
• Fumigant.Type represents overall differences between types<br />
(averaged over single and double doses)<br />
• Fumigant.Level.Type represents the interaction between Level<br />
and Type (given that some sort of fumigant<br />
has been applied)
Output
Output<br />
• notice different<br />
s.e.d.'s according to<br />
the replication of<br />
the means
Covariates<br />
• provide additional background information<br />
• often measurements made before expt (not used for blocking)<br />
• e.g. (log) prior nematode counts<br />
• incorporated in model as linear (regression) terms<br />
• $y_{ijkl} = \mu + \beta_i + f_j + ft_{jk} + fl_{jl} + ftl_{jkl} + b \times (x_{ijkl} - \bar{x}) + \varepsilon_{ijkl}$<br />
• improve precision<br />
• remove potential biases caused by non-uniformity of units<br />
• in aov table<br />
• extra line(s) to assess effect of covariate(s) on y-variate, after<br />
removing effects of treatments<br />
• treatment s.s. (and effects) adjusted to take account of the fact<br />
that the plots with the various treatments have different covariate<br />
values<br />
• cov.ef. for treatment is efficiency remaining after adjustment<br />
• cov.ef. for residual is amount by which its m.s. has decreased
Output<br />
• regression coefficient for<br />
adjustment in Blocks stratum<br />
• regression coefficient for<br />
adjustment within Blocks<br />
• combined estimate
Output
Practical 3.7<br />
• spreadsheet Ratmuscles.gsh contains<br />
data from an experiment to study the<br />
effect of electrical stimulation in<br />
preventing the wasting away of<br />
denervated muscles of rats<br />
• 3 treatment fac<strong>to</strong>rs<br />
• length of each treatment<br />
• number of treatment periods per day<br />
• type of current<br />
• randomized block design with 2 blocks<br />
• denervated muscles were<br />
gastrocnemius muscles on one side of<br />
each rat<br />
• the normal muscle on the other side<br />
of each rat was also measured, for<br />
use as a covariate in the analysis<br />
• analyse the experiment
4 Checking the assumptions<br />
• In this session you will learn<br />
• what assumptions are needed to ensure validity of an aov<br />
• why the variance must be homogeneous (e.g. variability<br />
of residuals should be the same at high as low response<br />
values)<br />
• how to assess whether the variance is homogeneous<br />
• that residuals should come from identical and independent<br />
Normal distributions<br />
• how to assess the Normality of the residuals<br />
• why the model must be additive (i.e. differences between<br />
treatment effects must remain the same however large or<br />
small the underlying size of the response variable)<br />
• how to identify outliers<br />
• how transforming the response variate may correct for<br />
failures in the assumptions ★<br />
• how to print back-transformed tables of means ★<br />
• how to do a random permutation test ★<br />
• Note: topics marked ★ are optional
Homogeneity of variance<br />
• random variation must be similar over all units<br />
• beware: it may change with the size of response<br />
• assess by plotting residuals against fitted values<br />
• example plots: homogeneous variance; variance increasing with response
Non-homogeneity of<br />
variance<br />
• if variation increases with size of response<br />
• s.e.d.'s between treatment means will be<br />
• over-estimated for differences between low means<br />
• under-estimated for differences between larger means<br />
• this could lead you to the wrong conclusions!<br />
• if plot of residuals against fitted values<br />
indicates non-homogeneity of variances<br />
• consider transforming the response variate<br />
• (or using a generalized linear model; see Guide to Linear,<br />
Nonlinear and Generalized Linear Models in GenStat)
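The residuals-against-fitted-values check can be sketched in Python. This is an illustration only, assuming a one-way layout in which the fitted value of each unit is its treatment mean; the function name and the data are hypothetical, and the data were chosen so that the spread grows with the size of the response.

```python
from statistics import mean

def fitted_and_residuals(y, groups):
    """Fitted values (treatment means) and residuals for a one-way layout."""
    means = {g: mean([yi for yi, gi in zip(y, groups) if gi == g])
             for g in set(groups)}
    fitted = [means[g] for g in groups]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals

# Hypothetical responses for three treatments with low, medium and high means
groups = ["low"] * 3 + ["mid"] * 3 + ["high"] * 3
y = [9.0, 10.0, 11.0, 45.0, 50.0, 55.0, 80.0, 100.0, 120.0]

fitted, residuals = fitted_and_residuals(y, groups)

# Largest absolute residual at each fitted value: a crude numeric
# stand-in for looking at the vertical spread in the plot
spread = {}
for f, r in zip(fitted, residuals):
    spread[f] = max(spread.get(f, 0.0), abs(r))

for f in sorted(spread):
    print(f"fitted {f:6.1f}   largest |residual| {spread[f]:5.1f}")
```

A spread that rises steadily with the fitted value, as it does here, is the "increasing with response" pattern that suggests a transformation.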
Normality of residuals<br />
• histogram – should be "bell-shaped"<br />
• Normal plot<br />
• residuals in ascending order plotted against Normal<br />
quantiles<br />
• should give an approximately straight line<br />
• half-Normal plot<br />
• similar to Normal plot but plots absolute residual values
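The coordinates of the two plots can be computed as in the Python sketch below. This is an illustration only: the plotting positions (i − 0.375)/(n + 0.25) are one common convention and are an assumption here, not necessarily the ones GenStat uses, and the function names and residuals are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """(Normal quantile, ordered residual) pairs for a Normal plot."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(residuals)))

def half_normal_plot_points(residuals):
    """(half-Normal quantile, ordered absolute residual) pairs."""
    n = len(residuals)
    std_normal = NormalDist()
    # Quantiles of |Z|: map each probability p to the Normal quantile at 0.5 + p/2
    quantiles = [std_normal.inv_cdf(0.5 + 0.5 * (i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(abs(r) for r in residuals)))

# Hypothetical residuals
residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 0.7]

points = normal_plot_points(residuals)
half_points = half_normal_plot_points(residuals)
for q, r in points:
    print(f"{q:7.3f}  {r:6.2f}")
```

If the residuals are roughly Normal, the printed pairs lie close to a straight line; points that bend away at either end are the ones to examine.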
Additivity<br />
• differences between treatment effects remain the same<br />
however large or small the underlying size of the response<br />
• e.g. in randomized-block design, assume that theoretical value<br />
of difference between two treatments remains the same within<br />
a block where responses are low, as in one where they are<br />
high<br />
• fitting an additive model when non-additivity is present<br />
• often leads to detection of (spurious) interactions<br />
• analysis will be harder to interpret<br />
• predictions will be unreliable<br />
• but take care – genuine interactions may also occur e.g. if one<br />
treatment modifies the mode of action of another<br />
• data that show signs of non-additivity often also violate<br />
other assumptions<br />
• use background knowledge of the process<br />
• if a multiplicative model is appropriate, take a log transformation<br />
• for percentage data, consider a logit transformation
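A small Python illustration of why a log transformation restores additivity when effects are multiplicative. The block and treatment effects are made-up numbers, and the response is assumed to be exactly block effect × treatment effect with no random error.

```python
import math

# Hypothetical multiplicative effects
block_effect = {"low block": 10.0, "high block": 100.0}
treatment_effect = {"T1": 1.0, "T2": 1.5}

raw_diff = {}
log_diff = {}
for block, b in block_effect.items():
    y1 = b * treatment_effect["T1"]
    y2 = b * treatment_effect["T2"]
    raw_diff[block] = y2 - y1                           # depends on the block
    log_diff[block] = math.log10(y2) - math.log10(y1)   # same in every block

for block in block_effect:
    print(f"{block}: raw difference {raw_diff[block]:5.1f}, "
          f"log10 difference {log_diff[block]:.4f}")
```

On the original scale the treatment difference is ten times larger in the high block than in the low block, so an additive model would not fit; on the log10 scale the difference is the same in both blocks.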
Outliers<br />
• are extreme observations, leading to very large residuals<br />
• look for warnings in ANOVA Information Summary<br />
• or for extreme points in histogram of residuals<br />
• or high or low points in plot of residuals against fitted values<br />
• or points away from line at end of Normal or half-Normal plot<br />
• outliers may arise from<br />
• errors in recording or punching data<br />
• the wrong treatment being applied to a unit<br />
• a problem in the experimental procedure<br />
• outliers<br />
• distort treatment means<br />
• inflate the error variance, decreasing the precision of estimates<br />
• if you have outliers, investigate to see if errors have<br />
occurred<br />
• if you find an error, try to recover the correct data value<br />
• if you cannot find the correct data value, insert a missing value<br />
• if you cannot find any possible source of error, perhaps the<br />
outlier might be a true data value – is your model wrong?
Transformations<br />
• can correct failures of assumptions<br />
• e.g. to stabilize variance<br />
• counts: square root<br />
• binomial percentages: angular,<br />
i.e. arcsine(sqrt(p/100))<br />
• s.e. proportional to mean: log<br />
• e.g. non-additivity<br />
• multiplicative effects: log,<br />
e.g. log10(n+1) for counts<br />
• percentages: logit = log(p/(100−p)),<br />
with p = 100×(r+½)/(n+1)<br />
for binomial data<br />
• note: must make inferences on transformed<br />
scale<br />
• but can present back-transformed means using Save and<br />
Calculate menus
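The transformations listed above, written out as a Python sketch. The function names are hypothetical. The angular transformation is returned in radians here, which is an assumption, since some software reports it in degrees.

```python
import math

def sqrt_count(n):
    """Square-root transformation for counts."""
    return math.sqrt(n)

def angular(p):
    """Angular transformation of a percentage p, in radians (assumed unit)."""
    return math.asin(math.sqrt(p / 100))

def log_count(n):
    """log10(n + 1) for counts; adding 1 allows zero counts."""
    return math.log10(n + 1)

def empirical_percentage(r, n):
    """p = 100 * (r + 1/2) / (n + 1) for r successes out of n, avoiding 0% and 100%."""
    return 100 * (r + 0.5) / (n + 1)

def logit(p):
    """Logit of a percentage p, defined for 0 < p < 100."""
    return math.log(p / (100 - p))

print(sqrt_count(16))                     # square root of a count
print(angular(50))                        # angular value for 50%
print(log_count(99))                      # log10 of (99 + 1)
print(logit(50))                          # logit of 50%
print(logit(empirical_percentage(0, 9)))  # finite even with 0 successes out of 9
```

The adjusted percentage matters for the logit because log(p/(100−p)) is undefined at p = 0 and p = 100; with r = 0 and n = 9 the adjusted value is 5%, which has a finite logit.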
Log transformed data<br />
• study of plankton numbers<br />
• 4 types of plankton (treatments)<br />
• sampled in 12 hauls (blocks)<br />
• compare analyses for<br />
untransformed and log10<br />
transformed numbers
Save the means
Back-transform and print
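What back-transformation does can be sketched in Python. In GenStat it is done through the Save and Calculate menus; the snippet below is only an illustration of the arithmetic, with made-up counts and hypothetical names. A mean calculated on the log10 scale is converted back with 10 to the power of that mean, which gives the geometric mean of the original values rather than the arithmetic mean.

```python
import math
from statistics import mean

# Hypothetical counts for two types
counts = {"type A": [10.0, 1000.0], "type B": [100.0, 100.0]}

back_transformed = {}
arithmetic = {}
for name, values in counts.items():
    log_mean = mean([math.log10(v) for v in values])  # mean on the log10 scale
    back_transformed[name] = 10 ** log_mean           # back to the original scale
    arithmetic[name] = mean(values)                   # for comparison only

for name in counts:
    print(f"{name}: back-transformed mean {back_transformed[name]:.1f}, "
          f"arithmetic mean {arithmetic[name]:.1f}")
```

The two types have the same back-transformed mean but very different arithmetic means, which is why inferences should be made on the transformed scale and the back-transformed means used only for presentation.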
Practical 4.6<br />
• spreadsheet Wine.gsh<br />
contains results from an<br />
experiment to assess the %<br />
alcohol of wine<br />
• 5 types of wine A-E<br />
• 3 bottles of each type were<br />
tested in a random order<br />
• analyse the percentages &<br />
plot residuals against fitted<br />
values<br />
• transform the percentages<br />
using a logit transformation,<br />
re-analyse the data & replot<br />
residuals against fitted values
Permutation tests<br />
• if the distributional assumptions are not satisfied, you<br />
might use a random permutation test as an alternative<br />
way to assess the significance of the terms in the analysis<br />
• model must still be additive for results to be meaningful<br />
• but residuals need no longer follow Normal distributions with equal<br />
variances<br />
• click on Permutation Test in ANOVA Further Output menu<br />
to open ANOVA Permutation Test menu<br />
• specify Number of permutations<br />
• select Seed (0 = automatic)<br />
• click on Run<br />
• probability for each treatment<br />
term is now determined from its<br />
distribution over the randomly<br />
permuted data sets
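The idea behind the permutation test can be sketched in Python. This is not GenStat's implementation: it assumes a completely randomized one-way design, uses the variance ratio (F statistic) as the test statistic, and permutes the treatment labels over all units. The function names, seed and data are hypothetical.

```python
import random
from statistics import mean

def f_statistic(y, groups):
    """One-way variance ratio: treatment m.s. divided by residual m.s."""
    labels = sorted(set(groups))
    grand = mean(y)
    ss_between = ss_within = 0.0
    for g in labels:
        ys = [yi for yi, gi in zip(y, groups) if gi == g]
        m = mean(ys)
        ss_between += len(ys) * (m - grand) ** 2
        ss_within += sum((yi - m) ** 2 for yi in ys)
    df_between = len(labels) - 1
    df_within = len(y) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p_value(y, groups, n_permutations=999, seed=1):
    """Proportion of permuted data sets whose F is at least the observed F."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    observed = f_statistic(y, groups)
    shuffled = list(groups)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)        # random reallocation of labels to units
        if f_statistic(y, shuffled) >= observed:
            count += 1
    # the observed allocation is counted as one of the permutations
    return (count + 1) / (n_permutations + 1)

# Hypothetical % alcohol for three bottles of each of three wine types
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
y = [11.8, 12.1, 12.0, 13.4, 13.6, 13.5, 12.6, 12.9, 12.7]

p = permutation_p_value(y, groups)
print("permutation probability:", p)
```

Because the probability comes from the permuted data sets rather than from the F distribution, it does not rely on Normal residuals with equal variances, although the model must still be additive.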
Practical 4.8<br />
• spreadsheet Wine.gsh (used in<br />
Practical 4.6) contains results<br />
from an experiment to assess<br />
the % alcohol of wine<br />
• 5 types of wine A-E<br />
• 3 bottles of each type were<br />
tested in a random order<br />
• analyse the percentages &<br />
plot residuals against fitted<br />
values<br />
• assess the differences<br />
between the types using a<br />
permutation test