24.01.2013 Views

An Introduction to Statistical Methods in GenStat - VSN International


An Introduction to Statistical Methods in GenStat

Alex Glaser and Carey Biggs

VSN International, 5 The Waterhouse, Waterhouse Street, Hemel Hempstead, UK

email: alex@vsni.co.uk carey@vsni.co.uk

Many thanks to Roger Payne for the original slides

South Africa, November 2010


Programme
• Day 1
  • Introduction to GenStat
  • From t-test to one-way anova
  • Basic principles of design and blocking
  • Treatment structure − factorials & interactions, and checking the assumptions
• Day 2
  • Simple linear regression
  • Multiple linear regression
  • GLM – counts and binomial data
  • GLM – further models and extensions


Exercise 1.1
• What happens when you select input log in the window navigator?
• Can you see yourself using this feature in your work? If so, how?
• What happens to the status bar when you click the button?
• Resize the input log and output windows so that you can see both simultaneously.
• What happens when you click the button?
• Use the tools|customize toolbar menu to add or remove buttons from the toolbar to suit your needs.


GenStat Client

Menus Commands

GenStat Server


Exercise 1.2
• What happens to the text in the right-hand corner of the status bar if you press the Insert key?
• What do you think this part of the status bar means?
• Open a new text window using the button. In this window, type the following GenStat command:
  PRINT 'This is my first time using GenStat'
• Execute the command using the Run|Submit Line menu option. Now select the Window|Event Log entry for this action. Is there an Event Log for this action?


Exercise 2
• Find help for what's new in the 13th edition of GenStat.
• Find help on the GenStat spreadsheet.
• Open the Tools|Options menu and find help about the ECHO COMMANDS setting on the AUDIT TRAIL tab.
• Open a new text window and type in the word FIT. Place the cursor in the word and press the F1 key. What is FIT? Type in a statistical term and press the F1 key.
• View the Introduction to GenStat guide (pdf format).
• View an example program for a two-sample t-test.


Exercise 3.1
• Clear all the data from GenStat and use the file|open menu to read the data from the file sulphur.xls from installsets\Data.
• Clear all the data from GenStat. Go to the tools|spreadsheet options|file menu and uncheck the use excel import wizard on file open option. Repeat part 1 using the file|open menu. Which approach best suits your way of working?
• The file bacteria.xls, that you met earlier, contains data from a second experiment in the worksheet called Bacteria Counts. The data are not stored in standard format; the data can be found in the range of cells D3:E13. Clear the data core. Read the data into GenStat using the Excel import wizard button.


Exercise 3.2 & 3.3
• Using the data in the iris.gsh file:
  • Produce a scatter plot of Sepal Width versus Petal Width. There is one point in this plot that stands alone. What are the coordinates of this point? Can you suggest a method of easily identifying to which species of iris this unusual point belongs?
  • Produce a scatter plot of Sepal Length versus Petal Length. Give each factor a different symbol and colour. Experiment with labelling.
  • Produce a histogram of Petal lengths versus Petal widths.
• Using your own data, experiment with the different aspects of the graphics window. That is, explore the different menus and toolbars. If you have not brought your own data sets, experiment with any of the course data files.


Exercise 4.1
• Using the Excel Import Wizard, load in the file Traffic.xls.
• On the second screen enter B3:D43 in the Specified Range box.
• Click OK on the Select Columns to Convert to Factors menu.
• Convert Day and Month to factors using the methods of your choice.


Exercise 4.2
• Continue using the file Traffic.xls.
• Select a cell in the Day column. Delete the value, type 'F' and then press return. Repeat the process but with the value 'G'. What property of the GenStat spreadsheet do you think this illustrates?
• Select the Tools|Spreadsheet Options|Conversions menu. Check the Allow new factor levels in Edit box. Now repeat the above question. What happens now?


Exercise 4.3
• Continue using the file Traffic.xls.
• Create a new variate which contains the log of the Counts.
• Sort the columns in descending order of the Counts.
• Use the Spread|Manipulate|Unstack menu to create separate variables for each day of the week.
• Experiment with the Calculate menu with your own data.


1 From t-test to one-way anova
• In this session you will learn
  • how to use the t-test to compare two treatments
  • the T-Test menu
  • how to use one-way ANOVA to compare several treatments
  • the model fitted in one-way anova
  • the statistical philosophy behind one-way anova
  • the relationship between one-way anova and the t-test for two treatments
  • how to use the One- and two-way ANOVA menu for one-way anova
  • how to plot the means from one-way anova
  • how to do multiple comparisons ★
• Note: topics marked ★ are optional


t-test
• suppose we have 2 sets of units that have received 2 different treatments:
  • animals that have been fed two different diets
  • plots that have been given different fertilisers
  • subjects with different drugs
  • plants with different fungicides
• assume the units do not have any special structure e.g.
  • the animals are all of the same breed
  • the plots are in a fairly uniform field
  • the subjects are of similar ages, weights and heights
• with 2 treatments we may then do a t-test
  • assume each group comes from a Normal distribution
  • usually assume the distributions have the same variance (can check)
  • but may have different means
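The assumptions above lead to the pooled two-sample t statistic. The following is a minimal Python sketch of that calculation, not GenStat's own implementation; the function name `pooled_t` and the yield values are hypothetical and are not taken from the course data files.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Two-sample t statistic assuming equal variances in both groups."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: weighted average of the two sample variances.
    s2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means (s.e.d.).
    sed = sqrt(s2 * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / sed
    return t, sed, n1 + n2 - 2

# Hypothetical yields for two methods (illustration only).
method_a = [10.0, 12.0, 14.0]
method_b = [13.0, 15.0, 17.0]
t_stat, sed, df = pooled_t(method_a, method_b)
print(t_stat, sed, df)
```

The statistic is referred to a t distribution on n1 + n2 − 2 degrees of freedom; the T-Test menu reports the corresponding probability.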


Data sets
• filter by the course Guide to Anova and Design
• select the file
• click on Open data
• data sets for the examples and practicals can be accessed using the Example Data Sets menu


t-test
• experiment to study yields from 2 manufacturing methods
• data in Manufacture.gsh
• do yields differ more than we would expect from the random variation?
• can we estimate mean yields from each method?


t-test menu
• Use GenStat menus for simplicity


Output


Practical 1.2
• spreadsheet Pots.gsh stores data from a fertilizer experiment
  • 7 plants grown in pots with no fertilizer
  • 8 plants grown in similar conditions with fertilizer
• do a two-sample t-test to assess whether fertilizer has an effect


One-way analysis of variance
• linear model $y_{ij} = \mu + a_i + \varepsilon_{ij}$
• represent each mean by
  • grand mean $\mu$
  • + effect $a_i$
• observations described by
  • fitted value $\mu + a_i$
  • + residual $\varepsilon_{ij}$


Residual variation
• may arise from many different causes:
  • the units may not be absolutely identical (discuss later how to allocate units to treatments to take account of this)
  • they may experience slightly different conditions during the experiment
  • there may be measurement errors
  • they may be dealt with by different people during the experiment
  • and you can no doubt think of others!
• so estimation is not exact
  • analysis must estimate the amount of variation
  • and take account of it in drawing conclusions


One-way anova
• linear model $y_{ij} = \mu + a_i + \varepsilon_{ij}$
• if treatments have no effect
  • $a_1 = a_2 = 0$
  • $y_{ij} = \mu + \varepsilon_{ij}$
  • estimate grand mean by average of all data values
  • assess lack of fit of model by sum of squared residuals ($RSS_0$)
  • degrees of freedom (d.f.) is $n_1 + n_2 - 1$ (fitted 1 parameter $\mu$)
• fit full model
  • estimate $a_i$ by average for group $i$ minus grand mean
  • assess lack of fit of model by sum of squared residuals ($RSS_1$)
  • this has $n_1 + n_2 - 2$ d.f. (2 parameters as $(n_1 a_1 + n_2 a_2)/(n_1 + n_2) = 0$)
• assess treatments
  • sum of squares due to treatments is $TSS = RSS_0 - RSS_1$ on 1 d.f.
  • assess underlying variation by residual from full model $RSS_1$
  • variance ratio is treatment mean square / residual mean square
  • $VR = \{TSS / 1\} / \{RSS_1 / (n_1 + n_2 - 2)\}$ on 1 and $(n_1 + n_2 - 2)$ d.f.
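The steps above can be sketched in a few lines of Python. This is an illustrative calculation under stated assumptions, not the GenStat algorithm: the function name `one_way_anova` and the data are hypothetical, and the code is written for t groups (treatment d.f. t − 1), which reduces to the 1 d.f. case above when there are two treatments.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (treatment SS, residual SS, variance ratio) for a list of groups."""
    values = [y for g in groups for y in g]
    n, t = len(values), len(groups)
    grand = mean(values)
    # Null model y = mu + e: residuals are deviations from the grand mean.
    rss0 = sum((y - grand) ** 2 for y in values)
    # Full model y = mu + a_i + e: residuals are deviations from group means.
    rss1 = sum((y - mean(g)) ** 2 for g in groups for y in g)
    tss = rss0 - rss1
    vr = (tss / (t - 1)) / (rss1 / (n - t))
    return tss, rss1, vr

# Hypothetical data for two treatments (illustration only).
tss, rss1, vr = one_way_anova([[10.0, 12.0, 14.0], [13.0, 15.0, 17.0]])
print(tss, rss1, vr)
```

With two treatments the variance ratio equals the square of the equal-variance two-sample t statistic, which is the relationship between one-way anova and the t-test.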


One and two-way ANOVA menu


Output
• aov table
• tables of means
• s.e.'s for differences between means
• (m1 – m2)/sed = t


ANOVA Options menu
• Options menu controls the output


ANOVA Further Output menu
• Further Output menu provides more output (without redoing the analysis)


ANOVA Means Plots menu
• Means Plots menu plots means
  • as points
  • or joined by lines
  • or with original data points too
  • or in a bar chart


Practical 1.4
• spreadsheet Pots.gsh stores data from a fertilizer experiment used in Practical 1.2
  • 7 plants grown in pots with no fertilizer
  • 8 plants grown in similar conditions with fertilizer
• do a one-way analysis of variance to assess if fertilizer has an effect
• compare results with t-test from Practical 1.2


One-way anova with >2 treatments
• spreadsheet Rat.gsh has data from an experiment to study effect of dietary supplements on gain in weight of rats
  • 5 diet treatments (a-e)
  • 20 rats allocated at random, 4 per treatment
• can use One- and two-way ANOVA menu, and plot means, as before


Output
• aov table
• means
• s.e.d.


Plot of means
• suppose a-e represent amounts 0-4 of supplement
• might want to assess linear (& quadratic?) effects of supplement


Multiple comparison tests
• in favour
  • there may be many possible comparisons between pairs of treatment means (with t treatments there are t×(t–1)/2)
  • so some researchers feel their significance levels should be adjusted to take account of all the tests that they might make
• against
  • multiple comparisons are unnecessary if you have only a small number of comparisons to make – either because there are few treatments, or because you should have identified beforehand the comparisons that you feel are likely to be of interest
  • they are also inappropriate if the treatments have any sort of structure e.g. levels may represent different amounts of a substance like a fertiliser or a drug; it is then illogical to assume that only some of the amounts might have an effect
• see on-line help for the menu
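To make the count of pairwise comparisons concrete, here is a small Python sketch that lists every pair of treatments and applies the Bonferroni adjustment (dividing the overall significance level by the number of comparisons). The function name `bonferroni_plan` and the 5% overall level are assumptions for illustration; GenStat's menu offers several tests, of which this is only the simplest.

```python
from itertools import combinations

def bonferroni_plan(treatments, alpha=0.05):
    """List all pairwise comparisons and the Bonferroni per-comparison level."""
    pairs = list(combinations(treatments, 2))
    # Bonferroni: divide the overall level by the number of comparisons made.
    return pairs, alpha / len(pairs)

pairs, per_test = bonferroni_plan(["A", "B", "C", "D", "E"])
print(len(pairs), per_test)
```

With 5 treatments there are 5×4/2 = 10 pairs, so each individual comparison is tested at a much stricter level than the overall one.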


Multiple comparisons
• check that they are enabled on the Menus tab of the Options menu


Multiple comparisons
• the Multiple Comparisons button will then be available to click on the ANOVA Further Output menu
  • check Multiple Comparisons
  • select Treatment and type of Test
  • click OK (and then Run on the Further Output menu)


Practical 1.9
• spreadsheet Octane.gsh stores data from an experiment to study the effect of different additives A-E on the octane level of gasoline used in Practical 1.7
• do a one-way analysis of variance to assess if Gasoline has an effect
• A-E represent 0-4 cc/gallon of additive − estimate linear and quadratic effects of additive
• do a Bonferroni multiple-comparison test to compare the types of gasoline


2 Blocking structures
• In this session you will learn
  • how to improve the precision of an experiment by grouping the units into similar sets called "blocks"
  • how randomization can avoid bias by guarding against unforeseen differences amongst the units
  • how to design and analyse a complete randomized block design
  • how to recognise situations that may require more than one type of blocking
  • how to design and analyse a Latin square design ★
• Note: topics marked ★ are optional


Completely-randomized design
• design used for all examples so far
• no formal structure is imposed on the units
• assumes units effectively identical e.g.
  • in a field experiment, no systematic differences in underlying fertility, drainage etc of the plots
  • in a glasshouse, assumes that light and temperature are the same for each row of pots
  • in a factory, that workforce behaves in essentially the same way at different times of day, days of the week etc
  • in educational studies, that children in different schools are approximately the same, or students studying different subjects at Universities, or in different year groups etc
• treatments allocated to units at random


Non-uniform units
• for example field experiment on a slope
  • best plots may be at top of slope
• random allocation of treatments to plots may not seem "fair"
  • e.g. replicates of treatment A mainly on "good" plots & replicates of treatment B mainly on "bad" plots − if no actual difference between A & B, could lead to A appearing to be much better than B
• systematic differences between plots increase the residual sum of squares, & hence the estimate of random variability
  • treatment differences must be larger to give a significant F-test
  • standard errors of differences between treatments will be larger i.e. experiment will give less precise results
• if you know there are differences between units
  • avoid bias & improve precision by grouping (blocking) units into homogeneous groups (i.e. groups that are effectively identical)


Randomized block design
• single grouping factor usually known as blocks
• within each block
  • same number of units for each treatment (one per treatment in a randomized-complete-block design)
  • treatments are allocated randomly to the units
• in analysis block-effects are estimated and removed, leading to more-precise estimates
• e.g.


One-way anova with blocks
• another experiment to study effect of dietary supplements on gain in weight of rats
• 8 litters of 5 rats
• assume rats from same litter more similar than those from different litters
• 5 Diet treatments (A-E), allocated at random to rats within each litter


No blocking
• residual m.s. 206.8, variance ratio 0.42
• s.e.d. 7.19


With litters as blocks
• Differences between litters
• residual m.s. 40.63 (c.f. 206.8)
• variance ratio 2.13 (c.f. 0.42)
• s.e.d. 3.19 (c.f. 7.19)
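The drop in residual mean square when litters are used as blocks can be reproduced in miniature. The Python sketch below fits the treatment-only model and the additive block-plus-treatment model to a small balanced table and compares the residual mean squares. The function name `residual_ms` and the weight gains are hypothetical; they are not the course data, so the numbers differ from the output quoted above.

```python
from statistics import mean

def residual_ms(table, use_blocks):
    """table[b][t] = response for treatment t in block b (one unit per cell)."""
    nb, nt = len(table), len(table[0])
    grand = mean(y for row in table for y in row)
    trt_means = [mean(row[t] for row in table) for t in range(nt)]
    blk_means = [mean(row) for row in table]
    rss = 0.0
    for b in range(nb):
        for t in range(nt):
            fitted = trt_means[t]
            if use_blocks:
                # Additive block effect is estimated and removed.
                fitted += blk_means[b] - grand
            rss += (table[b][t] - fitted) ** 2
    # Residual d.f.: (b-1)(t-1) with blocks, t(b-1) without.
    df = (nb - 1) * (nt - 1) if use_blocks else nt * (nb - 1)
    return rss / df

# Hypothetical weight gains: 3 litters (rows) by 3 diets (columns).
gains = [[10.0, 12.0, 11.0], [20.0, 23.0, 21.0], [30.0, 31.0, 32.0]]
print(residual_ms(gains, use_blocks=False), residual_ms(gains, use_blocks=True))
```

Because the litters in this made-up table differ greatly, removing their effects shrinks the residual mean square, and hence the s.e.d., in the same way as in the rat example.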


Practical 2.3
• spreadsheet Wheatstrains.gsh contains the results from a randomized block design to assess 4 strains of wheat
• analyse the experiment
• give your assessment of whether the blocking was worthwhile


Blocking in 2 directions
• e.g. experiment on pot plants in a glasshouse
  • door in east wall which may cause temperature differences
  • sunlight mainly from the south
• other e.g.
  • weekday × time-of-day
  • school × year-group
  • factory × weekday
  • time × location


Latin square design
• a design for t treatments
• arranged in t rows and t columns (i.e. $t^2$ units)
• each treatment occurs exactly once in each row and once in each column
• randomized by randomly permuting rows & columns
• e.g.
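As a sketch of the randomization just described, the Python code below starts from a cyclic square and randomly permutes its rows and columns, then checks the defining property. The function names and the cyclic starting square are assumptions for illustration; GenStat's design menus generate such designs directly.

```python
import random

def random_latin_square(treatments, seed=None):
    """Randomize a cyclic Latin square by permuting its rows and columns."""
    rng = random.Random(seed)
    t = len(treatments)
    # Cyclic starting square: row i is the treatment list shifted by i.
    square = [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]
    rows = rng.sample(range(t), t)
    cols = rng.sample(range(t), t)
    return [[square[r][c] for c in cols] for r in rows]

def is_latin(square):
    """Check each treatment occurs exactly once in every row and column."""
    t = len(square)
    symbols = set(square[0])
    return (len(symbols) == t
            and all(set(row) == symbols for row in square)
            and all(set(col) == symbols for col in zip(*square)))

design = random_latin_square(["A", "B", "C", "D"], seed=1)
for row in design:
    print(" ".join(row))
```

Permuting whole rows and whole columns cannot break the once-per-row, once-per-column property, which is why this simple randomization is valid.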


Latin square example
• experiment to assess the (in?)consistency of 6 samplers in assessing the heights of wheat plants
• 6 areas of wheat to assess
• may also be ordering effects (accuracy of samplers may vary during experiment)
• so 6×6 Latin square used with blocking factors Areas and Orders


Analysis of Variance menu
• select Design to be Latin Square


Output
• between Areas
• between Orders
• Samplers more precisely estimated (residual m.s. 3.328 c.f. 5.801)


Practical 2.5
• spreadsheet Fabric.gsh contains the results from a Latin square design to assess wear resistance of rubber-covered fabrics
  • column factor is 4 different runs
  • row factor is 4 positions on testing machine used to generate wear under simulated natural conditions
• analyse the results


3 Treatment structure
• In this session you will learn how to
  • recognise the need for more than one treatment factor
  • analyse designs with two treatment factors using the One- and two-way ANOVA menu
  • define and interpret interactions between factors
  • analyse designs with two treatment factors using the general Analysis of Variance menu ★
  • use the Anova Contrasts menu ★
  • estimate comparisons between levels of a treatment factor ★
  • interpret interactions between treatment contrasts ★
  • use model formulae to define the treatment terms to be fitted
  • include control treatments in a factorial experiment ★
  • use covariates to improve precision by using additional background information about the experimental units (not used for blocking) ★
• Note: topics marked ★ are optional


Types of treatment
• experiments may study different types of treatment e.g.
  • several different drugs at a range of different doses
  • several different types of fertiliser
  • varieties of wheat and types of fungicide
• represent each type of treatment by a different treatment factor, with levels to represent the various possibilities e.g.
  • Drug − levels Morphine, Amidone, Phenadoxone, Pethidine;
  • Dose − levels 2.5, 5, 10, 15;
  • Nitrogen − levels 0, 50, 100, 150;
  • Phosphate − levels 50, 100;
  • Fungicide − levels Carbendazim, Prochloraz;
  • Amount − levels 2, 3, 4.


Two treatment factors
• experiment on canola (oil-seed rape)
• 2 treatment factors
  • N (nitrogen) 0, 180, 230
  • S (sulphur) 0, 10, 20, 40
• randomized-block design
  • with 3 blocks (factor block)
  • and 12 plots per block


One and two-way ANOVA menu
• Two-way analysis (Treatment factors N & S)
• with Blocks (factor block)


Output
• line for each term: N & S main effects, and N.S interaction
• table of means for each treatment term
• s.e.d. for each table of means


Linear model
• $y_{ijk} = \mu + \beta_i + n_j + s_k + ns_{jk} + \varepsilon_{ijk}$
  • $\beta_i$ represent the block effects (block stratum in the aov)
  • $\varepsilon_{ijk}$ are the residuals
  • $n_j$ represent the main effect of nitrogen (N)
  • $s_k$ represent the main effect of sulphur (S)
  • $ns_{jk}$ represent the interaction between nitrogen & sulphur (N.S)
• analysis fits each term in turn, so you can decide how complicated a model is required
• analysis-of-variance table has a line for each term, so you can assess whether its parameters are needed in the model
• conclusions will be much clearer if there is no interaction


With <strong>in</strong>teraction


Without interaction
• lines are parallel
• can decide on best level of S without considering N
• or best level of N without considering S
• need present only one-way tables of means
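The link between parallel lines and the absence of interaction can be checked numerically. In this Python sketch the interaction effect for each cell is taken as the cell mean minus its row mean minus its column mean plus the grand mean; when the profiles are parallel every such effect is zero. The function name and both tables of means are hypothetical, not the canola results.

```python
from statistics import mean

def interaction_effects(cell_means):
    """cell_means[j][k] = mean for level j of N and level k of S."""
    grand = mean(m for row in cell_means for m in row)
    row_means = [mean(row) for row in cell_means]
    col_means = [mean(col) for col in zip(*cell_means)]
    # ns_jk = cell mean - row mean - column mean + grand mean.
    return [[m - row_means[j] - col_means[k] + grand
             for k, m in enumerate(row)]
            for j, row in enumerate(cell_means)]

# Hypothetical means. In `parallel` the second row is the first plus 2.0,
# so the two profiles are parallel; in `crossing` they are not.
parallel = [[1.0, 2.0, 4.0], [3.0, 4.0, 6.0]]
crossing = [[1.0, 2.0, 4.0], [3.0, 3.0, 3.0]]
print(interaction_effects(parallel))
print(interaction_effects(crossing))
```

For the parallel table the one-way means of N and S carry all the information, which is why only one-way tables need be presented.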


General Analysis of Variance menu
• Design: Two-way ANOVA (in Randomized Blocks)
• click on Contrasts button to fit comparisons (or other contrasts)


Model formula
• define a model to be fitted in an analysis
• formed automatically by the menus – or can define your own
• list of model terms, linked by operator "+"
  • e.g. A + B
  • 2 terms representing main effects of factors A & B
• higher-order terms specified as series of factors separated by dots (e.g. interactions): meaning depends on contents of formula
  • e.g. N + S + N.S, where N.S is an interaction
  • e.g. Block + Block.Plot, where Block.Plot represents plot-within-block effects: differences between individual plots after removing the overall similarity between plots in same block


Operators for formulae
• crossing operator * specifies factorial structures
  e.g. N * S
  is expanded automatically to become N + S + N.S
• nesting operator / occurs most often in block formulae
  e.g. Block / Plot
  is expanded to become Block + Block.Plot


Several operators
• 3-factor factorial model
  A * B * C
  becomes A + B + C + A.B + A.C + B.C + A.B.C
• 3 nested factors (e.g. block model of split-plot)
  block / wplot / subplot
  becomes block + block.wplot + block.wplot.subplot
• factorial-plus-added-control treatment structure
  Control / (Drug * Dose)
  expands to Control + Control.Drug + Control.Dose + Control.Drug.Dose
• NB: many commands and menus have a FACTORIAL option to control the number of factors/variates in the terms to fit
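The expansions listed above follow two simple rules, which the Python sketch below mimics: L * M becomes L + M + L.M, and L / M becomes L plus every term of M prefixed by all the factors of L. This is a toy expander written for illustration only (the helper names `cross`, `nest` and `show` are hypothetical); it handles neither repeated factors nor GenStat's other operators.

```python
def cross(left, right):
    """L * M expands to L + M + L.M (terms are tuples of factor names)."""
    products = [a + b for a in left for b in right]
    return left + right + products

def nest(left, right):
    """L / M expands to L + (all factors of L).M."""
    # Collect every factor in the left-hand terms, keeping first-seen order.
    outer = tuple(dict.fromkeys(f for term in left for f in term))
    return left + [outer + term for term in right]

def show(terms):
    # Order terms by number of factors; the stable sort keeps ties in input order.
    return " + ".join(".".join(t) for t in sorted(terms, key=len))

def factor(name):
    return [(name,)]

print(show(cross(cross(factor("A"), factor("B")), factor("C"))))
print(show(nest(nest(factor("block"), factor("wplot")), factor("subplot"))))
print(show(nest(factor("Control"), cross(factor("Drug"), factor("Dose")))))
```

Each printed line should match the corresponding expansion on the slide, which is a useful check that the two rules have been understood.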


Factorial plus added control
• 4 different fumigants to control nematodes
  • CN, CS, CM and CK
• 2 levels of dose
  • single and double
• also include a control treatment
  • none (no fumigant at any dose)
• randomized-block design
  • 4 blocks
  • 12 plots per block
  • (4 replicates of control treatment in each block)
• effects proportional
  • analyse log counts


Analysis of Variance menu
• select Design to be General Treatment Structure (in Randomized Blocks)


Factorial plus added control
• treatment structure Fumigant / ( Level * Type )
  • Fumigant represents the overall effect of any fumigant at any (non-zero) dose
  • Fumigant.Level represents comparison between single and double doses (averaged over different types)
  • Fumigant.Type represents overall differences between types (averaged over single and double doses)
  • Fumigant.Level.Type represents the interaction between Level and Type (given that some sort of fumigant has been applied)


Output


Output
• notice different sed's according to the replication of the means


Covariates<br />

• provide additional background information<br />

• often measurements made before expt (not used for blocking)<br />

• e.g. (log) prior nematode counts<br />

• incorporated in model as linear (regression) terms<br />

• $y_{ijkl} = \mu + \beta_i + f_j + ft_{jk} + fl_{jl} + ftl_{jkl} + b\,(x_{ijkl} - \bar{x}) + \varepsilon_{ijkl}$<br />

• improve precision<br />

• remove potential biases caused by non-uniformity of units<br />

• in aov table<br />

• extra line(s) to assess effect of covariate(s) on y-variate, after removing effects of treatments<br />

• treatment s.s. (and effects) adjusted to take account of the fact that the plots with the various treatments have different covariate values<br />

• cov.ef. for treatment is efficiency remaining after adjustment<br />

• cov.ef. for residual is amount by which its m.s. has decreased
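
To see what "adjusted" means for the treatment effects, here is a minimal Python sketch of a covariate adjustment. It assumes a much simpler setting than the slide: a single treatment factor, no blocks, and made-up numbers. The slope b is the pooled within-treatment regression of y on x, and each treatment mean is moved to the value it would have at the overall covariate mean.

```python
from statistics import mean

# Hypothetical data: response y and covariate x for two treatments.
data = {
    "control": {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]},
    "treated": {"x": [2.0, 3.0, 4.0], "y": [7.0, 9.0, 11.0]},
}

# Pooled within-treatment regression coefficient b.
sxy = 0.0
sxx = 0.0
for g in data.values():
    xbar, ybar = mean(g["x"]), mean(g["y"])
    sxy += sum((x - xbar) * (y - ybar) for x, y in zip(g["x"], g["y"]))
    sxx += sum((x - xbar) ** 2 for x in g["x"])
b = sxy / sxx

# Adjust each treatment mean to the overall mean of the covariate.
grand_x = mean(x for g in data.values() for x in g["x"])
unadjusted = {name: mean(g["y"]) for name, g in data.items()}
adjusted = {
    name: mean(g["y"]) - b * (mean(g["x"]) - grand_x)
    for name, g in data.items()
}

print("b =", b)                  # 2.0
print("unadjusted:", unadjusted)  # control 4.0, treated 9.0
print("adjusted:  ", adjusted)    # control 5.0, treated 8.0
```

In this toy example the treated units happened to have larger covariate values, so the unadjusted difference of 5 shrinks to 3 once the covariate is taken into account.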


Output<br />

• regression coefficient for adjustment in Blocks stratum<br />

• regression coefficient for adjustment within Blocks<br />

• combined estimate


Output


Practical 3.7<br />

• spreadsheet Ratmuscles.gsh contains data from an experiment to study the effect of electrical stimulation in preventing the wasting away of denervated muscles of rats<br />

• 3 treatment factors<br />

• length of each treatment<br />

• number of treatment periods per day<br />

• type of current<br />

• randomized block design with 2 blocks<br />

• denervated muscles were gastrocnemius muscles on one side of each rat<br />

• the normal muscle on the other side of each rat was also measured, for use as a covariate in the analysis<br />

• analyse the experiment


4 Checking the assumptions<br />

• In this session you will learn<br />

• what assumptions are needed to ensure validity of an aov<br />

• why the variance must be homogeneous (e.g. variability of residuals should be the same at high as at low response values)<br />

• how to assess whether the variance is homogeneous<br />

• that residuals should come from identical and independent Normal distributions<br />

• how to assess the Normality of the residuals<br />

• why the model must be additive (i.e. differences between treatment effects must remain the same however large or small the underlying size of the response variable)<br />

• how to identify outliers<br />

• how transforming the response variate may correct for failures in the assumptions ★<br />

• how to print back-transformed tables of means ★<br />

• how to do a random permutation test ★<br />

• Note: topics marked ★ are optional


Homogeneity of variance<br />

• random variation must be similar over all units<br />

• beware: it may change with the size of response<br />

• assess by plotting residuals against fitted values<br />

• [example plots: homogeneous variance; variance increasing with response]
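
As a rough illustration of the residuals-against-fitted-values check, the Python sketch below uses made-up data from a one-way layout (no blocks), where the fitted value of each unit is simply its treatment mean. The numbers are invented so that the spread grows with the mean. In GenStat the plot itself comes from the ANOVA menus, so this is only a sketch of what is being plotted.

```python
from statistics import mean, stdev

# Hypothetical responses for three treatments (made-up numbers).
groups = {
    "A": [9.0, 10.0, 11.0],
    "B": [45.0, 50.0, 55.0],
    "C": [80.0, 100.0, 120.0],
}

# One-way layout: the fitted value of a unit is its treatment mean,
# and the residual is the observation minus the fitted value.
points = []
for ys in groups.values():
    fitted = mean(ys)
    points.extend((fitted, y - fitted) for y in ys)

# Spread of the observations within each treatment.
spread = {name: stdev(ys) for name, ys in groups.items()}

for fitted, residual in points:
    print(f"fitted {fitted:6.1f}   residual {residual:6.1f}")
print(spread)
```

Here the within-treatment standard deviation rises from 1 to 5 to 20 as the fitted value rises from 10 to 50 to 100, which is the fan-shaped pattern that signals non-homogeneous variance.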


Non-homogeneity of variance<br />

• if variation increases with size of response<br />

• s.e.d.'s between treatment means will be<br />

• over-estimated for differences between low means<br />

• under-estimated for differences between larger means<br />

• this could lead you to the wrong conclusions!<br />

• if plot of residuals against fitted values indicates non-homogeneity of variances<br />

• consider transforming the response variate<br />

• (or using a generalized linear model; see Guide to Linear, Nonlinear and Generalized Linear Models in GenStat)


Normality of residuals<br />

• histogram – should be "bell-shaped"<br />

• Normal plot<br />

• residuals in ascending order plotted against Normal quantiles<br />

• should give an approximately straight line<br />

• half-Normal plot<br />

• similar to Normal plot but plots absolute residual values
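
The coordinates of a Normal plot and a half-Normal plot can be sketched in a few lines of Python. The plotting positions (i − 0.375)/(n + 0.25) used here are one common convention and are an assumption of this sketch. GenStat may use a different formula, and the function names are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pairs (Normal quantile, residual) with residuals in ascending order."""
    n = len(residuals)
    std = NormalDist()  # standard Normal: mean 0, s.d. 1
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    quantiles = [std.inv_cdf(p) for p in probs]
    return list(zip(quantiles, sorted(residuals)))

def half_normal_plot_points(residuals):
    """Pairs (half-Normal quantile, |residual|) with |residuals| ascending."""
    n = len(residuals)
    std = NormalDist()
    # Quantiles of |Z|: map each probability p into the upper half, 0.5 + p/2.
    probs = [0.5 + 0.5 * (i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    quantiles = [std.inv_cdf(p) for p in probs]
    return list(zip(quantiles, sorted(abs(r) for r in residuals)))

# Hypothetical residuals, for illustration only.
example = [0.3, -1.2, 0.0, 1.1, -0.4]
pts = normal_plot_points(example)
half = half_normal_plot_points(example)
for q, r in pts:
    print(f"quantile {q:7.3f}   residual {r:5.1f}")
```

If the residuals are roughly Normal, the printed pairs lie close to a straight line when plotted.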


Additivity<br />

• differences between treatment effects remain the same however large or small the underlying size of the response<br />

• e.g. in randomized-block design, assume that theoretical value of difference between two treatments remains the same within a block where responses are low, as in one where they are high<br />

• fitting an additive model when non-additivity is present<br />

• often leads to detection of (spurious) interactions<br />

• analysis will be harder to interpret<br />

• predictions will be unreliable<br />

• but take care – genuine interactions may also occur e.g. if one treatment modifies the mode of action of another<br />

• data that shows signs of non-additivity often also violates other assumptions<br />

• use background knowledge of the process<br />

• if a multiplicative model is appropriate, take a log transformation<br />

• for percentage data, consider a logit transformation
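
A small numerical sketch may help show why a log transformation restores additivity when effects are multiplicative. The block levels and the treatment multiplier below are invented for illustration.

```python
import math

# Hypothetical multiplicative model: response = block level x treatment factor.
block_level = {"low block": 10.0, "high block": 100.0}
treatment_factor = {"T1": 1.0, "T2": 1.5}  # T2 raises the response by 50%

raw_diff = {}
log_diff = {}
for block, base in block_level.items():
    y1 = base * treatment_factor["T1"]
    y2 = base * treatment_factor["T2"]
    raw_diff[block] = y2 - y1                          # depends on the block
    log_diff[block] = math.log10(y2) - math.log10(y1)  # same in every block

print(raw_diff)
print(log_diff)
```

On the original scale the treatment difference is 5 in the low block and 50 in the high block, so the model is not additive. On the log10 scale the difference is log10(1.5) in both blocks.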


Outliers<br />

• are extreme observations, leading to very large residuals<br />

• look for warnings in ANOVA Information Summary<br />

• or for extreme points in histogram of residuals<br />

• or high or low points in plot of residuals against fitted values<br />

• or points away from line at end of Normal or half-Normal plot<br />

• outliers may arise from<br />

• errors in recording or punching data<br />

• the wrong treatment being applied to a unit<br />

• a problem in the experimental procedure<br />

• outliers<br />

• distort treatment means<br />

• inflate the error variance, decreasing the precision of estimates<br />

• if you have outliers, investigate to see if errors have occurred<br />

• if you find an error, try to recover the correct data value<br />

• if you cannot find the correct data value, insert a missing value<br />

• if you cannot find any possible source of error, perhaps the outlier might be a true data value – is your model wrong?
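
One simple way to screen for very large residuals is to compare each residual with the residual standard deviation. The Python sketch below does this with a cut-off of 2.5 standard deviations. The cut-off, the function name and the data are assumptions for illustration only, and the criterion behind GenStat's own warnings may differ.

```python
import math

def flag_large_residuals(residuals, residual_df, threshold=2.5):
    """Indices of residuals larger than `threshold` residual standard deviations.

    The default threshold of 2.5 is an arbitrary illustrative choice.
    """
    s = math.sqrt(sum(r * r for r in residuals) / residual_df)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * s]

# Hypothetical residuals from a fitted model, with 8 residual d.f. assumed.
residuals = [-0.4] * 10 + [4.0]
flagged = flag_large_residuals(residuals, residual_df=8)
print(flagged)  # [10]
```

A flagged unit is only a prompt to go back and check the data, following the steps listed above.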


Transformations<br />

• can correct failures of assumptions<br />

• e.g. to stabilize variance<br />

• counts: square root<br />

• binomial percentages: angular, i.e. arcsine(sqrt(p/100))<br />

• s.e. proportional to mean: log<br />

• e.g. non-additivity<br />

• multiplicative effects: log, e.g. log10(n+1) for counts<br />

• percentages: logit = log(p/(100-p)), with p=100×(r+½)/(n+1) for binomial<br />

• note: must make inferences on transformed scale<br />

• but can present back-transformed means using Save and Calculate menus
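
The transformations listed above can be written as small Python functions, shown here as a sketch. The function names are hypothetical, the angular transformation is returned in radians (an assumption; software may report degrees), and the logit uses natural logarithms.

```python
import math

def sqrt_count(n):
    """Square-root transformation for counts."""
    return math.sqrt(n)

def angular(p):
    """Angular transformation of a percentage p (0 to 100), in radians."""
    return math.asin(math.sqrt(p / 100.0))

def log_count(n):
    """log10(n + 1), so that zero counts can be transformed."""
    return math.log10(n + 1)

def empirical_percent(r, n):
    """p = 100 x (r + 1/2) / (n + 1) for r successes out of n (binomial)."""
    return 100.0 * (r + 0.5) / (n + 1)

def logit(p):
    """Logit of a percentage p, with 0 < p < 100."""
    return math.log(p / (100.0 - p))

def inverse_logit(z):
    """Back-transform a logit to a percentage."""
    return 100.0 / (1.0 + math.exp(-z))

print(log_count(99))             # 2.0
print(empirical_percent(0, 9))   # 5.0, so the logit is defined even when r = 0
print(logit(50.0))               # 0.0
```

The adjustment in `empirical_percent` keeps p strictly between 0 and 100, which the logit requires.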


Log transformed data<br />

• study of plankton numbers<br />

• 4 types of plankton (treatments)<br />

• sampled in 12 hauls (blocks)<br />

• compare analyses for untransformed and log10 transformed numbers


Save the means


Backtransform and print
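
For log10-transformed data, back-transforming a mean amounts to raising 10 to the power of the mean of the logs. A minimal Python sketch with made-up counts (in GenStat this step is done through the Save and Calculate menus):

```python
import math
from statistics import mean

counts = [10.0, 100.0, 1000.0]  # hypothetical counts for one treatment

log_mean = mean(math.log10(c) for c in counts)  # mean on the log10 scale
back_transformed = 10 ** log_mean               # back on the original scale
arithmetic_mean = mean(counts)

print(back_transformed, arithmetic_mean)
```

The back-transformed value (100) is a geometric mean and is smaller than the arithmetic mean of the raw counts (370), so the two should not be confused when presenting results. If log10(n+1) was used, 1 would be subtracted after back-transforming.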


Practical 4.6<br />

• spreadsheet Wine.gsh contains results from an experiment to assess the % alcohol of wine<br />

• 5 types of wine A-E<br />

• 3 bottles of each type were tested in a random order<br />

• analyse the percentages & plot residuals against fitted values<br />

• transform the percentages using a logit transformation, re-analyse the data & replot residuals against fitted values


Permutation tests<br />

• if the distributional assumptions are not satisfied, you might use a random permutation test as an alternative way to assess the significance of the terms in the analysis<br />

• model must still be additive for results to be meaningful<br />

• but residuals need no longer follow Normal distributions with equal variances<br />

• click on Permutation Test in ANOVA Further Output menu to open ANOVA Permutation Test menu<br />

• specify Number of permutations<br />

• select Seed (0 automatic)<br />

• click on Run<br />

• probability for each treatment term is now determined from its distribution over the randomly permuted data sets
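
The idea behind the permutation probability can be sketched in Python for the simplest case: a one-way, completely randomized design with a single treatment factor. This is an illustration under those assumptions, not GenStat's implementation. A design with blocks would need permutations restricted to within blocks. The function names and data are hypothetical.

```python
import random
from statistics import mean

def f_statistic(values, labels):
    """Variance ratio (F) for a one-way analysis of variance."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand = mean(values)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p_value(values, labels, n_perm=999, seed=1):
    """Proportion of random relabellings whose F is at least the observed F."""
    rng = random.Random(seed)          # fixed seed gives reproducible results
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # randomly reallocate treatments to units
        if f_statistic(values, shuffled) >= observed:
            count += 1
    # The observed allocation is counted as one of the permutations.
    return (count + 1) / (n_perm + 1)

# Hypothetical data: three treatments, three units each.
values = [10.1, 10.4, 9.8, 12.0, 12.3, 11.9, 14.2, 13.8, 14.1]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
p = permutation_p_value(values, labels, n_perm=999, seed=1)
print(p)
```

The probability comes from where the observed variance ratio falls among the permuted ones, so no Normal distribution is assumed. Fixing the seed plays the same role as the Seed box in the menu: it makes the random permutations repeatable.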


Practical 4.8<br />

• spreadsheet Wine.gsh (used in Practical 4.6) contains results from an experiment to assess the % alcohol of wine<br />

• 5 types of wine A-E<br />

• 3 bottles of each type were tested in a random order<br />

• analyse the percentages & plot residuals against fitted values<br />

• assess the differences between the types using a permutation test
