UNCORRECTED PROOF


4 J. Brynjarsdóttir, G. Stefánsson / Fisheries Research xxx (2004) xxx–xxx


3. Methods

The gamma and log-normal distributions share some characteristics which often make it difficult to choose between them. Both distributions place probability mass only on positive values and can describe data sets for which the majority of the probability mass is at low values but there is a heavy tail to the right. They also share the same relationship between the mean and variance, i.e. the variance function:

$$\operatorname{var}(Y) = \phi\,E(Y)^2 \tag{1}$$

where φ is a constant. This relationship differs from that for other distributions such as the normal, Poisson and negative binomial distributions and can therefore be used to distinguish these two distributions from others. A common approach to checking this relationship is to examine a plot of log(sample variance) versus log(sample mean) for homogeneous groups of data; see, for example, McCullagh and Nelder (1989, p. 306). If the points lie on a straight line with slope close to 2, the gamma and log-normal distributions with fixed scale parameters cannot be rejected as the true underlying distribution. Such an investigation cannot, however, distinguish between these two distributions. Data from one sub-rectangle and one year of the Icelandic groundfish survey can be thought of as realizations of i.i.d. variables, since the environmental conditions are fairly homogeneous within a sub-rectangle. The drawback is that there are few observations for each sub-rectangle; the highest number is 7 observations, resulting in high uncertainty in the estimated means and variances for the sub-rectangles. The statistical rectangles, which have up to 16 observations per rectangle, are therefore also considered, but since they are four times the size of the sub-rectangles, the assumption of homogeneity is not as reliable.
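The variance-function check described above can be sketched on simulated data. The following Python fragment is an illustration only, not the analysis of the survey data: the group means, the shape parameter r = 1, the log-scale standard deviation b = 0.5 and the group size of 50 are all assumed values chosen for the example. It draws homogeneous groups from a gamma and from a log-normal distribution and estimates the slope of log(sample variance) against log(sample mean) by least squares.

```python
import math
import random
import statistics


def loglog_slope(groups):
    """Least-squares slope of log(sample variance) on log(sample mean)."""
    xs = [math.log(statistics.mean(g)) for g in groups]
    ys = [math.log(statistics.variance(g)) for g in groups]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)               # fixed seed so the sketch is reproducible
r = 1.0                              # assumed gamma shape parameter
b = 0.5                              # assumed log-scale standard deviation
means = [10 ** (k / 10) for k in range(31)]   # assumed group levels, 1 to 1000
group_size = 50                      # assumed; the survey groups are much smaller

# random.gammavariate(shape, scale): shape r and scale mu / r give mean mu.
gamma_groups = [
    [rng.gammavariate(r, mu / r) for _ in range(group_size)] for mu in means
]
# random.lognormvariate(mu, sigma) takes the mean and sd of log(Y).
lognormal_groups = [
    [rng.lognormvariate(math.log(mu), b) for _ in range(group_size)] for mu in means
]

slope_gamma = loglog_slope(gamma_groups)
slope_lognormal = loglog_slope(lognormal_groups)
print(f"gamma slope: {slope_gamma:.2f}, log-normal slope: {slope_lognormal:.2f}")
```

Both slopes should come out close to 2, which illustrates why this plot can support the variance function (1) but cannot separate the two distributions.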

A goodness-of-fit test based on a generalized linear model (GLM) is used to distinguish between the two proposed probability distributions, the gamma and log-normal distributions. Following the approach of Stefánsson and Pálsson (1997) and Stefánsson (1988), this is done by scaling the observations with the fitted values from a GLM and then performing a Kolmogorov–Smirnov test on the scaled data. Let $Y_{yji}$ be a random variable that represents the number of cod caught in year $y$, sub-rectangle $j$ and tow $i$. It is assumed that either

$$Y_{yji} \sim G\!\left(r, \frac{\mu_{yj}}{r}\right) \quad\text{or}\quad Z_{yji} = \log(Y_{yji}) \sim N(a_{yj}, b^2) \tag{2}$$

where $N(a, b^2)$ is the normal distribution with mean $a$ and variance $b^2$ and $G(r, \mu/r)$ is the gamma distribution with mean $\mu$, variance $\mu^2/r$ and density function

$$f(y) = \frac{y^{r-1} e^{-yr/\mu}}{(\mu/r)^r\,\Gamma(r)}, \quad y > 0 \tag{3}$$

where $\Gamma$ is the Gamma function, $\Gamma(r) = \int_0^\infty x^{r-1} e^{-x}\,dx$. The effects of sub-rectangles and years are assumed to be multiplicative on the original scale of number of cod and hence additive on the log scale. This leads to the log link if $Y_{yji}$ is gamma distributed and the identity link if $\log(Y_{yji})$ is normally distributed. We fit the models:

$$\log(\mu_{yj}) = \beta_0 + \alpha_y + \beta_j + \gamma_{yj} \quad\text{and}\quad a_{yj} = \beta_0 + \alpha_y + \beta_j + \gamma_{yj} \tag{4}$$

where $\beta_0$ is the grand mean, $\alpha_y$ is the year effect, $\beta_j$ is the spatial effect of sub-rectangles and $\gamma_{yj}$ is the interaction. The error is assumed to be gamma distributed with mean 1, i.e. $G(r, 1/r)$ in the notation above, in the first model but normally distributed, $N(0, b^2)$, in the second. For a fixed year $y$, the models become:

$$\log(\mu_{yj}) = \beta_0 + \beta_j \quad\text{and}\quad a_{yj} = \beta_0 + \beta_j \tag{5}$$
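As an illustration of what fitting the fixed-year models (5) involves, the following Python sketch uses the fact that in a model with a single factor the maximum likelihood fitted means are the within-group sample means: the mean catch per sub-rectangle for the gamma model and the mean log catch per sub-rectangle for the normal model. The function name `fit_fixed_year`, the toy data and the choice of dispersion estimators (the Pearson estimator for 1/r and the residual mean square for b²) are assumptions made for this example; the text does not state which estimators were used. The sketch also assumes strictly positive catches, since log(0) is undefined.

```python
import math
from collections import defaultdict


def fit_fixed_year(catches):
    """Fit both fixed-year models to (sub_rectangle, catch) pairs, catch > 0.

    Returns (mu_hat, r_hat, a_hat, b2_hat), where mu_hat and a_hat map each
    sub-rectangle to its fitted value.
    """
    groups = defaultdict(list)
    for rect, y in catches:
        groups[rect].append(y)
    n = len(catches)            # number of tows
    p = len(groups)             # number of fitted means (one per sub-rectangle)

    # Gamma model with log link: fitted mean is the group sample mean.
    mu_hat = {j: sum(ys) / len(ys) for j, ys in groups.items()}
    # Normal model for log catch: fitted value is the group mean of the logs.
    a_hat = {j: sum(math.log(y) for y in ys) / len(ys) for j, ys in groups.items()}

    # Assumed estimator: Pearson estimate of the gamma dispersion 1/r.
    pearson = sum(((y - mu_hat[j]) / mu_hat[j]) ** 2 for j, y in catches)
    r_hat = (n - p) / pearson
    # Assumed estimator: residual mean square on the log scale for b^2.
    rss = sum((math.log(y) - a_hat[j]) ** 2 for j, y in catches)
    b2_hat = rss / (n - p)
    return mu_hat, r_hat, a_hat, b2_hat


# Toy data (hypothetical): two sub-rectangles with two tows each.
example = [("A", 2.0), ("A", 4.0), ("B", 10.0), ("B", 30.0)]
mu_hat, r_hat, a_hat, b2_hat = fit_fixed_year(example)
print(mu_hat, r_hat, a_hat, b2_hat)
```

With year effects and interactions as in model (4), the same reasoning applies to each year and sub-rectangle combination, because the interaction term gives every combination its own mean.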

The goodness-of-fit test is based on the following. Firstly, it is a known fact that if $X \sim G(r, \mu/r)$ then $X/\mu \sim G(r, 1/r)$. If $\mu_{yj}$ and $r$ were known, we could test whether $Y_{yji}/\mu_{yj} \sim G(r, 1/r)$ using the Kolmogorov–Smirnov test. This is done here by assuming that the fitted values $\hat{\mu}_{yj}$ and the estimated dispersion parameter $1/\hat{r}$ obtained from model (4), with gamma distributed errors, are the true parameters. Secondly, another known fact is that if $X \sim N(a, b^2)$ then $e^{X-a} \sim LN(0, b^2)$. If the true parameters were known, this could be tested via the Kolmogorov–Smirnov test. This is done here by assuming that the fitted values $\hat{a}_{yj}$ and the estimated dispersion parameter $\hat{b}^2$ obtained from model (4), with normally distributed errors, are the true parameters. The Kolmogorov test statistic $D_n$ measures the distance between the empirical and hypothesized distributions, so we compare these test statistics to see which distribution better represents the data.
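The scaling and comparison steps can be sketched as follows. This is a minimal illustration on simulated gamma data, not the authors' implementation: the sub-rectangle means, the true shape r = 1, the number of tows and the dispersion estimators are assumed values, and the helper names `gamma_cdf` and `ks_statistic` are hypothetical. The gamma distribution function is evaluated with the standard power series for the regularized lower incomplete gamma function, which is adequate for the moderate arguments that occur here.

```python
import math
import random
from statistics import NormalDist


def gamma_cdf(t, r):
    """CDF of G(r, 1/r) (shape r, mean 1) at t, i.e. P(r, r*t), by power series."""
    x = r * t
    if x <= 0.0:
        return 0.0
    term = 1.0 / r
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= x / (r + n)
        total += term
    return min(1.0, total * math.exp(-x + r * math.log(x) - math.lgamma(r)))


def ks_statistic(sample, cdf):
    """Kolmogorov statistic D_n: largest gap between empirical and given CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


rng = random.Random(2004)
true_r = 1.0                                   # assumed shape parameter
rect_means = [5.0, 20.0, 80.0, 320.0]          # assumed sub-rectangle means
data = {j: [rng.gammavariate(true_r, mu / true_r) for _ in range(1000)]
        for j, mu in enumerate(rect_means)}
n = sum(len(ys) for ys in data.values())
p = len(data)

# Fitted values: group means (gamma model) and group means of logs (normal model).
mu_hat = {j: sum(ys) / len(ys) for j, ys in data.items()}
a_hat = {j: sum(math.log(y) for y in ys) / len(ys) for j, ys in data.items()}

# Gamma case: scale by fitted means and compare with G(r_hat, 1/r_hat).
scaled_gamma = [y / mu_hat[j] for j, ys in data.items() for y in ys]
r_hat = (n - p) / sum((t - 1.0) ** 2 for t in scaled_gamma)   # Pearson (assumed)
d_gamma = ks_statistic(scaled_gamma, lambda t: gamma_cdf(t, r_hat))

# Log-normal case: exp(Z - a_hat) is compared with LN(0, b_hat^2).
log_resid = [math.log(y) - a_hat[j] for j, ys in data.items() for y in ys]
b_hat = math.sqrt(sum(e * e for e in log_resid) / (n - p))
scaled_lognormal = [math.exp(e) for e in log_resid]
normal = NormalDist(0.0, b_hat)
d_lognormal = ks_statistic(scaled_lognormal, lambda t: normal.cdf(math.log(t)))

print(f"D_n gamma: {d_gamma:.3f}, D_n log-normal: {d_lognormal:.3f}")
```

Because the parameters are estimated from the same data, the usual Kolmogorov–Smirnov critical values do not apply directly to these statistics; the sketch, like the text, only compares the two values of $D_n$, and for data simulated from a gamma distribution the gamma statistic is expected to be the smaller of the two.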


FISH 1762 1–14
