UNCORRECTED PROOF
4 J. Brynjarsdóttir, G. Stefánsson / Fisheries Research xxx (2004) xxx–xxx
3. Methods
The gamma and log-normal distributions share some characteristics that often make it difficult to choose between them. Both distributions assign positive probability only to positive values, and both can describe data sets in which most of the probability mass lies at low values but there is a heavy tail to the right. They also share the same relationship between the mean and the variance, i.e. the same variance function:

$$\operatorname{var}(Y) = \phi\,[\mathrm{E}(Y)]^2 \qquad (1)$$

where $\phi$ is a constant. This relationship differs from that of other distributions, such as the normal, Poisson and negative binomial distributions, and can therefore be used to distinguish these two distributions from the others.
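Both families satisfy (1), which can be checked directly from their moments. The derivation below uses the parametrizations given later in this section, $G(r, \mu/r)$ for the gamma and $\log Y \sim N(a, b^2)$ for the log-normal; the log-normal moments are the standard ones. It also shows that $\phi$ is constant only if the shape parameter $r$ (or the log-scale variance $b^2$) is the same in every group.

```latex
\begin{aligned}
Y \sim G(r, \mu/r):&\quad \mathrm{E}(Y) = \mu,\qquad
  \operatorname{var}(Y) = \frac{\mu^2}{r} = \frac{1}{r}\,[\mathrm{E}(Y)]^2
  \;\Rightarrow\; \phi = \frac{1}{r},\\[4pt]
\log Y \sim N(a, b^2):&\quad \mathrm{E}(Y) = e^{a + b^2/2},\qquad
  \operatorname{var}(Y) = \bigl(e^{b^2} - 1\bigr)\,e^{2a + b^2}
  = \bigl(e^{b^2} - 1\bigr)\,[\mathrm{E}(Y)]^2
  \;\Rightarrow\; \phi = e^{b^2} - 1.
\end{aligned}
```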
A common way to check this relationship is to plot log(sample variance) against log(sample mean) for homogeneous groups of data; see, for example, McCullagh and Nelder (1989, p. 306). If the points lie on a straight line with slope close to 2, the gamma and log-normal distributions with a fixed scale (dispersion) parameter cannot be rejected as the true underlying distribution.
Such an investigation cannot, however, distinguish between these two distributions. Data from one sub-rectangle in one year of the Icelandic groundfish survey can be regarded as realizations of i.i.d. variables, since environmental conditions are fairly homogeneous within a sub-rectangle. The drawback is that there are few observations in each sub-rectangle (at most 7), so the estimated means and variances for the sub-rectangles are highly uncertain. The statistical rectangles, which have up to 16 observations each, are therefore also considered; but since they are four times the size of the sub-rectangles, the assumption of homogeneity is less reliable.
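The log(variance) versus log(mean) check described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `log_var_vs_log_mean_slope` is hypothetical, each inner list is assumed to hold the catches from one presumed-homogeneous cell (for example one sub-rectangle in one year), and the slope is an ordinary least-squares fit.

```python
import math
from statistics import mean, variance


def log_var_vs_log_mean_slope(groups):
    """Least-squares slope of log(sample variance) on log(sample mean).

    `groups` is a list of lists; each inner list holds the observations from
    one presumed-homogeneous cell. Cells with fewer than two observations, or
    with a zero mean or variance, are skipped because the logs are undefined.
    """
    xs, ys = [], []
    for g in groups:
        if len(g) < 2:
            continue
        m, v = mean(g), variance(g)  # variance() uses the n - 1 denominator
        if m <= 0 or v <= 0:
            continue
        xs.append(math.log(m))
        ys.append(math.log(v))
    if len(xs) < 2:
        raise ValueError("need at least two usable groups")
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx
```

A slope near 2 is consistent with (1); with at most 7 tows per sub-rectangle, each point carries considerable sampling noise, which is why the larger statistical rectangles are also worth examining.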
A goodness-of-fit test based on a generalized linear model is used to distinguish between the two proposed probability distributions, the gamma and the log-normal. Following the approach of Stefánsson and Pálsson (1997) and Stefánsson (1988), the observations were scaled by the fitted values from a GLM, and a Kolmogorov–Smirnov test was then performed on the scaled data. Let $Y_{yji}$ be a random variable that represents the number of cod caught in year $y$, sub-rectangle $j$ and tow $i$. It is assumed that either
$$Y_{yji} \sim G\!\left(r, \frac{\mu_{yj}}{r}\right) \quad\text{or}\quad Z_{yji} = \log(Y_{yji}) \sim N(a_{yj}, b^2) \qquad (2)$$
where $N(a, b^2)$ is the normal distribution with mean $a$ and variance $b^2$, and $G(r, \mu/r)$ is the gamma distribution with mean $\mu$, variance $\mu^2/r$ and density function

$$f(y) = \frac{y^{r-1}\,e^{-yr/\mu}}{(\mu/r)^{r}\,\Gamma(r)}, \qquad y > 0 \qquad (3)$$

where $\Gamma$ is the gamma function, $\Gamma(r) = \int_0^\infty x^{r-1} e^{-x}\,dx$. The effects of sub-rectangles and years are assumed to be multiplicative on the original scale of the number of cod, and hence additive on the log scale. This leads to the log link if $Y_{yji}$ is gamma distributed and the identity link if $\log(Y_{yji})$ is normally distributed. We fit the models:
$$\log(\mu_{yj}) = \beta_0 + \alpha_y + \beta_j + \gamma_{yj} \quad\text{and}\quad a_{yj} = \beta_0 + \alpha_y + \beta_j + \gamma_{yj} \qquad (4)$$
where $\beta_0$ is the grand mean, $\alpha_y$ is the year effect, $\beta_j$ is the spatial effect of the sub-rectangles and $\gamma_{yj}$ is the interaction. In the first model the error is assumed to be multiplicative and gamma distributed with mean 1, $G(r, 1/r)$; in the second it is normally distributed, $N(0, b^2)$. For a fixed year $y$, the models become:
$$\log(\mu_{yj}) = \beta_0 + \beta_j \quad\text{and}\quad a_{yj} = \beta_0 + \beta_j \qquad (5)$$
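For a fixed year, model (5) has one parameter per sub-rectangle, so the maximum-likelihood fitted value of each cell is a cell average: $\hat\mu_{yj}$ is the mean of $Y_{yji}$ under the gamma model with log link, and $\hat a_{yj}$ is the mean of $\log Y_{yji}$ under the normal model with identity link. The sketch below uses that shortcut instead of a general GLM fitter. The function name `fit_one_year` is hypothetical, and the dispersion estimators are an assumption, since the text does not say how $1/\hat r$ and $\hat b^2$ were obtained: it uses the Pearson estimate for the gamma dispersion and the residual mean square for $b^2$, both common GLM defaults.

```python
import math
from statistics import mean


def fit_one_year(cells):
    """Fit the one-way model (5) for a single year under both error assumptions.

    `cells` maps a sub-rectangle label to the list of positive catches in it.
    Returns (mu_hat, r_hat, a_hat, b2_hat), where mu_hat and a_hat map each
    label to its fitted value on the original and log scale respectively.
    ASSUMPTION: dispersion is estimated with n - p degrees of freedom, using
    the Pearson statistic for the gamma model and the residual sum of squares
    for the normal model on log(Y); n = number of tows, p = number of cells.
    """
    n = sum(len(ys) for ys in cells.values())
    p = len(cells)
    if n <= p:
        raise ValueError("need more observations than sub-rectangles")
    # Gamma model, log link: fitted mean of each cell is its sample mean.
    mu_hat = {j: mean(ys) for j, ys in cells.items()}
    # Normal model on log(Y), identity link: fitted value is the mean log catch.
    a_hat = {j: mean([math.log(y) for y in ys]) for j, ys in cells.items()}
    pearson = sum(((y - mu_hat[j]) / mu_hat[j]) ** 2
                  for j, ys in cells.items() for y in ys)
    rss = sum((math.log(y) - a_hat[j]) ** 2
              for j, ys in cells.items() for y in ys)
    inv_r_hat = pearson / (n - p)  # estimated gamma dispersion 1/r
    b2_hat = rss / (n - p)         # estimated variance of log(Y)
    return mu_hat, 1.0 / inv_r_hat, a_hat, b2_hat
```

The same cell-average shortcut applies to the full model (4), because its interaction term gives every year and sub-rectangle combination its own parameter.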
The goodness-of-fit test is based on the following facts. First, if $X \sim G(r, \mu/r)$, then $X/\mu \sim G(r, 1/r)$. If $\mu_{yj}$ and $r$ were known, we could therefore test whether $Y_{yji}/\mu_{yj} \sim G(r, 1/r)$ using the Kolmogorov–Smirnov test. Here this is done by assuming that the fitted values $\hat\mu_{yj}$ and the estimated dispersion parameter $1/\hat r$ obtained from model (4) with gamma distributed errors are the true parameters. Second, if $X \sim N(a, b^2)$, then $e^{X-a} \sim LN(0, b^2)$; applied to $Z_{yji} = \log(Y_{yji})$, this gives $e^{Z_{yji} - a_{yj}} = Y_{yji}/e^{a_{yj}} \sim LN(0, b^2)$. If the true parameters were known, this could also be tested with the Kolmogorov–Smirnov test.
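The gamma branch of the test can be sketched as below: scale each catch by its fitted mean and compute $D_n$ against $G(\hat r, 1/\hat r)$, treating the estimates as the true parameters as described above. This is an illustrative sketch with hypothetical function names. Because the Python standard library has no gamma CDF, it hand-codes the regularized incomplete gamma function with the textbook series and continued-fraction expansions; in practice `scipy.stats.gamma.cdf(x, a, scale=...)` would be the usual choice.

```python
import math


def gamma_cdf(x, shape, scale):
    """CDF of a gamma distribution, i.e. P(shape, x / scale).

    P is the regularized lower incomplete gamma function, evaluated by a
    series for small arguments and by a continued fraction (modified Lentz)
    for the upper tail otherwise.
    """
    if x <= 0.0:
        return 0.0
    a, z = shape, x / scale
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series: P(a, z) = e^{-z} z^a / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
        term = total = 1.0 / a
        denom = a
        for _ in range(1000):
            denom += 1.0
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction for the upper tail Q(a, z) = 1 - P(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_prefactor) * h


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D_n = sup_x |F_n(x) - F(x)| for a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at the (i+1)-th order statistic.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def gamma_ks(observations, fitted_means, r_hat):
    """D_n for Y / mu_hat against G(r_hat, 1/r_hat): shape r_hat, scale 1/r_hat, mean 1."""
    scaled = [y / mu for y, mu in zip(observations, fitted_means)]
    return ks_statistic(scaled, lambda x: gamma_cdf(x, r_hat, 1.0 / r_hat))
```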
Here this is done by assuming that the fitted values $\hat a_{yj}$ and the estimated dispersion parameter $\hat b^2$ obtained from model (4) with normally distributed errors are the true parameters. The Kolmogorov test statistic $D_n = \sup_x |F_n(x) - F(x)|$ measures the distance between the empirical distribution function $F_n$ and the hypothesized distribution function $F$, so we compare the two test statistics: the distribution giving the smaller $D_n$ is taken to represent the data better.
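A companion sketch for the log-normal branch follows, again with hypothetical function names and with the estimates treated as the true parameters. It uses `statistics.NormalDist` from the Python standard library (3.8 or later) for the normal CDF and repeats the small $D_n$ helper so the block runs on its own.

```python
import math
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D_n = sup_x |F_n(x) - F(x)| for a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def lognormal_cdf(x, b2):
    """CDF of LN(0, b2): P(e^Z <= x) = P(Z <= log x) with Z ~ N(0, b2)."""
    if x <= 0.0:
        return 0.0
    return NormalDist(0.0, math.sqrt(b2)).cdf(math.log(x))


def lognormal_ks(observations, fitted_a, b2_hat):
    """D_n for exp(log(Y) - a_hat) = Y / e^{a_hat} against LN(0, b2_hat)."""
    scaled = [math.exp(math.log(y) - a) for y, a in zip(observations, fitted_a)]
    return ks_statistic(scaled, lambda x: lognormal_cdf(x, b2_hat))
```

For the same tows, the gamma and log-normal $D_n$ values are then compared, and the smaller one indicates the better-fitting distribution. Because $D_n$ is unchanged when the same strictly increasing transformation is applied to both the sample and the hypothesized distribution, this statistic equals the one obtained by comparing the log-scale residuals $\log Y_{yji} - \hat a_{yj}$ with $N(0, \hat b^2)$.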
FISH 1762 1–14