Final Technical Report: - Southwest Fisheries Science Center - NOAA
4.2 Modeling Framework: GLM and GAM<br />
4.2.1 Comparisons of GAM Algorithms<br />
During the comparison of GAM algorithms, we found a bug in the step.gam function<br />
of the R package gam that had not previously been reported to the R mailing lists and<br />
was unknown to the package developer (pers. comm. with Hastie). The bug prevented<br />
step.gam from including the offset term for survey effort in any encounter rate model that was<br />
examined during the stepwise search. As a result, we only modeled group size (and not<br />
encounter rates) using the step.gam algorithm from R package gam.<br />
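Why the missing offset matters can be seen in a minimal sketch. It assumes an intercept-only Poisson model with a log link, for which the maximum-likelihood estimates have closed forms; the segment data and all names below are hypothetical and are not taken from the report or from the gam package.

```python
import math

# Hypothetical survey segments: (effort in km, number of sightings).
segments = [(10.0, 1), (50.0, 6), (100.0, 11), (40.0, 2)]

total_effort = sum(effort for effort, _ in segments)
total_sightings = sum(count for _, count in segments)

# WITH the offset: log E[n_i] = log(effort_i) + b0.
# For an intercept-only Poisson model the MLE is exp(b0) = sum(n) / sum(effort),
# i.e. an encounter rate per unit of effort.
rate_with_offset = total_sightings / total_effort
b0 = math.log(rate_with_offset)

# WITHOUT the offset: log E[n_i] = a0, so the MLE is exp(a0) = mean(n).
# Effort no longer enters the model at all.
mean_count = total_sightings / len(segments)


def predict_with_offset(effort):
    """Expected sightings scale with effort through the offset term."""
    return math.exp(math.log(effort) + b0)


def predict_without_offset(effort):
    """Expected sightings are the same for every segment, whatever its effort."""
    return mean_count


for effort in (10.0, 100.0):
    print(effort, predict_with_offset(effort), predict_without_offset(effort))
```

With the offset, the fitted quantity is a rate (sightings per unit effort); without it, the model predicts the same count for a short segment as for a long one, which is not an encounter rate.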
The group size GAMs built using the S-PLUS and R package gam algorithms were<br />
essentially identical: the best models contained exactly the same predictor variables and associated<br />
degrees of freedom, and the parameterizations of the smoothing splines were identical except for<br />
small differences that were likely due to the numerical precision of the software platforms.<br />
GAMs built using R package mgcv were more variable. The mgcv gam algorithm allows<br />
users to adjust more parameters and settings when building models than the S-PLUS<br />
analogue does. For the knowledgeable user, this flexibility enables fine-tuning of the GAMs. On the<br />
other hand, the numerous adjustable arguments make the algorithm less user-friendly<br />
because more time must be invested in learning how to build appropriate models.<br />
Tables 10 and 11 show the range of encounter rate and group size models, respectively,<br />
selected as the final model by mgcv gam given the specified combination of settings for the<br />
gam.method, smoothing spline, and gamma arguments. The paired models for each<br />
species/response variable combination that are provided in these tables were chosen based on the sum of the<br />
absolute deviations of the observed-to-predicted ratios of the response variable in the<br />
geographic strata shown in Figure 7. The “simple models” in Tables 10 and 11 represent the<br />
models having relatively few effective degrees of freedom and the smallest sum of absolute<br />
deviations of the observed-to-predicted ratios. Similarly, the “complex models” represent those<br />
having a relatively large number of effective degrees of freedom in addition to good agreement<br />
between observed and predicted values of the response variable. For cases in which a single<br />
model clearly outperformed all of the others, only one model is presented in the table.<br />
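The selection criterion described above can be sketched as follows. This is an illustration only: it assumes that each ratio's deviation is measured from 1 (perfect agreement), which the text does not state explicitly, and the stratum labels, candidate names, and numbers are hypothetical.

```python
# Hypothetical observed totals of the response variable per geographic stratum.
observed = {"A": 12.0, "B": 30.0, "C": 8.0}

# Hypothetical predicted totals from two candidate models.
predictions = {
    "candidate_1": {"A": 10.0, "B": 33.0, "C": 10.0},
    "candidate_2": {"A": 12.5, "B": 29.0, "C": 7.5},
}


def sum_abs_ratio_deviation(obs, pred):
    """Sum over strata of |observed / predicted - 1|.

    Assumption: the deviation of each observed-to-predicted ratio is taken
    from 1, the value that indicates perfect agreement.
    """
    return sum(abs(obs[stratum] / pred[stratum] - 1.0) for stratum in obs)


scores = {
    name: sum_abs_ratio_deviation(observed, pred)
    for name, pred in predictions.items()
}

# The candidate with the smallest sum agrees best with the observations.
best = min(scores, key=scores.get)
print(scores, best)
```

In the report the comparison is made separately among models with few and with many effective degrees of freedom; the sketch shows only the scoring step.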
The variability in model complexity can be illustrated using the rough-toothed dolphin<br />
encounter rate models, where the preferred simple model had 8.9 degrees of freedom and the<br />
preferred complex model had over fifty degrees of freedom. The sum of absolute deviations of<br />
the observed-to-predicted ratios is smaller for the complex model. This is to be expected<br />
because the data used for predictions were also used to build the models; in this scenario, a<br />
complex model is more likely to exhibit fidelity to the data.<br />
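The tendency of a more complex model to track the data used to build it can be illustrated with a deliberately simple stand-in for a smoother. The sketch below uses nested piecewise-constant fits rather than GAM splines, and the data are hypothetical; the number of bins plays the role of the degrees of freedom.

```python
# Hypothetical response values, ordered along a single predictor.
y = [2.0, 3.5, 1.0, 4.0, 6.5, 5.0, 7.5, 6.0,
     9.0, 8.5, 11.0, 9.5, 12.0, 13.5, 12.5, 15.0]


def in_sample_sse(values, n_bins):
    """Sum of squared errors of a piecewise-constant fit.

    The values are split into n_bins equal-size consecutive bins and each bin
    is fitted by its own mean, so n_bins is the number of fitted parameters.
    n_bins is assumed to divide len(values) exactly.
    """
    size = len(values) // n_bins
    sse = 0.0
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        mean = sum(chunk) / len(chunk)
        sse += sum((v - mean) ** 2 for v in chunk)
    return sse


# Each bin count refines the previous one, so the fits are nested.
sse_by_bins = {k: in_sample_sse(y, k) for k in (1, 2, 4, 8, 16)}
for k, sse in sse_by_bins.items():
    print(k, sse)
```

Because every refinement can only reduce the error on the fitting data, the in-sample error never increases with complexity and reaches zero when there is one parameter per observation. This says nothing about performance on data that were not used to fit the model.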