25.12.2013 Views

the cynipoid genus paramblynotus - American Museum of Natural ...

10 BULLETIN AMERICAN MUSEUM OF NATURAL HISTORY NO. 304

of component distributions. The form of each component contribution can be described by an equation, the component density function. Likewise, the form of the mixture can be described by a mixture density function. The number of component populations (hence the number of codes) is found by fitting mixture density functions with various numbers of component distributions to the observed dataset. The FMC procedure of Strait et al. (1996) has three steps. First, using likelihood estimation methods, several mixture density functions are fitted to the dataset. Specific parameters (mixing proportion, mean, and variance) are estimated for each of the component distributions in a mixture. The component distributions are assumed to be normal for species means (Strait et al., 1996). The best parameters are those that maximize the likelihood statistic, L. The number of mixture models to be examined is specified by the researcher. In the second step, the mixture that provides the best fit with the dataset is identified using the Akaike information criterion (AIC) (Akaike, 1974). The mixture model with the lowest AIC value describes the dataset best and is thus preferred. Finally, individual species are assigned codes by calculating the probability of a species mean being drawn from any given component distribution in the optimal mixture model.
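The third step can be illustrated with a minimal Python sketch. It assumes that the probability in question is the posterior probability obtained from Bayes' rule under normal component densities, and that each species receives the code of its most probable component; neither detail is spelled out in the text above. The function names and the two-component parameter values are hypothetical and serve only as an illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def assign_codes(species_means, components):
    """Return a (code, posterior probabilities) pair for each species mean.

    components is a list of (mixing_proportion, mean, variance) tuples
    describing the preferred mixture model.
    """
    results = []
    for x in species_means:
        # Weighted density of x under each component distribution.
        weighted = [p * normal_pdf(x, m, v) for p, m, v in components]
        total = sum(weighted)
        posteriors = [w / total for w in weighted]
        # Assumption: the code is the index of the most probable component.
        code = max(range(len(components)), key=lambda j: posteriors[j])
        results.append((code, posteriors))
    return results


# Hypothetical two-component mixture; the numbers are invented for illustration.
components = [(0.5, 0.0, 0.01), (0.5, 1.0, 0.01)]
codes = [code for code, _ in assign_codes([0.05, 0.9, 0.1], components)]
```

Keeping the full list of posterior probabilities alongside the code makes it possible to flag species whose assignment is ambiguous.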

Finite mixture modeling is usually computationally expensive. Some programs are available for likelihood estimation of finite mixture models, as required by the FMC method (Pearson et al., 1992; Strait et al., 1996). These programs, however, are expensive and limited to the PC platform. Although program source code is also available, most prospective users of the method will not find it convenient to work with. Therefore, we took a different approach, using the FMCK method. Although it applies finite mixture analysis and likelihood estimation as in FMC, the FMCK method approaches the goal from a different direction, using k-mean cluster analysis for a priori statistical modeling of the component distributions. This modification enables FMCK to be implemented using readily available statistical programs with k-mean cluster analysis, such as STATISTICA, MINITAB, or SYSTAT, available on both PC and Macintosh platforms.

Computationally, both k-mean cluster analysis and maximum likelihood estimation for finite mixture analysis are iterative procedures that attempt to classify observations into groups that are as distinct as possible. Maximum likelihood estimation classifies observations into groups so that the likelihood statistic L of the mixture model under investigation is maximized. The k-mean cluster analysis is like an analysis of variance (ANOVA) "in reverse". The procedure starts with k random groups (clusters), with k set by the researcher, and then moves objects between those groups with the goal of (1) minimizing variability within groups and (2) maximizing variability between groups (Statsoft, 1995). Since the likelihood value L is needed to calculate the AIC, the L value of the mixture model under investigation is calculated using the resulting group parameters after the cluster analysis is completed. The resulting L value should be the same as or very close to what is expected from maximum likelihood analysis. Empirical comparisons over a range of datasets showed that FMC and FMCK analysis give very similar results (Strait, personal commun.).
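As a rough illustration of how L and the AIC might be computed from group parameters once a cluster analysis has finished, consider the following Python sketch. It uses the standard definition AIC = -2 ln L + 2p and assumes that a k-component normal mixture has p = 3k - 1 free parameters (k means, k variances, and k - 1 mixing proportions); the parameter count is an assumption, not a detail taken from the text. The data and the group parameters are invented for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(values, components):
    """ln L of a normal mixture given (proportion, mean, variance) tuples."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, v) for p, m, v in components))
        for x in values
    )


def aic(values, components):
    """AIC = -2 ln L + 2p, with p = 3k - 1 (an assumed parameter count)."""
    k = len(components)
    n_params = 3 * k - 1
    return -2.0 * log_likelihood(values, components) + 2.0 * n_params


# Hypothetical log-transformed measurements forming two clear groups.
values = [-0.11, -0.04, 0.02, 0.09, 0.93, 0.98, 1.04, 1.10]

# Group parameters as a cluster analysis might report them (rounded):
# one group containing everything, and a split into two groups.
one_component = [(1.0, 0.50125, 0.26614)]
two_components = [(0.5, -0.01, 0.00545), (0.5, 1.0125, 0.00407)]

aic_one = aic(values, one_component)
aic_two = aic(values, two_components)
preferred = "two components" if aic_two < aic_one else "one component"
```

Because the parameters come from a hard partition rather than from direct maximization of L, the value obtained this way approximates the maximum likelihood value rather than reproducing it exactly.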

In the following, we provide a brief description of the FMCK. For statistical details, Akaike (1974), Everitt and Hand (1981), Everitt (1985), McLachlan and Basford (1988), and, in particular, Pearson et al. (1992) and Strait et al. (1996) should be consulted.

First, the measurement ratios were transformed into logarithmic values. The values for each character were then divided into one, two, three, and four groups using k-mean cluster analysis as implemented in the Cluster Analysis module of STATISTICA version 5.1. The distribution parameters (i.e., mixing proportion, mean, and variance) were calculated for each component distribution (group) of each mixture model (e.g., four-component mixture).
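These steps can be sketched in Python as follows. This is not the STATISTICA procedure: it is a plain one-dimensional version of Lloyd's k-means algorithm with a deterministic start, and it assumes natural logarithms and divide-by-n variances, none of which is specified in the text. The measurement ratios are invented for illustration, and all function names are hypothetical.

```python
import math


def kmeans_1d(values, k, iterations=100):
    """One-dimensional k-means (Lloyd's algorithm); returns one label per value."""
    ordered = sorted(values)
    # Assumption: start from k evenly spaced order statistics rather than
    # random groups, so that the sketch is reproducible.
    if k == 1:
        centers = [sum(values) / len(values)]
    else:
        centers = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in values]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = sum(members) / len(members)
    return labels


def component_parameters(values, labels, k):
    """(mixing proportion, mean, variance) for each non-empty group."""
    params = []
    for j in range(k):
        members = [x for x, lab in zip(values, labels) if lab == j]
        if not members:
            continue
        mean = sum(members) / len(members)
        # Assumption: divide-by-n variance, as in maximum likelihood estimation.
        var = sum((x - mean) ** 2 for x in members) / len(members)
        params.append((len(members) / len(values), mean, var))
    return params


# Hypothetical measurement ratios for one character.
ratios = [0.50, 0.52, 0.98, 1.02, 1.95, 2.05, 3.90, 4.10]
logs = [math.log(r) for r in ratios]  # natural log; the base is an assumption

# One-, two-, three-, and four-component models for this character.
models = {k: component_parameters(logs, kmeans_1d(logs, k), k) for k in (1, 2, 3, 4)}
```

A group containing a single observation has zero variance, which makes the normal density undefined, so such groups need special handling before any likelihood is computed.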

Small programs were written in the STATISTICA BASIC language to calculate the density function of each of the mixture models of a particular dataset (character measurements) using the equations provided
