26.12.2013 Views

Estimators based in adaptively trimming cells in the mixture model

Estimators based in adaptively trimming cells in the mixture model

Estimators based in adaptively trimming cells in the mixture model

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>in</strong>volves <strong>the</strong> use of a previous estimation <strong>based</strong> on kernel density estimation, <strong>the</strong> choice of an adequate<br />

discrepancy measure and <strong>the</strong> use of <strong>the</strong> method of moment estimates on multiple bootstrap subsamples<br />

to identify <strong>the</strong> important data substructures. The paper also provides a stopp<strong>in</strong>g rule to term<strong>in</strong>ate<br />

<strong>the</strong> weighted likelihood algorithm. In <strong>the</strong> analysis, with two normal components <strong>in</strong> <strong>the</strong> <strong>mixture</strong>, <strong>the</strong><br />

weighted likelihood method produced <strong>the</strong> values (0.804, 71.643, 95.287, 4.856) for (π, µ 1 , µ 2 , σ) while<br />

<strong>the</strong> non-robust maximum likelihood method (through <strong>the</strong> EM algorithm) gave (0.799, 72.243, 100.163,<br />

6.295). Moreover <strong>in</strong> two cases from <strong>the</strong> 100 <strong>in</strong>itial bootstrap root searches, a root correspond<strong>in</strong>g to <strong>the</strong><br />

third component was identified. By fitt<strong>in</strong>g a <strong>mixture</strong> <strong>based</strong> on three normal distributions <strong>the</strong> maximum<br />

likelihood estimator gave <strong>the</strong> estimate (0.764, 0.163, 71.557, 92.830, 110.514, 4.540), while <strong>the</strong> weighted<br />

likelihood procedure gave <strong>the</strong> values (0.768, 0.156, 71.533, 92.265, 108.554, 4.38). Markatou also reported<br />

that, <strong>in</strong> agreement with Jorgensen, <strong>the</strong> only value which obta<strong>in</strong>ed a remarkable low weight <strong>in</strong> <strong>the</strong> weighted<br />

likelihood equation was that of <strong>the</strong> scallop of lenght 126 mm.<br />

With respect to our method we must remark that <strong>the</strong> scarce presence of some cohortes <strong>in</strong> <strong>the</strong> data<br />

avoids <strong>the</strong> use of high <strong>in</strong>itial trimm<strong>in</strong>g levels. This would completely elim<strong>in</strong>ate <strong>the</strong> presence of some<br />

sub-populations <strong>in</strong> <strong>the</strong> sample. In fact <strong>the</strong> first analysis should be to answer what is <strong>the</strong> m<strong>in</strong>imum size of<br />

a subsample to be considered as a subjacent population <strong>in</strong> <strong>the</strong> data or simply as contam<strong>in</strong>ation. Tak<strong>in</strong>g<br />

<strong>in</strong>to account <strong>the</strong> observation of Jorgensen about <strong>the</strong> heterogeneity with<strong>in</strong> a cohort <strong>in</strong> our op<strong>in</strong>ion <strong>the</strong>re<br />

are two possibilities for <strong>the</strong> analysis:<br />

• To fit a <strong>mixture</strong> with three components and same standard deviation and a f<strong>in</strong>al small trimm<strong>in</strong>g<br />

size. In this way we want to ellim<strong>in</strong>ate only <strong>the</strong> troublesome data that could be not well expla<strong>in</strong>ed<br />

by <strong>the</strong> <strong>mixture</strong>. This f<strong>in</strong>al trimm<strong>in</strong>g size permits to manta<strong>in</strong> <strong>the</strong> assumption on equal variances for<br />

<strong>the</strong> components that, <strong>in</strong> o<strong>the</strong>r case, could not be justified tak<strong>in</strong>g <strong>in</strong>to account that <strong>the</strong> spread tail<br />

of larger animals could be constituted by three or more cohorts of scallops.<br />

• To fit a <strong>mixture</strong> with two components and same standard deviation with a f<strong>in</strong>al moderate trimm<strong>in</strong>g<br />

size. This would reduce <strong>the</strong> contam<strong>in</strong>ation effects of a third (and may be more) component <strong>in</strong> <strong>the</strong><br />

estimation, giv<strong>in</strong>g more precission to <strong>the</strong> estimates of <strong>the</strong> parameters of <strong>the</strong> two ma<strong>in</strong> components<br />

<strong>in</strong> <strong>the</strong> sample.<br />

The graphics <strong>in</strong> Figure 5 show <strong>in</strong> a comparative way <strong>the</strong> estimated weighted densities correspond<strong>in</strong>g<br />

to every component <strong>in</strong> <strong>the</strong> <strong>mixture</strong>, accord<strong>in</strong>g to <strong>the</strong> estimation method employed and <strong>the</strong> number of<br />

components assumed.<br />

For <strong>the</strong> estimation <strong>in</strong> <strong>the</strong> three components setup we used an <strong>in</strong>itial trimm<strong>in</strong>g size of 2%, that produced<br />

an <strong>in</strong>itial 3-cell given by <strong>the</strong> <strong>in</strong>terval (61.9526, 113), thus <strong>the</strong> trimm<strong>in</strong>g data were <strong>the</strong> four data with greater<br />

lenght. The f<strong>in</strong>al trimm<strong>in</strong>g size was 0.5% and <strong>the</strong> only discarded value was 126, which agrees with its<br />

more <strong>in</strong>fluential character <strong>in</strong> Jorgensen and Markatou reports. Our f<strong>in</strong>al estimation was (0.7672, 0.1573,<br />

71.4723, 92.3579, 108.7987, 4.4640).<br />

In <strong>the</strong> two components setup we used 20% as <strong>the</strong> <strong>in</strong>itial trimm<strong>in</strong>g size, lead<strong>in</strong>g to <strong>the</strong> 2-cell given<br />

by [65, 77.3221)∪(85.0113, 97.3335). The f<strong>in</strong>al trimm<strong>in</strong>g size was 10%, which produced <strong>the</strong> f<strong>in</strong>al 2-cell:<br />

(62.1197, 80,5475)∪(83.9271, 102.3549) and <strong>the</strong> estimation (0.8229, 71.3183, 93.1549, 5.2690).<br />

As a remarkable (and dist<strong>in</strong>guishible) fact we want to po<strong>in</strong>t out that under both <strong>model</strong>s our method<br />

produced coherent estimations for <strong>the</strong> two ma<strong>in</strong> components <strong>in</strong> <strong>the</strong> <strong>mixture</strong>. This would be also coherent<br />

12

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!