Robust Estimation for Zero-Inflated Poisson Regression - Franklin ...

8 D. B. Hall and J. Shen Scand J Statist

All simulations involve data generated from a model of the form

\[
Y_i \sim \begin{cases} 0, & \text{with probability } p_i, \\ \text{Poisson}(\mu_i), & \text{with probability } 1 - p_i, \end{cases}
\]

where $\mu_i = \exp(\beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i})$, and

\[
p_i = \begin{cases} p, & \text{in simulation study 1,} \\ \text{logit}^{-1}(\gamma_1 x_{1i} + \gamma_2 x_{2i} + \gamma_3 x_{3i}), & \text{in simulation studies 2 and 3.} \end{cases}
\]

Covariates here are $x_{1i} = I(i \le n/2)$, $x_{2i} = I(i > n/2)$, $x_{3i} \sim U(0, 1)$ and $x_{4i} \sim N(0, 1)$. The data were generated under various settings of the model parameters $\gamma$ and $\beta$, chosen to correspond to low versus high levels of ZI combined with low versus high levels of separation between the mixture components. For every parameter setting, 500 data sets were generated, with outliers added depending on the data contamination scheme under consideration. Bias, mean square error (MSE) and empirical size of a nominally 0.05-level Wald test for equality with the true value were calculated for each model parameter. In addition, we provided the MSE for $\zeta \equiv \frac{1}{n} \sum_{i=1}^{n} (1 - p_i)\mu_i$, the average marginal mean according to the model. In all three simulation studies the tuning quantile $c$ was set to 0.01.

4.1. Study 1: ZIP regression with constant p and outliers in y

In study 1, we compare ML, MHD and RES estimation methods for ZIP data with constant $p$ with and without contamination in $y$. In the contaminated scenario, 5 per cent of the response $y$ in each data set were randomly selected to be replaced by $y + 15$. True values of $p$ and $\beta$ were specified as listed in Table 2 and were chosen to make the non-degenerate component's mean large ($\mu$ ranging between 2.78 and 20 over the values of the covariate vector $x_i = (x_{1i}, \ldots, x_{4i})^T$) and to give a moderate level of ZI (20 per cent). Results appear in Table 2.

Generally speaking, these results favour the RES approach over both the ML and MHD estimations.
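The data-generating model and the study 1 contamination scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: the function names, the random seed, and the values of $\beta$ are hypothetical (the true values are those of Table 2, which is not reproduced on this page), and the simple Poisson sampler is a convenience choice. Only $p = 0.2$, the 5 per cent contamination rate and the shift of 15 come from the text.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method.

    Adequate for the moderate means used here; not suited to very large mu.
    """
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(n, beta, p, rng):
    """Generate n observations from the ZIP model with constant mixing probability p."""
    half = n // 2
    xs, ys = [], []
    for i in range(1, n + 1):
        x = (
            1.0 if i <= half else 0.0,  # x1 = I(i <= n/2)
            1.0 if i > half else 0.0,   # x2 = I(i > n/2)
            rng.random(),               # x3 ~ U(0, 1)
            rng.gauss(0.0, 1.0),        # x4 ~ N(0, 1)
        )
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        # With probability p the response is a structural zero,
        # otherwise it is a Poisson(mu) draw.
        y = 0 if rng.random() < p else rpois(mu, rng)
        xs.append(x)
        ys.append(y)
    return xs, ys


def contaminate(ys, rng, frac=0.05, shift=15):
    """Replace a randomly chosen fraction of the responses y by y + shift."""
    out = list(ys)
    m = round(frac * len(out))
    for idx in rng.sample(range(len(out)), m):
        out[idx] += shift
    return out


def marginal_mean(xs, beta, p):
    """Average marginal mean zeta = (1/n) * sum_i (1 - p) * mu_i for constant p."""
    total = sum((1.0 - p) * math.exp(sum(b * v for b, v in zip(beta, x))) for x in xs)
    return total / len(xs)


rng = random.Random(2009)        # arbitrary seed, for reproducibility only
beta = (1.5, 2.0, 0.5, 0.1)      # hypothetical values, NOT those of Table 2
xs, ys = simulate_zip(100, beta, 0.2, rng)
ys_contaminated = contaminate(ys, rng)
zeta = marginal_mean(xs, beta, 0.2)
```

Repeating the last four lines 500 times per parameter setting, and fitting each estimator to both `ys` and `ys_contaminated`, would mirror the layout of the clean and contaminated scenarios.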
In the absence of contamination RES performs slightly worse than ML for $n = 100$ and essentially the same for the larger sample size. The MHD approach is not competitive in this setting. When contamination was present, there are a few cases for which the RES estimators exhibited greater bias than those of the MHD approach, but RES bias was generally lower than that of ML and, with few exceptions, the MSE was much smaller for RES than either MHD or ML estimation. In addition, the size of the Wald tests was much more severely altered by the presence of outliers under the ML estimation than the MHD and RES approaches. It should be kept in mind that the degree of contamination here is fairly extreme. Both the proportion (5 per cent) and magnitude ($y + 15$) of outliers here are quite large. Under these extreme circumstances, the Wald tests under RES and MHD estimation perform reasonably well, and seem to retain some value as inferential tools. In contrast, the tests under ML estimation have been completely undermined.

To examine the effect of more moderate degrees of contamination, we ran simulations similar in design to these but with 5 per cent of the responses increased by 7 rather than 15. The results from those simulations are similar to those from the bottom half of Table 2, with smaller but still quite substantial improvements in bias and MSE achieved using the RES method. Because these results are, as one might expect, intermediate to those in the top and bottom halves of Table 2, we do not report them in detail here for the sake of brevity.

The performance of MHD estimation relative to ML is somewhat surprising especially for $n = 100$.
In simulation results covering the case in which the mixture components are poorly separated (not reported here but see Shen, 2006), we have observed reductions in MSE for MHD relative to ML comparable with the gains exhibited by RES estimation. In this

© 2009 Board of the Foundation of the Scandinavian Journal of Statistics.
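The performance measures used in these simulation studies (bias, MSE, and the empirical size of a nominally 0.05-level Wald test for equality with the true value) can be computed from replicate estimates roughly as follows. This is a minimal sketch under assumptions: the function name `summarize` is hypothetical, each replicate is assumed to supply a point estimate and a standard error for a single parameter, and the Wald test is taken in its usual two-sided normal form, which the text does not spell out.

```python
from statistics import NormalDist, fmean


def summarize(estimates, std_errors, truth, level=0.05):
    """Return (bias, MSE, empirical Wald test size) for one model parameter.

    estimates  : point estimates, one per simulated data set
    std_errors : matching standard errors, one per simulated data set
    truth      : the true parameter value used to generate the data
    """
    # Two-sided critical value from the standard normal distribution.
    crit = NormalDist().inv_cdf(1.0 - level / 2.0)
    bias = fmean(estimates) - truth
    mse = fmean([(e - truth) ** 2 for e in estimates])
    # Proportion of replicates in which the true value is rejected.
    rejections = [
        1.0 if abs(e - truth) / s > crit else 0.0
        for e, s in zip(estimates, std_errors)
    ]
    size = fmean(rejections)
    return bias, mse, size
```

An empirical size close to 0.05 indicates that the test holds its nominal level, while a size far above 0.05 corresponds to the situation described for ML estimation under contamination.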
