
ESTIMATING A PARAMETER IN INCIDENTAL AND STRUCTURAL MODELS
BY APPROXIMATE MAXIMUM LIKELIHOOD

by

Aad van der Vaart

TECHNICAL REPORT No. 139

October 1987 (revised July 1988)

Department of Statistics, GN-22

University of Washington

Seattle, Washington 98195 USA


Estimating a parameter in incidental and structural models by approximate maximum likelihood

Aad van der Vaart ¹

Free University Amsterdam

October 1987, revised July 1988

Let $X_1, X_2, \ldots$ be independent random vectors, where $X_j$ has density $\int h_\theta(x)\, g_\theta(\psi_\theta(x), z)\, d\eta_j(z)$. In the structural version of the model $\eta_j = \eta$ is a fixed unknown probability distribution, while in the functional model $\eta_j$ is a unit mass in $z_j$, where $\{z_j\}$ is an unknown sequence of vectors. It is proposed to estimate $\theta$ by a one-step estimator, based on a MLE for $n^{-1}\sum_{j=1}^n \eta_j$. The construction is illustrated in the paired exponential model, the errors-in-variables model and a scale mixture over a normal family.

AMS 1980 subject classifications: 62F12, 62G05.

Keywords and phrases: Structural model, incidental parameters, functional model, mixture model, asymptotic efficiency.

¹ Research partially supported by the Netherlands Organization for the Advancement of Pure Research (Z.W.O.) and ONR Contract N00014-80-C-0163.



1. Introduction.

Let $\Theta$ be an open subset of $\mathbb{R}^k$ and $\mathcal{H}$ a collection of probability measures on a measurable space $(\mathcal{Z}, \mathcal{C})$. Given $\theta \in \Theta$ let $\psi_\theta$ be a measurable map between measurable spaces $(\mathcal{X}, \mathcal{B})$ and $(\mathcal{Y}, \mathcal{A})$. Furthermore, for each $(\theta, z) \in \Theta \times \mathcal{Z}$ let $p_\theta(\cdot, z)$ be a probability density with respect to a $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{B})$, having the form

(1.1) $p_\theta(x, z) = h_\theta(x)\, g_\theta(\psi_\theta(x), z),$

for measurable maps $h_\theta$ and $g_\theta$ (from $(\mathcal{X}, \mathcal{B})$ into $\mathbb{R}$ and $(\mathcal{Y} \times \mathcal{Z}, \mathcal{A} \times \mathcal{C})$ into $\mathbb{R}$, respectively). Suppose that $X$ is a random element with density

(1.2) $p_{\theta,\eta}(x) = \int p_\theta(x, z)\, d\eta(z),$

where the mixing distribution $\eta$ is an element of $\mathcal{H}$. Then, by the factorization theorem, $\psi_\theta(X)$ is a sufficient statistic for $\eta \in \mathcal{H}$, if $\theta$ is fixed. It is assumed that

(1.3) $g_{\theta,\eta}(y) = \int g_\theta(y, z)\, d\eta(z)$

is the density of $\psi_\theta(X)$ with respect to a $\sigma$-finite measure $\nu$ on $(\mathcal{Y}, \mathcal{A})$.

This paper is concerned with estimating $\theta$ on the basis of the first $n$ elements of a sequence of independent random elements $X_1, X_2, \ldots$, where $X_j$ has density $p_{\theta,\eta_j}$, $\{\eta_j\}$ being an unknown sequence in $\mathcal{H}$. There are two versions of this model. The structural model (or mixture model) is simply the i.i.d. version, where every $\eta_j$ is equal to one fixed, but unknown $\eta$. The incidental model (or functional model) has $\eta_j$ equal to a unit mass in $z_j$, $\{z_j\}$ being an unknown sequence in $\mathcal{Z}$. If $\mathcal{H}$ contains the unit masses, our formulation includes both.

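Suppose that the score function for $\theta$, $\dot l_{\theta,\eta}(x) = \nabla_\theta \log p_{\theta,\eta}(x)$, is well-defined and set

(1.4) $\tilde l_{\theta,\eta}(x) = \dot l_{\theta,\eta}(x) - E_\theta\bigl(\dot l_{\theta,\eta}(X) \mid \psi_\theta(X) = \psi_\theta(x)\bigr)$

(1.5) $\tilde I_{\theta,\eta} = E_{\theta,\eta}\, \tilde l_{\theta,\eta} \tilde l_{\theta,\eta}'(X).$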

We propose the following estimator for $\theta$. Let $\hat\eta_n(\theta)$ be a (restricted) maximum likelihood estimator in the structural version of the model, when $\theta$ is given. Thus $\hat\eta_n(\theta)$ satisfies

(1.6) $\prod_{j=1}^n p_{\theta,\hat\eta_n(\theta)}(X_j) = \sup_{\eta \in \mathcal{H}_n} \prod_{j=1}^n p_{\theta,\eta}(X_j),$

for a given (possibly data-dependent) subset $\mathcal{H}_n$ of $\mathcal{H}$. Next, given a preliminary estimator $\hat\theta_n$ for $\theta$, let $T_n$ be its one-step 'improvement'

(1.7) $T_n = \hat\theta_n + \Bigl(\sum_{j=1}^n \tilde l_{\hat\theta_n,\hat\eta_n(\hat\theta_n)} \tilde l_{\hat\theta_n,\hat\eta_n(\hat\theta_n)}'(X_j)\Bigr)^{-1} \sum_{j=1}^n \tilde l_{\hat\theta_n,\hat\eta_n(\hat\theta_n)}(X_j).$



For discretized and $\sqrt{n}$-consistent $\hat\theta_n$ it will be shown that

(1.8)

where $\bar\eta_n = n^{-1}\sum_{j=1}^n \eta_j$. The necessary regularity conditions and the choice of the sieves are discussed in detail for three examples.
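As a rough illustration of the one-step construction (1.7) for a one-dimensional $\theta$, the sketch below first discretizes a preliminary estimate and then applies the update. The function names `discretize` and `one_step`, the grid of mesh $n^{-1/2}$ (one common way to discretize; the text does not spell out its own scheme), and the callable `eff_score` are all assumptions made for illustration.

```python
import math


def discretize(theta_prelim, n):
    """Round a scalar preliminary estimate to the grid {k / sqrt(n) : k integer}.

    Assumption: the text only says 'discretized'; a grid of mesh n**-0.5
    is one common choice.
    """
    mesh = 1.0 / math.sqrt(n)
    return round(theta_prelim / mesh) * mesh


def one_step(theta_prelim, eff_score, data):
    """One-step update in the spirit of (1.7), scalar theta only.

    eff_score(theta, x) is a hypothetical callable returning the estimated
    efficient score at observation x, with the nuisance estimate plugged in.
    The caller must make sure the scores are not all zero.
    """
    scores = [eff_score(theta_prelim, x) for x in data]
    info = sum(s * s for s in scores)  # sum of squared scores (case k = 1)
    return theta_prelim + sum(scores) / info
```

For a vector-valued $\theta$ the division would be replaced by solving a linear system with the matrix of summed outer products.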

For the structural version of the model (1.8) typically implies that $T_n$ is asymptotically efficient in the sense of semi-parametric model theory as introduced in Begun, Hall, Huang and Wellner (1983). Indeed, there typically exists a one-dimensional submodel $t \to P_{\theta+th,\eta_t}$ such that

for every $h \in \mathbb{R}^k$ (cf. Lindsay (1983b), Pfanzagl and Wefelmeyer (1982, Chapter 14), van der Vaart (1988a)). As a consequence $T_n$ is efficient in the sense of Hajek (1970, 1972) in a one-dimensional submodel, and hence efficient in the whole model.

For the incidental version of the model, or more generally the model as introduced above, there exists no satisfactory theory of asymptotic efficiency, although steps towards establishing such a theory have been taken in Hasminskii and Nussbaum (1984), Nussbaum (1984) and Bickel and Klaassen (1986). In this case the estimator in (1.8) is asymptotically linear in what may be called the efficient influence function of the average density $n^{-1}\sum_{j=1}^n p_{\theta,\eta_j} = p_{\theta,\bar\eta_n}$. Though the definition of $T_n$ is based on the working principle that all $\eta_j$'s are equal, the estimator asymptotically improves upon other estimators in the literature. A similar contradictory principle is noted by Lindsay (1985), who proposes 'to increase efficiency' by using the best estimator for the structural model with a fixed parametric family $\eta_t$. The idea in this paper is that, though it is impossible to adapt to every $\eta_j$ separately, adaptation to the average $\bar\eta_n$ is possible. (It will be assumed that the sequence $\bar\eta_n$ stabilizes as $n \to \infty$).

By an extension of this idea one can actually show that $T_n$ is asymptotically inadmissible in the class $\mathcal{T}$ of all estimator sequences which are asymptotically normal 'uniformly over contiguous neighbourhoods'. One notes that $(\tfrac12 n)^{-1}\sum_{j=1}^{n/2} \eta_j$ and $(\tfrac12 n)^{-1}\sum_{j=n/2+1}^{n} \eta_j$ are estimable too, and, without going into any detail, it is clear that the 'knowledge' of these two quantities should enable one to do better than when 'knowing' $\bar\eta_n$ only. (A recipe and further discussion can be found on pages 135-138 of van der Vaart (1988)). However, we don't believe that this inadmissibility result necessarily means that $T_n$ is not a 'good' estimator. It rather shows the difficulty of setting up a theory of asymptotic optimality for non-i.i.d. models as considered here. On the positive side it can be shown that $T_n$ is optimal in the class of estimator sequences which are in $\mathcal{T}$ and are also asymptotically symmetric in the sense that

$\sqrt{n}(T_n - \theta) = n^{-1/2}\sum_{j=1}^n g_n(X_j) + o_{P_{\theta,\eta_1,\eta_2,\ldots}}(1).$



An extension of an interesting conjecture of Bickel and Klaassen (1986) is that optimality would remain true when asymptotic symmetry is replaced here by symmetry in the observations for every $n$. However, we know of no proof of this conjecture.

The organization of the paper is as follows. Section 2 contains conditions for the one-step estimator (1.7) to satisfy (1.8). Here $\hat\eta_n(\theta)$ is allowed to be a general estimator and need not be the maximum likelihood estimator. It turns out that no rate of convergence of $\hat\eta_n(\theta)$ is required. However, consistency in a suitable topology, which depends on the model, is crucial. For this reason Section 3 starts with the introduction of a class of topologies on $\mathcal{H}$. Next it is shown that $\hat\eta_n(\theta)$ satisfying (1.6) is 'consistent for $\bar\eta_n$'. Here the sets $\mathcal{H}_n$ are chosen data-dependent and of a simple form, with a view towards applicability. Finally, Sections 4-6 contain examples.

Efficient estimators for the structural version of the model were already constructed in van der Vaart (1988) and (for the example in Section 5) in Bickel and Ritov (1987). Both constructions are based on kernel estimators for $g_{\theta,\eta}$ and use a rather large number of tricks. Advantages of the estimator (1.6)-(1.7) are that it is better adapted to the mixture form of the model, that it uses maximum likelihood, and that it is simple and avoids dependence on unknown parameters such as the bandwidth of a kernel.

An important open problem concerns the behaviour of the maximum likelihood estimator for $(\theta, \eta)$, defined as the pair maximizing $\prod_{j=1}^n p_{\theta,\eta}(X_j)$ over $\Theta \times \mathcal{H}_n$. We feel that with a similar choice of sieves as in this paper (or maybe even without sieves), the $\theta$-component may well be asymptotically equivalent to $T_n$ given by (1.7)-(1.8), in both the structural and functional version of the model. This problem has been open since Kiefer and Wolfowitz (1956) considered consistency and appears to be hard.

2. General theorem.

Under appropriate smoothness conditions we have

where $\nabla_\theta \psi_\theta(x)$ is written in the form of a $k \times m$ matrix with the derivatives with respect to $\theta_i$ in its $i$-th row, and $Q_{\theta,\eta}(y) = \nabla_y g_{\theta,\eta} / g_{\theta,\eta}(y)$. When subtracting the conditional expectation given $\psi_\theta(X)$, the third term cancels. This motivates the assumption that there exist measurable maps $H_\theta$, $\tilde\psi_\theta$ and $Q_{\theta,\eta}$ such that

(2.1) $\tilde l_{\theta,\eta} = H_\theta + \tilde\psi_\theta\, Q_{\theta,\eta}(\psi_\theta)$

(2.2) $E_\theta\bigl(\tilde\psi_\theta(X) \mid \psi_\theta(X)\bigr) = 0.$

Set

Let $\delta$ be a semi-metric, which makes $\mathcal{H}$ into a separable metric space. Assume that

(2.3) $\bar\eta_n = n^{-1}\sum_{j=1}^n \eta_j \xrightarrow{\delta} \eta$



(2.4) $Q_{\theta_n,\gamma_n}(y) \to Q_{\theta,\gamma}(y)$, $\beta_{\theta_n}(y) \to \beta_\theta(y)$, $g_{\theta_n,\gamma_n}(y) \to g_{\theta,\gamma}(y)$, every $y$, every $\gamma_n \to \gamma$,

(2.5) $\bigl\{\sup_{\gamma \in U} |Q_{\theta_n,\gamma}|^2\, \beta_{\theta_n}^2\, g_{\theta_n,\bar\eta_n} : n = 1, 2, \ldots\bigr\}$ is $\nu$-equi-integrable,

for some open neighbourhood $U$ of $\eta$.

Next assume the existence of estimators $\hat\eta_n(\theta)$, based on $\psi_\theta(X_1), \ldots, \psi_\theta(X_n)$, such that for every sequence $\theta_n$ with $\sqrt{n}(\theta_n - \theta) \to h$

(2.6)

By 'estimators' it will be understood that every $\hat\eta_n(\theta)$ is a measurable map from $\mathcal{X}^n$ into $\mathcal{H}$ with respect to the Borel $\sigma$-field on $\mathcal{H}$.

Finally assume the existence of measurable functions $\dot l_\theta(x, z)$ such that for every sequence $\theta_n$ with $\sqrt{n}(\theta_n - \theta) \to h$

(2.7) $\int \dot l_\theta\, p_\theta\, d\mu = 0$

(2.8) $\iint \bigl[\sqrt{n}\bigl(p_{\theta_n}^{1/2} - p_\theta^{1/2}\bigr) - \tfrac12 h' \dot l_\theta\, p_\theta^{1/2}\bigr]^2\, d\mu\, d\bar\eta_n \to 0$

(2.9) $\iint |\dot l_\theta|^2\, p_\theta\, d\mu\, d\bar\eta_n = O(1)$

(2.10) $\iint_{\{|\dot l_\theta| \ge \varepsilon\sqrt{n}\}} |\dot l_\theta|^2\, p_\theta\, d\mu\, d\bar\eta_n \to 0$, every $\varepsilon > 0$

(2.11) $\iint \bigl|\dot l_{\theta_n} p_{\theta_n}^{1/2} - \dot l_\theta\, p_\theta^{1/2}\bigr|^2\, d\mu\, d\bar\eta_n \to 0$

(2.12) the limit points of $\{\tilde I_{\theta,\bar\eta_n}\}$ are nonsingular.

We shall identify the $\theta$-score $\dot l_{\theta,\eta}(x)$ with $p_{\theta,\eta}^{-1}(x) \int \dot l_\theta(x, z)\, p_\theta(x, z)\, d\eta(z)$.

Theorem 2.1. Let (2.1)-(2.12) hold. Let $\hat\theta_n$ be a $\sqrt{n}$-consistent, discretized estimator of $\theta$. Define $T_n$ by (1.7). Then (1.8) holds.

Proof. Let $\sqrt{n}(\theta_n - \theta) \to h$. Assumptions (2.7)-(2.9) imply contiguity of the measures with densities $\prod_{j=1}^n p_{\theta_n,\eta_j}(x_j)$ and $\prod_{j=1}^n p_{\theta,\eta_j}(x_j)$, while (2.7)-(2.11) ensure that

(2.13) $n^{-1/2}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\bar\eta_n}(X_j) - \tilde l_{\theta,\bar\eta_n}(X_j)\bigr) + \tilde I_{\theta,\bar\eta_n} h \xrightarrow{P_{\theta_n,\eta_1,\eta_2,\ldots}} 0$

(2.14) $n^{-1}\sum_{j=1}^n \tilde l_{\theta_n,\bar\eta_n} \tilde l_{\theta_n,\bar\eta_n}'(X_j) - \tilde I_{\theta,\bar\eta_n} \xrightarrow{P_{\theta_n,\eta_1,\eta_2,\ldots}} 0.$



To see this, one can first note that (2.7)-(2.11) imply analogous statements for the corresponding mixed quantities:

(2.7')

(2.8')

(2.9')

(2.10') every $\varepsilon > 0$

(2.11')

(van der Vaart (1988), Section 5.8.1). Next (2.9')-(2.10') also hold with $\dot l_{\theta,\eta}$ replaced by the corresponding $\tilde l_{\theta,\eta}$ (van der Vaart (1988), Lemma 5.20) and (2.11') implies

(2.11'') $n^{-1}\sum_{j=1}^n \int \bigl|\tilde l_{\theta_n,\bar\eta_n}\, p_{\theta_n,\eta_j}^{1/2} - \tilde l_{\theta,\bar\eta_n}\, p_{\theta,\eta_j}^{1/2}\bigr|^2\, d\mu \to 0,$

(van der Vaart (1988), p168-169). Finally, (2.7')-(2.9') imply local asymptotic normality, (2.13) follows from Proposition A.10 in van der Vaart (1988) and (2.14) is a version of the law of large numbers.

By (2.4) the map $\gamma \to Q_{\theta,\gamma}(y)$ is continuous for every $y$. Thus the map $(y, \gamma) \to Q_{\theta,\gamma}(y)$ is measurable (cf. Chapter 13 of Pfanzagl and Wefelmeyer (1985)) and $T_n$ is well-defined. By (2.1)-(2.2) and because $\hat\eta_n(\theta)$ depends on $\psi_\theta(X_1), \ldots, \psi_\theta(X_n)$ only

(2.15) $E_{\theta_n}\Bigl[\Bigl|n^{-1/2}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\hat\eta_n(\theta_n)}(X_j) - \tilde l_{\theta_n,\bar\eta_n}(X_j)\bigr)\Bigr|^2 \Bigm| \psi_{\theta_n}(X_1), \ldots, \psi_{\theta_n}(X_n)\Bigr] \le n^{-1}\sum_{j=1}^n \bigl|Q_{\theta_n,\hat\eta_n(\theta_n)}(\psi_{\theta_n}(X_j)) - Q_{\theta_n,\bar\eta_n}(\psi_{\theta_n}(X_j))\bigr|^2\, \beta_{\theta_n}^2(\psi_{\theta_n}(X_j)).$

By (2.3) and (2.6), for every open neighbourhood $U$ of $\eta$, this can be dominated by

Under $P_{\theta_n,\eta_1,\eta_2,\ldots}$ the expectation of the first term is

$\int \sup_{\gamma',\gamma \in U} |Q_{\theta_n,\gamma'} - Q_{\theta_n,\gamma}|^2\, \beta_{\theta_n}^2\, g_{\theta_n,\bar\eta_n}\, d\nu.$



The latter expression converges to zero as $U$ decreases to $\eta$ and $n \to \infty$, by (2.4) and (2.5). It follows that both sides of (2.15) converge to zero in probability. From this and (2.13)-(2.14) it can be seen that

(2.16) $n^{-1/2}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\hat\eta_n(\theta_n)}(X_j) - \tilde l_{\theta_n,\bar\eta_n}(X_j)\bigr) \xrightarrow{P_{\theta_n,\eta_1,\eta_2,\ldots}} 0$

(2.17) $n^{-1}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\hat\eta_n(\theta_n)} \tilde l_{\theta_n,\hat\eta_n(\theta_n)}'(X_j) - \tilde l_{\theta_n,\bar\eta_n} \tilde l_{\theta_n,\bar\eta_n}'(X_j)\bigr) \xrightarrow{P_{\theta_n,\eta_1,\eta_2,\ldots}} 0.$

The rest of the proof is standard, using (2.13)-(2.14), (2.16)-(2.17), (2.12) and the discretization of $\hat\theta_n$.

3. q-topology; consistency of (restricted) MLE's.

In this section a family of topologies on the set of mixing distributions is introduced, which in applications can play the role of the topology needed in (2.3)-(2.6). Furthermore, it is shown that a (restricted) maximum likelihood estimator is consistent in this topology.

It is assumed that $\mathcal{Z}$ is an open subset of $\mathbb{R}^m$ (or more generally a locally compact Hausdorff space with a countable base and countable at infinity). For each $n = 1, 2, \ldots$, $Y_{n1}, \ldots, Y_{nn}$ are measurable elements in a measurable space $(\mathcal{Y}, \mathcal{A})$, and $Y_{nj}$ has density

(3.1) $g_{\theta_n,\eta_j}(y) = \int g_{\theta_n}(y, z)\, d\eta_j(z),$

with respect to a $\sigma$-finite measure $\nu$. Here $\eta_1, \ldots, \eta_n$ are unknown probability measures on the Borel $\sigma$-field $\mathcal{C}$ of $\mathcal{Z}$, and the sequence $\theta_n$ is treated as known.

The problem is to estimate $\bar\eta_n = n^{-1}\sum_{j=1}^n \eta_j$, based on $Y_{n1}, \ldots, Y_{nn}$.

3.1. q-topology. Let $q$ be a continuous, positive function from $\mathcal{Z}$ into $\mathbb{R}$. Let $\mathcal{H}_s$ be the set of all sub-probability distributions $\eta$ on $(\mathcal{Z}, \mathcal{C})$ with $\int q\, d\eta < \infty$. Define a (metrizable) topology, called the q-topology, on $\mathcal{H}_s$ by saying that

(3.2) $\eta_n \xrightarrow{q} \eta$ if and only if $\int c\,q\, d\eta_n \to \int c\,q\, d\eta$, all $c \in C_0(\mathcal{Z})$.

Here $C_0(\mathcal{Z})$ is the set of all continuous, real functions on $\mathcal{Z}$ which vanish at infinity (i.e. the closure in the uniform norm of the set of continuous functions with compact support). The convergence concept (3.2) indeed corresponds to a topology on $\mathcal{H}_s$.

Lemma 3.1. The convergence concept (3.2) corresponds to a topology on $\mathcal{H}_s$ with the properties:

(i). For every $M > 0$, $\mathcal{H}^M = \{\eta \in \mathcal{H}_s : \int q\, d\eta \le M\}$ is q-compact.

(ii). The q-topology is metrizable by a metric $\delta_q$.

Proof. Embed $\mathcal{H}^M$ in the set $\mathcal{T}^M$ of positive measures $\tau$ on $(\mathcal{Z}, \mathcal{C})$ with total mass not exceeding $M$, through

$\tau(C) = \int_C q\, d\eta$ $(C \in \mathcal{C}).$



This embedding extends to an embedding of $\mathcal{H}_s = \cup_{M>0} \mathcal{H}^M$ into the set $\mathcal{T}$ of all positive measures on $(\mathcal{Z}, \mathcal{C})$. Clearly $\eta_n \xrightarrow{q} \eta$ if and only if $\tau_n \to \tau$ in the vague topology on $\mathcal{T}$. Thus the q-topology is the relative vague topology of $\mathcal{T}$ under the above embedding. Now (ii) follows immediately from metrizability of the vague topology on $\mathcal{T}$ (cf. Bauer (1981), p243). Next, since $\mathcal{T}^M$ is vaguely compact, it suffices for (i) to show that $\mathcal{H}^M$ is closed as a subset of $\mathcal{T}^M$, i.e. if for $\{\eta_n\} \subset \mathcal{H}^M$,

$\int c\,q\, d\eta_n \to \int c\, d\tau,$ all $c \in C_0(\mathcal{Z}),$

then we must show that $d\tau = q\, d\eta$ for some $\eta \in \mathcal{H}_s$. For $m = 1, 2, \ldots$ let $\chi_m \in C_0(\mathcal{Z})$ have compact support and be such that $0 \le \chi_m \uparrow 1$ as $m \to \infty$. Then

$\int \chi_m\, q^{-1}\, d\tau = \lim_{n\to\infty} \int \chi_m\, q^{-1} q\, d\eta_n \le 1.$

By monotone convergence we conclude $\int q^{-1}\, d\tau \le 1$. Thus we can set $d\eta = q^{-1}\, d\tau$.

For $q(z) = 1$ the q-topology induced on the set of probability measures is precisely the weak topology. However, if $q(z) \to \infty$ as $z$ converges to the point at infinity (the boundary of $\mathcal{Z}$), then the q-topology is stronger than the weak topology. For instance if $\mathcal{Z} = \mathbb{R}^+$ and $q(z) = z^{-2} \vee z^2$, then $\eta_n \xrightarrow{q} \eta$ if and only if

$\int f\, d\eta_n \to \int f\, d\eta,$

for all continuous functions $f$ with $f = o(q)$ as $z \to \infty$ or $z \downarrow 0$, i.e. $f(z) = o(z^2)$ for $z \to \infty$ and $f(z) = o(z^{-2})$ if $z \downarrow 0$. For this topology the subset of probability measures is also closed in $\mathcal{H}_s$.
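A small numerical sketch may help to see why the q-topology with $q(z) = z^{-2} \vee z^2$ is stronger than the weak topology. The sequence `eta_n` below, which puts mass $1/n$ at the point $n$, and the particular test functions are hypothetical choices made for illustration. The sequence converges weakly to the unit mass at 1, yet $\int c\,q\, d\eta_n$ does not converge to $\int c\,q\, d\eta$ for the function $c(z) = z/(1+z^2)$, which vanishes at both ends of $(0, \infty)$.

```python
def integrate(f, atoms):
    """Integral of f against a discrete measure given as (point, mass) pairs."""
    return sum(mass * f(z) for z, mass in atoms)


def q(z):
    # q(z) = z**-2 v z**2, the weight function of the example in the text
    return max(z ** -2, z ** 2)


def c(z):
    # continuous on (0, inf), vanishing at 0 and at infinity
    return z / (1.0 + z * z)


def bounded(z):
    # a bounded continuous test function, as used for weak convergence
    return 1.0 / (1.0 + z)


def eta_n(n):
    # hypothetical sequence: mass 1 - 1/n at the point 1, mass 1/n at the point n
    return [(1.0, 1.0 - 1.0 / n), (float(n), 1.0 / n)]


eta = [(1.0, 1.0)]  # unit mass at 1, the weak limit of eta_n

n = 10 ** 6
weak_gap = abs(integrate(bounded, eta_n(n)) - integrate(bounded, eta))
q_gap = abs(integrate(lambda z: c(z) * q(z), eta_n(n))
            - integrate(lambda z: c(z) * q(z), eta))
```

Here `weak_gap` shrinks like $1/n$, while `q_gap` stays close to 1: the small mass escaping to infinity is invisible to bounded test functions but not to the weighted ones.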

3.2. Restricted MLE's. Pfanzagl (1987) shows that in the structural version of the model the unrestricted MLE is typically consistent in the weak topology. To enforce consistency in a general q-topology, we use a simple device: restrict the maximization of the likelihood to the mixing distributions with expectation of $q$ bounded from above. Of course we don't want to assume that the true q-moment of $\eta$ is known. Hence we must either let the bound increase to infinity as $n \to \infty$, or use a bound based on an (over)estimate of the true expectation of $q$. In our examples the second possibility turns out to be feasible, and the more convenient one. Therefore we restrict ourselves to this device.

Apart from this, it may be useful or necessary for the actual computation of the maximum likelihood estimate to restrict the maximization to a still smaller subset of mixing distributions (for instance a finite dimensional one). Of course this subset will depend on the number of observations and increase as $n \to \infty$.

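Let $\mathcal{H}$ be the set of all probability measures $\eta$ on $(\mathcal{Z}, \mathcal{C})$ with $\int q\, d\eta < \infty$. Next, given a 'random upper bound' $Q_n$ for $\int q\, d\eta$ and a subset $\mathcal{H}_n$ of $\mathcal{H}$, let $\hat\eta_n$ be an element of $\hat{\mathcal{H}}_n := \mathcal{H}_n \cap \{\eta \in \mathcal{H} : \int q\, d\eta \le Q_n\}$ such that

(3.2) $\prod_{j=1}^n g_{\theta_n,\hat\eta_n}(Y_{nj}) \ge c\, \sup_{\eta \in \hat{\mathcal{H}}_n} \prod_{j=1}^n g_{\theta_n,\eta}(Y_{nj}),$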



for some $c > 0$. The choice $c = 1$ yields the 'full' restricted MLE. By Lemma 3.1(i) this certainly exists if $\eta \to g_{\theta_n,\eta}(y)$ is continuous for every $y$ and $\mathcal{H}_n$ is closed as a subset of $\mathcal{H}_s$, both with respect to the q-topology.

It is shown below that $\hat\eta_n$ thus defined is consistent in the q-topology provided that the union of the $\mathcal{H}_n$ satisfies a denseness condition. This is satisfied in particular when $\mathcal{H}_n = \mathcal{H}$ for every $n$. A consequence of this strong result is that the present asymptotics give little guidance concerning the choice of sieves $\mathcal{H}_n$. Any reasonable sequence of sieves will have the denseness property. Of course, one expects that decreasing the size of $\mathcal{H}_n$ will improve the performance of the estimator provided the true (average) mixing distribution is contained in $\mathcal{H}_n$, but will make it worse in the converse case. Then, if there is no reason to assume that the true mixing distribution has a certain parametric form, it seems safest to choose the sieve as large as is computationally feasible.

For computing an unrestricted MLE several algorithms have been suggested in the literature (cf. Laird (1978), Heckman and Singer (1984), Jewell (1982), Lindsay (1983a)). Typically the unrestricted MLE can be chosen discrete with at most $n$ support points (Lindsay (1983a)). The algorithms suggested by the above authors are more or less based on this property. We don't know whether the discreteness property is retained in a restricted maximization problem as above and have not studied algorithms for computing a restricted MLE in any detail. Of course, if the $\mathcal{H}_n$ are chosen finite dimensional, then computation is possible by a variety of algorithms, at least in principle.
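To make the finite-dimensional case concrete, the following sketch maximizes the mixture likelihood over mixing distributions supported on a fixed grid, using a standard EM iteration for the weights. It is only an illustration under stated assumptions: the kernel $z^2 s e^{-zs}$ is the one arising for $X + \theta Y$ in the paired exponential model of Section 4, the grid and the simulated data are hypothetical, and the restriction on the q-moment is ignored.

```python
import math
import random


def kernel(s, z):
    # density of S = X + theta*Y given z in the paired exponential model
    return z * z * s * math.exp(-z * s)


def em_weights(data, grid, n_iter=100):
    """EM for the weights of a mixing distribution supported on a fixed grid.

    Returns the final weights and the log-likelihood recorded at the start
    of each iteration.
    """
    dens = [[kernel(s, z) for z in grid] for s in data]  # n x m kernel matrix
    m = len(grid)
    weights = [1.0 / m] * m
    history = []
    for _ in range(n_iter):
        mix = [sum(w * d for w, d in zip(weights, row)) for row in dens]
        history.append(sum(math.log(v) for v in mix))
        # new weight i = average posterior probability of grid point i
        weights = [
            w * sum(row[i] / v for row, v in zip(dens, mix)) / len(data)
            for i, w in enumerate(weights)
        ]
    return weights, history


rng = random.Random(0)
data = []
for _ in range(300):
    z = rng.choice([0.5, 2.0])  # hypothetical two-point mixing distribution
    data.append(rng.expovariate(z) + rng.expovariate(z))  # S is Gamma(2, z)

grid = [0.25 * k for k in range(1, 17)]  # hypothetical grid 0.25, 0.5, ..., 4.0
weights, history = em_weights(data, grid)
```

The log-likelihood in `history` never decreases from one iteration to the next, which is the usual EM guarantee; a restricted version would additionally have to enforce the moment bound on the weights.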

The following regularity conditions are assumed to hold. For any subprobability measures $\gamma$, $\gamma_n$, $n = 1, 2, \ldots$

(3.3) for every $y$, every $\gamma_n \xrightarrow{q} \gamma$, $\theta_n \to \theta$.

(3.4) $\bar\eta_n \xrightarrow{q} \eta$.

Call $\eta \in \mathcal{H}$ identifiable if there exists no $\gamma \ne \eta$ in $\mathcal{H}$ such that

(3.5) $\int_{\{g_{\theta,\gamma} = g_{\theta,\eta}\}} g_{\theta,\gamma}\, d\nu = 1.$

Identifiability in the case of mixtures over an exponential family is discussed in Pfanzagl (1987).

Next let $Q_n = q_n(Y_{n1}, \ldots, Y_{nn}, \theta_n)$ be estimators such that

(3.6) $Q_n \in \mathbb{N}$ a.s., $\quad Q_n = O_{P_{\theta_n,\eta_1,\eta_2,\ldots}}(1)$, $\quad P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(Q_n > \int q\, d\eta\bigr) \to 1,$

as $n \to \infty$.

Finally let $\mathcal{H}_n$ be an increasing sequence of convex subsets of $\mathcal{H}$, satisfying

(3.7) $\bigcup_{n=1}^\infty \mathcal{H}_n \cap \{\gamma \in \mathcal{H}_s : \int q\, d\gamma \le M\}$ is q-dense in $\{\gamma \in \mathcal{H}_s : \int q\, d\gamma \le M\}$, every $M$.



Theorem 3.2. Let (3.2)-(3.7) hold and $\eta$ be identifiable. Then $\delta_q(\hat\eta_n, \eta) \to 0$ in outer $P_{\theta_n,\eta_1,\eta_2,\ldots}$-probability.

Theorem 3.2 is formulated in terms of outer measure, because it is hard to say in general whether $\delta_q(\hat\eta_n, \eta)$ is measurable. However, when the restricted maximum likelihood estimator is unique and $\hat{\mathcal{H}}_n$ is compact, then for every closed $F$

Under (3.3) the latter set is measurable, so that $\hat\eta_n$ is measurable in the Borel $\sigma$-field of $\mathcal{H}$. We shall ignore the measurability issue in the examples in Sections 4-6.

Proof. Given a constant $M > \int q\, d\eta$ let $\mathcal{H}^M$ be all sub-probability measures $\gamma$ with $\int q\, d\gamma \le M$. Choose a sequence $\{\eta_n\}$ in $\mathcal{H}^M$ with $\eta_n \in \mathcal{H}_n$ and $\eta_n \xrightarrow{q} \eta$.

Fix $\alpha \in (0, 1)$ and $\gamma \in \mathcal{H}^M$. By convexity of $u \to u \log(1 + \alpha(u - 1))$, identifiability of $\eta$ and Jensen's inequality

(3.8) $\int_{\{g_{\theta,\eta} > 0\}} \log\Bigl[1 + \alpha\Bigl(\frac{g_{\theta,\eta}}{g_{\theta,\gamma}} - 1\Bigr)\Bigr]\, g_{\theta,\eta}\, d\nu > 0.$

For $U \subset \mathcal{H}^M$ write $\bar g_{\theta,U}(y) = \sup_{\gamma' \in U} g_{\theta,\gamma'}(y)$. Set $U_m = \{\gamma' \in \mathcal{H}^M : \delta_q(\gamma', \gamma) < m^{-1}\}$. By (3.4) for every $m = m_n \to \infty$

(3.9)

every $y$, as $n \to \infty$.

By (3.3)-(3.4), (3.8)-(3.9) and Fatou's Lemma there exists a constant $M_\gamma$ such that

$\liminf_{n\to\infty} \int \Bigl(\log\Bigl[1 + \alpha\Bigl(\frac{g_{\theta_n,\eta_n}}{\bar g_{\theta_n,U_{m_n}}} - 1\Bigr)\Bigr] \wedge M_\gamma\Bigr)\, g_{\theta_n,\bar\eta_n}\, d\nu > 0.$

(Note that $\log(1 + \alpha(u - 1)) \ge \log(1 - \alpha)$ if $u \ge 0$). Thus for every $\gamma \in \mathcal{H}^M$ there exists a q-open neighbourhood $U_\gamma$ and a constant $M_\gamma$ such that, with

$Z_{nj}(\gamma) = \log\Bigl[1 + \alpha\Bigl(\frac{g_{\theta_n,\eta_n}}{\bar g_{\theta_n,U_\gamma}}(Y_{nj}) - 1\Bigr)\Bigr],$

(3.10) $\liminf_{n\to\infty} E_{\theta_n,\eta_1,\eta_2,\ldots}\, n^{-1}\sum_{j=1}^n \bigl(Z_{nj}(\gamma) \wedge M_\gamma\bigr) > 0.$

On $\{Q_n = M\}$, $\hat\eta_n$ and $\eta_n$ are both contained in $\hat{\mathcal{H}}_n$. By (3.2) and convexity of $\mathcal{H}_n$

$n^{-1}\sum_{j=1}^n \log \frac{g_{\theta_n,\alpha\eta_n + (1-\alpha)\hat\eta_n}}{g_{\theta_n,\hat\eta_n}}(Y_{nj}) \le -n^{-1}\log c.$

Rewrite this as

(3.11) $n^{-1}\sum_{j=1}^n \log\Bigl[1 + \alpha\Bigl(\frac{g_{\theta_n,\eta_n}}{g_{\theta_n,\hat\eta_n}}(Y_{nj}) - 1\Bigr)\Bigr] \le -n^{-1}\log c.$
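The step leading to (3.11) uses only that the mixture density $g_{\theta,\eta}$ is linear in the mixing distribution $\eta$. Spelled out in the notation of this section (a worked restatement, not an additional assumption):

```latex
\begin{align*}
g_{\theta_n,\,\alpha\eta_n+(1-\alpha)\hat\eta_n}
  &= \alpha\, g_{\theta_n,\eta_n} + (1-\alpha)\, g_{\theta_n,\hat\eta_n}, \\
\frac{g_{\theta_n,\,\alpha\eta_n+(1-\alpha)\hat\eta_n}}{g_{\theta_n,\hat\eta_n}}
  &= 1 + \alpha\Bigl(\frac{g_{\theta_n,\eta_n}}{g_{\theta_n,\hat\eta_n}} - 1\Bigr).
\end{align*}
```

Taking logarithms of the second line at each $Y_{nj}$ and averaging over $j$ gives the left-hand side of (3.11).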

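Fix $\varepsilon > 0$. The set $A = \mathcal{H}^M - \{\gamma \in \mathcal{H}^M : \delta_q(\gamma, \eta) < \varepsilon\}$ is q-compact by Lemma 3.1(i). From the cover $\{U_\gamma : \gamma \in A\}$ extract a finite subcover $U_{\gamma_1}, \ldots, U_{\gamma_k}$. By (3.11)

$\{\delta_q(\hat\eta_n, \eta) \ge \varepsilon \wedge Q_n = M\}$

$\subset \bigcup_{i=1}^k \Bigl\{n^{-1}\sum_{j=1}^n \log\Bigl[1 + \alpha\Bigl(\frac{g_{\theta_n,\eta_n}}{\bar g_{\theta_n,U_{\gamma_i}}}(Y_{nj}) - 1\Bigr)\Bigr] \le -n^{-1}\log c\Bigr\}$

$\subset \bigcup_{i=1}^k \Bigl\{n^{-1}\sum_{j=1}^n \bigl(Z_{nj}(\gamma_i) \wedge M_{\gamma_i}\bigr) \le -n^{-1}\log c\Bigr\}.$

By (3.10) and the law of large numbers the $P_{\theta_n,\eta_1,\eta_2,\ldots}$-probability of the last set converges to zero.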

Finally

$P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(\delta_q(\hat\eta_n, \eta) \ge \varepsilon\bigr) \le \sum_{M = [\int q\, d\eta] + 1}^{M'} P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(\delta_q(\hat\eta_n, \eta) \ge \varepsilon \wedge Q_n = M\bigr) + P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(Q_n \notin (\textstyle\int q\, d\eta, M']\bigr).$

Here the second term can be made arbitrarily small by (3.6) and the first term converges to zero for every fixed $M' > 0$, by the above argument.

4. Paired exponential model.

Write the observations as pairs $(X, Y)$ and let

$p_\theta(x, y, z) = z e^{-zx}\, \theta z e^{-\theta z y}$

($z \in \mathcal{Z} = \mathbb{R}^+$, $(x, y) \in (\mathbb{R}^2)^+$, $\theta \in \Theta = \mathbb{R}^+$). Thus in the incidental version of the model, the problem is to estimate $\theta$, the ratio of the hazard rates within pairs of exponentially distributed variables, where the baseline hazards $z_j$ are unknown and may differ over pairs.

With $\psi_\theta(X, Y) = X + \theta Y$ it follows that

$\tilde l_{\theta,\eta}(x, y) = \frac{x - \theta y}{2\theta(x + \theta y)} + \frac{g_\eta'}{g_\eta}(x + \theta y)\, \frac{\theta y - x}{2\theta},$

where $g_\eta(s) = \int_0^\infty z^2 s e^{-zs}\, d\eta(z)$ for $s > 0$. Given $X + \theta Y = s$, $X - \theta Y$ has a uniform distribution on $[-s, s]$. Thus $\beta_\theta^2(s) = s^2/(12\theta^2)$. Assume that

(4.1) $\bar\eta_n \to \eta$ in the weak topology

(4.2) $\int (z^{-2} + z^2)\, d\bar\eta_n(z) \to \int (z^{-2} + z^2)\, d\eta(z) < \infty.$
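The claim that $X - \theta Y$ is uniform on $[-s, s]$ given $X + \theta Y = s$, whatever the value of $z$, can be checked by simulation. The sketch below is a Monte Carlo illustration only; the range of the nuisance values and the sample size are hypothetical. It uses the factor $(\theta y - x)/(2\theta)$ that multiplies $g_\eta'/g_\eta$ in the efficient score: divided by $s = x + \theta y$, it should have mean 0 and second moment $1/(12\theta^2)$.

```python
import random


def psi_tilde_over_s(x, y, theta):
    # (theta*y - x) / (2*theta), scaled by s = x + theta*y
    return (theta * y - x) / (2.0 * theta) / (x + theta * y)


rng = random.Random(1)
theta = 2.0
n = 20000
vals = []
for _ in range(n):
    z = rng.uniform(0.5, 3.0)        # hypothetical nuisance values z_j
    x = rng.expovariate(z)           # X_j has hazard rate z_j
    y = rng.expovariate(theta * z)   # Y_j has hazard rate theta * z_j
    vals.append(psi_tilde_over_s(x, y, theta))

mean_ratio = sum(vals) / n                     # compare with 0
second_moment = sum(v * v for v in vals) / n   # compare with 1 / (12 theta^2)
target = 1.0 / (12.0 * theta ** 2)
```

Because the scaled quantity does not depend on $z_j$, the agreement holds for any sequence of nuisance values, in line with $\beta_\theta^2(s) = s^2/(12\theta^2)$.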



Let $\mathcal{H}_n$ be an increasing sequence of subsets of the set $\mathcal{H}$ of all probability measures on

m+ ea tis.fy<strong>in</strong>g (3.7) with q(z) = z', (0 < c:s h fixed). Set<br />

(4.3) ?<strong>in</strong> = 'H." n {-y E 1/ ; f z' d-y(z) :s eln2:i~l XT'}<br />

Theorem 4.1. Let (4.1)-(4.3) hold, let $\hat\theta_n$ be a discretized, $\sqrt{n}$-consistent estimator for $\theta$ and let $\hat\eta_n(\theta)$ maximize $\prod_{j=1}^n p_{\theta,\eta}(X_j, Y_j)$ over $\hat{\mathcal{H}}_n$. Then $T_n$ defined by (1.7) satisfies (1.8).


As for the regularity conditions, it is known from van der Vaart (1988), Theorem 5.17, that (4.2) is unnecessary for the existence of an estimator sequence $T_n$ satisfying (1.8). Thus one might hope that Theorem 4.1 can be slightly improved.

It is well-known that the unique solution of

(4.4) $\sum_{j=1}^n \frac{X_j - \theta Y_j}{X_j + \theta Y_j} = 0$

is a $\sqrt{n}$-consistent estimator for $\theta$. In fact, it is asymptotically normal for every sequence $\{\eta_j\}$, because the distribution of the $X_j / Y_j$ is independent of $\eta_j$.
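Since every term of the sum in (4.4) is strictly decreasing in $\theta$, the root can be found by bisection. The sketch below is one possible way to compute it; the function names, the bracketing interval and the simulated incidental parameters are assumptions made for illustration.

```python
import random


def estimating_fn(theta, pairs):
    # left-hand side of (4.4)
    return sum((x - theta * y) / (x + theta * y) for x, y in pairs)


def solve_44(pairs, lo=1e-8, hi=1e8, n_iter=200):
    """Bisection on a log scale; relies on the sum being decreasing in theta."""
    for _ in range(n_iter):
        mid = (lo * hi) ** 0.5  # geometric midpoint
        if estimating_fn(mid, pairs) > 0:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


rng = random.Random(2)
theta_true = 2.0
pairs = []
for _ in range(2000):
    z = rng.uniform(0.2, 5.0)  # hypothetical incidental parameters z_j
    pairs.append((rng.expovariate(z), rng.expovariate(theta_true * z)))

theta_hat = solve_44(pairs)
```

The resulting `theta_hat` can then serve, after discretization, as the preliminary estimator that the one-step construction (1.7) improves upon.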

Proof. The theorem is a corollary of Theorems 2.1 and 3.2 applied with the q-topology of $q(z) = z^c$. It is tedious, but straightforward to check (2.7)-(2.12) (cf. van der Vaart (1988), p156-159). From the other assumptions only (2.5) needs comment. First

(4.5)<br />

The set of functions $z \to (sz)^{4-c} e^{-sz}$ on $\mathbb{R}^+$, $(0 < s < 1)$, is uniformly bounded and equi-continuous. Hence it is pre-compact in $C_0(\mathcal{Z})$ and if $\gamma$ is sufficiently close to $\eta$ in the q-topology, then $\int (sz)^{4-c} e^{-sz} z^c\, d\gamma(z)$ is uniformly close to $\int (sz)^{4-c} e^{-sz} z^c\, d\eta(z)$. Therefore, the right hand side of (4.5) can for $s < 1$ and $\gamma$ sufficiently close to $\eta$ be bounded by

(4.6)<br />

Let $0 < a < b < \infty$ be such that $p = \eta(a, b) > 0$. Then for $\gamma$ sufficiently close to $\eta$, we have $\gamma(a, b) > \tfrac12 p$. But then the right hand side of (4.5) can for such $\gamma$ and $s > 1$ be bounded by

(4.7)<br />

s: S'Z4.-.. d7(Z)]<br />

2+2(3bs)'+2 ",<br />

[ fa sz2e-u d7(Z)<br />

g,.(s)<br />

< [2 + 18b 2s2 +2(sp~a2e-b')-ls3e-2b' 1~ Z4e-:tJ:d7(Z)] gi7n(s)<br />

::; [2 + 18b's' + 4p- 1a-'.-'·s-'64 ] g,.(8).<br />

Combination of the bounds (4.5)-(4.7) shows that (2.5) is satisfied, if for sufficiently large $N$ the set $\{(s^2 + s^{c-2})\, g_{\bar\eta_n}(s) : n > N\}$ is equi-integrable. Now by (4.1)-(4.2)

$\int (s^2 + s^{c-2})\, g_{\bar\eta_n}(s)\, ds = \int (6 z^{-2} + \Gamma(c) z^{2-c})\, d\bar\eta_n(z)$

$\to \int (6 z^{-2} + \Gamma(c) z^{2-c})\, d\eta(z) = \int (s^2 + s^{c-2})\, g_\eta(s)\, ds.$
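The equalities in the display above come from a Fubini computation. The worked step below assumes, as a reading of the integrals appearing in (4.7), that $g_\eta(s) = \int s z^2 e^{-sz}\, d\eta(z)$; the exponent $k$ is an auxiliary symbol introduced only for this sketch.

```latex
\int_0^\infty s^{k} g_\eta(s)\,ds
  = \int \int_0^\infty s^{k+1} z^{2} e^{-sz}\,ds\,d\eta(z)
  = \Gamma(k+2) \int z^{-k}\,d\eta(z), \qquad k > -2.
% k = 2:      \Gamma(4) = 6, which gives the term 6 z^{-2};
% k = c - 2:  \Gamma(c),     which gives the term \Gamma(c) z^{2-c}.
```

Under this assumption, convergence of the moments $\int z^{-2}\, d\bar\eta_n$ and $\int z^{2-c}\, d\bar\eta_n$ is exactly what the display requires.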

Then (cf. Theorem 13.47 in Hewitt and Stromberg (1965))

5. Errors-in-variables.

Write the observations as pairs (X, Y). The incidental version of the model is given by

$X_j = z_j + e_j,$

$Y_j = \alpha + \beta z_j + f_j,$

where $(e_1, f_1)', (e_2, f_2)', \ldots$ are i.i.d. unobservable $N(0, \Sigma^{-1})$ distributed vectors, and $z_j$ unknown numbers in $Z = \mathbb{R}$. Set $\theta = (\alpha, \beta, \Sigma)$. To make this parameter identifiable in the structural version of the model one can put restrictions on either $\mathcal{H}$ or $\Sigma$. Indeed, it suffices that $\mathcal{H}$ does not contain normal distributions (where pointmasses are considered normal); alternatively

it can be assumed that :E = if; :Eo, where Eo is known. Identifiability is obviously<br />

crucial for the existence of a $\sqrt{n}$-consistent estimator sequence for $\theta$. However, it does not play a role in the validity of Theorem 2.1 on the improvement of such an estimator. Therefore, we do not discuss the matter here, but refer to the rather large literature on the model. See Anderson (1984) and Bickel and Ritov (1987) and the references cited there. The assumptions imposed do affect the dimension of $\theta$, though. Below we give the formulas for the case that $\Sigma$ is a free positive definite matrix $\begin{pmatrix} \sigma & \rho \\ \rho & \tau \end{pmatrix}$ and write $\theta = (\alpha, \beta, \sigma, \tau, \rho)$.

A sufficient statistic in this model is $\psi_\theta(X, Y) = \binom{1}{\beta}' \Sigma \binom{X}{Y - \alpha}$. Its distribution is a mixture of normal distributions with density



where $\sigma_\theta^2 = \binom{1}{\beta}' \Sigma \binom{1}{\beta}$.
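The sufficiency claim can be checked by expanding the exponent of the normal density of $(X, Y)$ given $z$. The sketch below assumes $\psi_\theta(x, y) = (1, \beta)\,\Sigma\,(x, y - \alpha)'$ and $\sigma_\theta^2 = (1, \beta)\,\Sigma\,(1, \beta)'$ with $\Sigma$ the symmetric precision matrix of the errors; the function names and all numerical values are illustrative only.

```python
def quad_form(u, S, v):
    """u' S v for 2-vectors u, v and a 2x2 matrix S given as nested lists."""
    return sum(u[i] * S[i][j] * v[j] for i in range(2) for j in range(2))

def exponent_direct(x, y, z, alpha, beta, S):
    """Quadratic form in the residuals (x - z, y - alpha - beta*z)."""
    r = [x - z, y - alpha - beta * z]
    return quad_form(r, S, r)

def exponent_via_statistic(x, y, z, alpha, beta, S):
    """The same quantity, with z entering only through psi and sigma2."""
    v = [1.0, beta]
    w = [x, y - alpha]
    psi = quad_form(v, S, w)       # assumed form of the sufficient statistic
    sigma2 = quad_form(v, S, v)    # assumed form of sigma_theta^2
    return quad_form(w, S, w) - 2.0 * z * psi + z * z * sigma2

# Hypothetical values; S must be symmetric for the expansion to hold.
S = [[2.0, 0.5], [0.5, 1.0]]
direct = exponent_direct(1.3, -0.4, 0.7, 0.2, 1.5, S)
factored = exponent_via_statistic(1.3, -0.4, 0.7, 0.2, 1.5, S)
print(direct, factored)
```

Because $z$ enters the exponent only through $z\,\psi_\theta$ and $z^2\sigma_\theta^2$, the factorization criterion applies, and under these assumptions $\psi_\theta(X, Y)$ given $z$ is normal with mean $z\sigma_\theta^2$ and variance $\sigma_\theta^2$.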

Set M. = [- u;'(~)(;)'E. Then<br />

o 0<br />

P T<br />

;j,.(z,y) = 1 0 M, ( z ).<br />

y-Q<br />

o IJ<br />

{3 1<br />

Furthermore, 1.,.(z,y) = F.,.(Z,y)-' J1.(z ,v. z) PB(z, y, z) d'7(z), where<br />

1.(z,y,z) =<br />

(:)'M.(.:o)<br />

z(:)'M.(.:o)<br />

-t [



treating $\theta$ as known, automatically satisfies the moment condition in (5.5) (Lemma 5.1, below). Thus one can carry out the construction (1.7)-(1.8) with no restriction at all when maximizing the likelihood (i.e. $\hat{\mathcal{H}}_n = \mathcal{H}$).

Proof. It is very tedious, but straightforward to check (2.7)-(2.11). Moreover Ie,,,,. -.1 8 ,,, ,<br />

'0 that (2.12) follows from (5.4). For fixed 8 the mix<strong>in</strong>g distribution '1 is identifiable <strong>in</strong><br />

g,•• by Proposition 6.2 of Pfansagl (1987). By (5.1)-(5.2) ~n -.!. '1 <strong>in</strong> the q-topology of<br />

$q(z) = 1 \vee z^2$. Theorem 5.1 follows from Theorems 2.1 and 3.1, applied with this q-topology. The only condition that needs comment is (2.5).

For any random variable $V$ and decreasing function $b$ on $\mathbb{R}$ it holds that $\mathrm{Cov}(V, b(V)) \le 0$. In consequence $E|V|\phi(V) = E|V|\phi(|V|) \le E|V|\, E\phi(V)$. Therefore

Ig"7(')1 = If -
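The covariance inequality used in this step can be illustrated on an empirical distribution, where it reduces to Chebyshev's sum inequality and therefore holds for every finite sample. The sketch below assumes that $\phi$ denotes the standard normal density; the sample and the function names are hypothetical.

```python
import math
import random

def phi(u):
    """Standard normal density (assumed meaning of phi in the text)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def mean(values):
    return sum(values) / len(values)

def both_sides(sample):
    """Return (E|V|phi(|V|), E|V| * E phi(V)) under the empirical law of sample."""
    abs_v = [abs(v) for v in sample]
    lhs = mean([a * phi(a) for a in abs_v])
    rhs = mean(abs_v) * mean([phi(a) for a in abs_v])
    return lhs, rhs

rng = random.Random(0)                       # fixed seed, illustrative only
sample = [rng.gauss(1.0, 2.0) for _ in range(5000)]
lhs, rhs = both_sides(sample)
print(lhs <= rhs)
```

The inequality holds because $|V|$ and $\phi(|V|)$ are oppositely ordered: $\phi$ is decreasing on the positive half-line.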


Multiplying with $q(z_i)$ and summing over $i$ gives the 'self-consistency equations'

(5.6) $\int q(z)\, d\hat\eta_n(z) = n^{-1} \sum_{i=1}^n E_{\hat\eta_n}(q(Z) \mid T = t_i),$

where (T, Z) has distribution (I - z) d'l(z) under 'I.<br />

Next perturbation of the v's yields the stationary equations<br />

t h,(Ii-Zi)(Ii-Z,)_O<br />

i=1 L;:':, h,(Ii - Z,) - •<br />

Multiplying with $r(z_i)$ and summing over $i$ gives

i = 1, ... ,m.<br />

(5.7) $n^{-1} \sum_{i=1}^n E_{\hat\eta_n}(r(Z) Z \mid T = t_i) = n^{-1} \sum_{i=1}^n E_{\hat\eta_n}(r(Z) \mid T = t_i)\, t_i.$

Combination of (5.6)-(5.7) with $q(z) = z$ and $r(z) = 1$ yields the first assertion of the lemma. Next combination with $q(z) = z^2$ and $r(z) = z$ gives

n<br />

6. Normal scale mixture.<br />

The incidental version of the model is given by

$X_j = \theta + z_j^{-1} e_j,$

where $e_1, e_2, \ldots$ are unobservable, independent standard normal variables, $\theta \in \Theta = \mathbb{R}$ and $z_j \in Z = \mathbb{R}^+$. Of course $X_1, X_2, \ldots$ are sampled from distributions which are symmetric about $\theta$, and one may estimate $\theta$ with the estimators of Stone (1975), Bickel and Klaassen (1986), or van der Vaart (1988), Section 5.7.4, which are fully adaptive. However, these estimators do not take the normality of the error terms into account, whereas the MLE-based estimator (1.7)-(1.8) does.

With $\psi_\theta(x) = |x - \theta|$ it follows that

$\dot\ell_{\theta\eta}(x) = \tilde\ell_{\theta\eta}(x) = -\frac{g_\eta'}{g_\eta}(|x - \theta|)\, \mathrm{sgn}(x - \theta),$

where $g_\eta(s) = 2 \int_0^\infty z\, \phi(zs)\, d\eta(z)$ for $s > 0$. Clearly (3~ ,,1.

Assume that

(6.1) $\bar\eta_n \to \eta$ in the weak topology

(6.2) $\int_0^\infty (z^2 + z^{-2})\, d\bar\eta_n(z) \to \int_0^\infty (z^2 + z^{-2})\, d\eta(z) < \infty$

1<br />

(6.3) = z2 df<strong>in</strong>.(Z) -+ 0, every E > O.<br />

,.,;n<br />
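Under the model $X_j = \theta + z_j^{-1} e_j$ the function $g_\eta$ introduced earlier in this section is the density of $|X - \theta|$ when $z$ is drawn from $\eta$, and its second moment equals $\int z^{-2}\, d\eta(z)$, which is where the $z^{-2}$ term of (6.2) enters. The sketch below checks both facts numerically, assuming $g_\eta(s) = 2\int z\,\phi(zs)\,d\eta(z)$ with $\phi$ the standard normal density; the two-point mixing distribution, the truncation point and the step count are hypothetical choices.

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def g(s, atoms, weights):
    """g_eta(s) = 2 * sum_k w_k * z_k * phi(z_k * s) for a discrete eta."""
    return 2.0 * sum(w * z * phi(z * s) for z, w in zip(atoms, weights))

def integrate(f, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Hypothetical mixing distribution: mass 0.3 at z = 0.5 and 0.7 at z = 2.
atoms = [0.5, 2.0]
weights = [0.3, 0.7]

# The upper limit 40 stands in for infinity; the tail beyond it is negligible.
mass = integrate(lambda s: g(s, atoms, weights), 0.0, 40.0, 40000)
second = integrate(lambda s: s * s * g(s, atoms, weights), 0.0, 40.0, 40000)
inverse_square_moment = sum(w / (z * z) for z, w in zip(atoms, weights))
print(mass, second, inverse_square_moment)
```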



Fix ~ < c < 2. Let 1£ be the set of probability measures on m. with f zt: d7](z) < 00, let<br />

1£1'10 be a .sequence of subsets satisfy<strong>in</strong>g (3.7) with q(z) = 1 Va", <strong>and</strong> set<br />

(6.4)<br />

where Qn = Op, ... 'l'I.,,~ •..• (I) <strong>and</strong> P9..,'Ih,'12•.•.(Qn > f zt:dfJn(z» --+ O.<br />

Theorem 6.1. Let (6.1)-(6.4) hold, let $\tilde\theta_n$ be a discretized $\sqrt{n}$-consistent estimator for $\theta$ and let $\hat\eta_n(\theta)$ maximize $\prod_{j=1}^n p_{\theta\eta}(X_j)$ over $\hat{\mathcal{H}}_n$. Then $T_n$ defined by (1.7) satisfies (1.8).

Construction of a suitable sequence $Q_n$ is not entirely trivial. An example is

Proof. The theorem is a corollary of Theorems 2.1 and 3.1 applied with the q-topology of $q(z) = 1 \vee z^c$. As in the previous sections the main problem is to check (2.5). First by uniform boundedness and equi-continuity of the functions $z \mapsto (sz)^{3-c} \phi(sz)$, ($0 < s < 1$), we have for $0 < s < 1$ and $\gamma$ sufficiently close to $\eta$ that

(6.5)<br />

This set of functions is equi-integrable over (0,1) by (6.2)-(6.3). Next, with the same notation as in the proof of Theorem 4.1, the left hand side of (6.5) can for $s > 1$ be bounded by

[16b 4 s 2 +(ap)-ls2e-~·21l235] gil..(s).<br />

Equi-integrability of these functions follows from equi-integrability of $\{s^2 g_{\bar\eta_n}(s) : n = 1, 2, \ldots\}$, which follows from (6.1)-(6.2).

Acknowledgement.<br />

I thank Y. Ritov for permission to include Lemma 5.1 in this paper.



References<br />

Anderson, T.W. (1984). Estimating linear statistical relationships. Annals Statist. 12, 1-45.

Bauer, H. (1981). Probability Theory and Elements of Measure Theory, Academic Press, London.

Begun, J.M., Hall, W.J., Huang, W.M. and Wellner, J.A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Annals Statist. 11, 432-452.

Bickel, P.J., Klaassen, C.A.J. (1986). Empirical Bayes estimation in functional and structural models, and uniform adaptive estimation of location. Adv. Appl. Math. 7, 55-69.

Bickel, P.J., Ritov, Y. (1987). Efficient estimation in the errors in variables model. Annals Statist. 15, 513-540.

Hajek, J. (1970). A characterization of limiting distributions of regular estimators. Z. Wahrsch. Th. Verw. Gebiete 14, 323-330.

Hajek, J. (1972). Local asymptotic minimax and admissibility in estimation. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 1, University of California Press, Berkeley, 175-194.

Hasminskii, R.Z., Nussbaum, M. (1984). An asymptotic minimax bound in a regression model with an increasing number of nuisance parameters. P. Mandl, M. Huskova (eds.). Proc. Third Prague Symp. As. Statistics, Elsevier, Amsterdam, 275-283.

Heckman, J., Singer, B. (1984). A method for minimizing the impact of distributional assumptions in economic studies for duration data. Econometrica 52, 271-320.

Hewitt, E., Stromberg, K. (1965). Real and Abstract Analysis, Springer Verlag, Berlin.

Jewell, N.P. (1982). Mixtures of exponential distributions. Annals Statist. 10, 419-484.

Kiefer, J., Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many nuisance parameters. Ann. Math. Statist. 27, 887-906.

Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. J. Amer. Statist. Assoc. 73, 805-811.

Lindsay, B.G. (1983a). The geometry of mixture likelihoods, I and II. Annals Statist. 11, 86-94 and 783-792.

Lindsay, B.G. (1983b). Efficiency of the conditional score in a mixture setting. Annals Statist. 11, 486-497.

Lindsay, B.G. (1985). Using empirical partially Bayes inference for increased efficiency. Annals Statist. 13, 914-931.

Nussbaum, M. (1984). An asymptotic minimax risk bound for estimation of a linear functional relationship. J. Multivariate Anal. 14, 300-314.

Pfanzagl, J., Wefelmeyer, W. (1982). Contributions to a General Asymptotic Statistical Theory. Lecture Notes in Statistics 13, Springer Verlag, New York.

Pfanzagl, J., Wefelmeyer, W. (1985). Asymptotic Expansions for General Statistical Models. Lecture Notes in Statistics 31, Springer Verlag, New York.

Pfanzagl, J. (1987). Consistency of Maximum Likelihood Estimators for Certain Nonparametric Families, in particular: Mixtures. Preprint 110, University of Cologne.

Stone, C. (1975). Adaptive maximum likelihood estimation of a location parameter. Annals Statist. 3, 267-284.

Vaart, A.W. van der (1988). Statistical Estimation in Large Parameter Spaces. CWI-tract 44, Centrum voor Wiskunde en Informatica, Amsterdam.

Vaart, A.W. van der (1988a). Estimating a real parameter in a class of semi-parametric models. Annals Statist. 16, to appear.
