estimating a parameter in incidental and structural models - CiteSeerX
ESTIMATING A PARAMETER IN INCIDENTAL AND STRUCTURAL MODELS
BY APPROXIMATE MAXIMUM LIKELIHOOD
by
Aad van der Vaart
TECHNICAL REPORT No. 139
October 1987 (revised July 1988)
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Estimating a parameter in
incidental and structural models
by approximate maximum likelihood
Aad van der Vaart ¹
Free University Amsterdam
October 1987, revised July 1988
Let $X_1, X_2, \ldots$ be independent random vectors, where $X_j$ has density
$\int h_\theta(x)\, g_\theta(\psi_\theta(x), z)\, d\eta_j(z)$. In the structural version of the model
$\eta_j = \eta$ is a fixed unknown probability distribution, while in the functional
model $\eta_j$ is a unit mass in $z_j$, where $\{z_j\}$ is an unknown sequence
of vectors. It is proposed to estimate $\theta$ by a one-step estimator,
based on a MLE for $n^{-1}\sum_{j=1}^n \eta_j$. The construction is illustrated
in the paired exponential model, the errors-in-variables model and a
scale mixture over a normal family.
AMS 1980 subject classifications: 62F12, 62G05.
Keywords and phrases: Structural model, incidental parameters, functional model,
mixture model, asymptotic efficiency.
¹ Research partially supported by the Netherlands Organization for the Advancement
of Pure Research (Z.W.O.) and ONR Contract N00014-80-C-0163.
1. Introduction.
Let $\Theta$ be an open subset of $\mathbb{R}^k$ and $\mathcal{H}$ a collection of probability measures on a
measurable space $(\mathcal{Z}, \mathcal{C})$. Given $\theta \in \Theta$ let $\psi_\theta$ be a measurable map between measurable
spaces $(\mathcal{X}, \mathcal{B})$ and $(\mathcal{Y}, \mathcal{A})$. Furthermore, for each $(\theta, z) \in \Theta \times \mathcal{Z}$ let $p_\theta(\cdot, z)$ be a probability
density with respect to a $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{B})$, having the form
(1.1) $p_\theta(x, z) = h_\theta(x)\, g_\theta(\psi_\theta(x), z)$,
for measurable maps $h_\theta$ and $g_\theta$ (from $(\mathcal{X}, \mathcal{B})$ into $\mathbb{R}$ and $(\mathcal{Y} \times \mathcal{Z}, \mathcal{A} \times \mathcal{C})$ into $\mathbb{R}$, respectively).
Suppose that $X$ is a random element with density
(1.2) $p_{\theta,\eta}(x) = \int p_\theta(x, z)\, d\eta(z)$,
where the mixing distribution $\eta$ is an element of $\mathcal{H}$. Then, by the factorization theorem,
$\psi_\theta(X)$ is a sufficient statistic for $\eta \in \mathcal{H}$, if $\theta$ is fixed. It is assumed that
(1.3) $g_{\theta,\eta}(y) = \int g_\theta(y, z)\, d\eta(z)$
is the density of $\psi_\theta(X)$ with respect to a $\sigma$-finite measure $\nu$ on $(\mathcal{Y}, \mathcal{A})$.
This paper is concerned with estimating $\theta$ on the basis of the first $n$ elements of a
sequence of independent random elements $X_1, X_2, \ldots$, where $X_j$ has density $p_{\theta,\eta_j}$, $\{\eta_j\}$
being an unknown sequence in $\mathcal{H}$. There are two versions of this model. The structural
model (or mixture model) is simply the i.i.d. version, where every $\eta_j$ is equal to one fixed,
but unknown $\eta$. The incidental model (or functional model) has $\eta_j$ equal to a unit mass in
$z_j$, $\{z_j\}$ being an unknown sequence in $\mathcal{Z}$. If $\mathcal{H}$ contains the unit masses, our formulation
includes both.
Suppose that the score function for $\theta$, $\dot l_{\theta,\eta}(x) = \nabla_\theta \log p_{\theta,\eta}(x)$, is well-defined and set
(1.4) $\tilde l_{\theta,\eta}(x) = \dot l_{\theta,\eta}(x) - E_\theta\bigl(\dot l_{\theta,\eta}(X) \mid \psi_\theta(X) = \psi_\theta(x)\bigr)$
(1.5) $\tilde I_{\theta,\eta} = E_{\theta,\eta}\, \tilde l_{\theta,\eta} \tilde l_{\theta,\eta}'(X)$.
We propose the following estimator for $\theta$. Let $\hat\eta_n(\theta)$ be a (restricted) maximum likelihood
estimator in the structural version of the model, when $\theta$ is given. Thus $\hat\eta_n(\theta)$
satisfies
(1.6) $\prod_{j=1}^n p_{\theta,\hat\eta_n(\theta)}(X_j) = \sup_{\eta \in \mathcal{H}_n} \prod_{j=1}^n p_{\theta,\eta}(X_j)$,
for a given (possibly data-dependent) subset $\mathcal{H}_n$ of $\mathcal{H}$. Next given a preliminary estimator
$\hat\theta_n$ for $\theta$, let $T_n$ be its one-step 'improvement'
(1.7) $T_n = \hat\theta_n + \Bigl(\sum_{j=1}^n \tilde l_{\hat\theta_n,\hat\eta_n(\hat\theta_n)} \tilde l_{\hat\theta_n,\hat\eta_n(\hat\theta_n)}'(X_j)\Bigr)^{-1} \sum_{j=1}^n \tilde l_{\hat\theta_n,\hat\eta_n(\hat\theta_n)}(X_j)$.
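The arithmetic of the one-step update (1.7) can be sketched in a few lines. The following is a minimal illustration for a scalar parameter only, not the estimator of the paper: the callable `score` is a hypothetical stand-in for the efficient score evaluated at the plug-in mixing distribution, and the toy score $x - \theta$ (the location score of a $N(\theta, 1)$ sample) is an assumption chosen purely so that the example runs without an estimated mixing distribution.

```python
from typing import Callable, Sequence


def one_step(theta0: float,
             data: Sequence[float],
             score: Callable[[float, float], float]) -> float:
    """Scalar one-step update in the form of (1.7).

    theta0 : preliminary estimate of the parameter
    score  : hypothetical callable score(x, theta); in the paper's setting
             this would be the efficient score with the estimated mixing
             distribution plugged in, which is not implemented here
    """
    values = [score(x, theta0) for x in data]
    info = sum(v * v for v in values)  # sum of squared scores
    if info == 0.0:
        raise ValueError("all scores vanish; the update is undefined")
    return theta0 + sum(values) / info


def toy_score(x: float, theta: float) -> float:
    """Assumed toy score: location score of a N(theta, 1) observation."""
    return x - theta


# Scores at theta0 = 1 are 0, 1, 2, so the update adds 3 / 5 to theta0.
improved = one_step(1.0, [1.0, 2.0, 3.0], toy_score)
print(improved)
```

For a vector parameter the division would become a linear solve with the matrix of summed outer products of the scores.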
For discretized and $\sqrt{n}$-consistent $\hat\theta_n$ it will be shown that
(1.8)
where $\bar\eta_n = n^{-1}\sum_{j=1}^n \eta_j$. The necessary regularity conditions and the choice of the sieves
are discussed in detail for three examples.
For the structural version of the model (1.8) typically implies that $T_n$ is asymptotically
efficient in the sense of semi-parametric model theory as introduced in Begun, Hall, Huang
and Wellner (1983). Indeed, there typically exists a one-dimensional submodel $t \to P_{\theta+th,\eta_t}$
such that
for every $h \in \mathbb{R}^k$ (cf. Lindsay (1983b), Pfanzagl and Wefelmeyer (1982, Chapter 14), van
der Vaart (1988a)). As a consequence $T_n$ is efficient in the sense of Hájek (1970, 1972) in
a one-dimensional submodel, and hence efficient in the whole model.
For the incidental version of the model, or more generally the model as introduced
above, there exists no satisfactory theory of asymptotic efficiency, although steps towards
establishing such a theory have been taken in Hasminskii and Nussbaum (1984), Nussbaum
(1984) and Bickel and Klaassen (1986). In this case the estimator in (1.8) is asymptotically
linear in what may be called the efficient influence function of the average density
$n^{-1}\sum_{j=1}^n p_{\theta,\eta_j} = p_{\theta,\bar\eta_n}$. Though the definition of $T_n$ is based on the working principle that
all $\eta$'s are equal, the estimator asymptotically improves upon other estimators in the literature.
A similar contradictory principle is noted by Lindsay (1985), who proposes 'to
increase efficiency' by using the best estimator for the structural model with a fixed parametric
family $\eta_t$. The idea in this paper is that, though it is impossible to adapt to every $\eta_j$
separately, adaptation to the average $\bar\eta_n$ is possible. (It will be assumed that the sequence
$\bar\eta_n$ stabilizes as $n \to \infty$).
By an extension of this idea one can actually show that $T_n$ is asymptotically inadmissible
in the class $\mathcal{T}$ of all estimator sequences which are asymptotically normal 'uniformly
over contiguous neighbourhoods'. One notes that $(\tfrac12 n)^{-1}\sum_{j=1}^{n/2} \eta_j$ and $(\tfrac12 n)^{-1}\sum_{j=n/2+1}^{n} \eta_j$
are estimable too, and, without going into any detail, it is clear that the 'knowledge' of
these two quantities should enable one to do better than when 'knowing' $\bar\eta_n$ only. (A recipe
and further discussion can be found on pages 135-138 of van der Vaart (1988)). However,
we don't believe that this inadmissibility result necessarily means that $T_n$ is not a 'good'
estimator. It rather shows the difficulty of setting up a theory of asymptotic optimality
for non-i.i.d. models as considered here. On the positive side it can be shown that $T_n$ is
optimal in the class of estimator sequences which are in $\mathcal{T}$ and are also asymptotically
symmetric in the sense that
$\sqrt{n}(T_n - \theta) = n^{-1/2}\sum_{j=1}^n g_n(X_j) + o_{P_{\theta,\eta_1,\eta_2,\ldots}}(1)$.
An extension of an interesting conjecture of Bickel and Klaassen (1986) is that optimality
would remain true, when asymptotic symmetry is replaced here by symmetry in the
observations for every $n$. However, we know of no proof of this conjecture.
The organization of the paper is as follows. Section 2 contains conditions for the one-step
estimator (1.7) to satisfy (1.8). Here $\hat\eta_n(\theta)$ is allowed to be a general estimator and
need not be the maximum likelihood estimator. It turns out that no rate of convergence of
$\hat\eta_n(\theta)$ is required. However, consistency in a suitable topology, which depends on the model,
is crucial. For this reason Section 3 starts with the introduction of a class of topologies on
$\mathcal{H}$. Next it is shown that $\hat\eta_n(\theta)$ satisfying (1.6) is 'consistent for $\bar\eta_n$'. Here the sets $\mathcal{H}_n$ are
chosen data-dependent and of a simple form, with a view towards applicability. Finally,
Sections 4-6 contain examples.
Efficient estimators for the structural version of the model were already constructed
in van der Vaart (1988) and (for the example in Section 5) in Bickel and Ritov (1987).
Both constructions are based on kernel estimators for $g_{\theta,\eta}$ and use a rather large number of
tricks. Advantages of the estimator (1.6)-(1.7) are that it is better adapted to the mixture
form of the model, that it uses maximum likelihood, and that it is simple and avoids
dependence on unknown parameters such as the bandwidth of a kernel.
An important open problem concerns the behaviour of the maximum likelihood estimator
for $(\theta, \eta)$, defined as the pair maximizing $\prod_{j=1}^n p_{\theta,\eta}(X_j)$ over $\Theta \times \mathcal{H}_n$. We feel
that with a similar choice of sieves as in this paper (or maybe even without sieves), the
$\theta$-component may well be asymptotically equivalent to $T_n$ given by (1.7)-(1.8), in both the
structural and functional version of the model. This problem has been open since Kiefer
and Wolfowitz (1956) considered consistency and appears to be hard.
2. General theorem.
Under appropriate smoothness conditions we have
where $\nabla_\theta \psi_\theta(x)$ is written in the form of a $k \times m$ matrix with the derivatives with respect
to $\theta_i$ in its $i$-th row, and $Q_{\theta,\eta}(y) = \nabla_y g_{\theta,\eta} / g_{\theta,\eta}(y)$. When subtracting the conditional
expectation given $\psi_\theta(X)$, the third term cancels. This motivates assuming the existence
of measurable maps $H_\theta$, $\dot\psi_\theta$ and $Q_{\theta,\eta}$ such that
(2.1) $\tilde l_{\theta,\eta} = H_\theta + \dot\psi_\theta\, Q_{\theta,\eta}(\psi_\theta)$
(2.2) $E_\theta\bigl(\dot\psi_\theta(X) \mid \psi_\theta(X)\bigr) = 0$.
Set
Let $\delta$ be a semi-metric, which makes $\mathcal{H}$ into a separable metric space. Assume that
(2.3) $\bar\eta_n = n^{-1}\sum_{j=1}^n \eta_j \xrightarrow{\delta} \eta$
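(2.4) $Q_{\theta_n,\gamma_n}(y) \to Q_{\theta,\gamma}(y)$, $\beta_{\theta_n}(y) \to \beta_\theta(y)$, $g_{\theta_n,\gamma_n}(y) \to g_{\theta,\gamma}(y)$, every $y$, every $\gamma_n \xrightarrow{\delta} \gamma$,
(2.5) $\bigl\{\sup_{\gamma \in U} |Q_{\theta_n,\gamma}|^2 \beta_{\theta_n}^2 g_{\theta_n,\bar\eta_n} : n = 1, 2, \ldots\bigr\}$ is $\nu$-equi-integrable,
for some open neighbourhood $U$ of $\eta$.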
Next assume the existence of estimators $\hat\eta_n(\theta)$, based on $\psi_\theta(X_1), \ldots, \psi_\theta(X_n)$, such
that for every sequence $\theta_n$ with $\sqrt{n}(\theta_n - \theta) \to h$
(2.6)
By 'estimators' it will be understood that every $\hat\eta_n(\theta)$ is a measurable map from $\mathcal{X}^n$ into
$\mathcal{H}$ with respect to the Borel $\sigma$-field on $\mathcal{H}$.
Finally assume the existence of measurable functions $\dot l_\theta(x, z)$ such that for every sequence
$\theta_n$ with $\sqrt{n}(\theta_n - \theta) \to h$
(2.7) $\int \dot l_\theta\, p_\theta\, d\mu = 0$
(2.8) $\iint \bigl[\sqrt{n}\,(p_{\theta_n}^{1/2} - p_\theta^{1/2}) - \tfrac12 h' \dot l_\theta\, p_\theta^{1/2}\bigr]^2 d\mu\, d\bar\eta_n \to 0$
(2.9) $\iint |\dot l_\theta|^2 p_\theta\, d\mu\, d\bar\eta_n = O(1)$
(2.10) $\iint_{\{|\dot l_{\theta_n}| \ge \varepsilon\sqrt{n}\}} |\dot l_{\theta_n}|^2 p_{\theta_n}\, d\mu\, d\bar\eta_n \to 0$, every $\varepsilon > 0$
(2.11) $\iint |\dot l_{\theta_n} p_{\theta_n}^{1/2} - \dot l_\theta\, p_\theta^{1/2}|^2 d\mu\, d\bar\eta_n \to 0$
(2.12) the limit points of $\{\tilde I_{\theta,\bar\eta_n}\}$ are nonsingular.
We shall identify the score $\dot l_{\theta,\eta}(x)$ with $p_{\theta,\eta}^{-1}(x) \int \dot l_\theta(x, z)\, p_\theta(x, z)\, d\eta(z)$.
Theorem 2.1. Let (2.1)-(2.12) hold. Let $\hat\theta_n$ be a $\sqrt{n}$-consistent, discretized estimator of
$\theta$. Define $T_n$ by (1.7). Then (1.8) holds.
Proof. Let $\sqrt{n}(\theta_n - \theta) \to h$. Assumptions (2.7)-(2.9) imply contiguity of the measures
with densities $\prod_{j=1}^n p_{\theta_n,\eta_j}(x_j)$ and $\prod_{j=1}^n p_{\theta,\eta_j}(x_j)$, while (2.7)-(2.11) ensure that
(2.13) $n^{-1/2}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\bar\eta_n}(X_j) - \tilde l_{\theta,\bar\eta_n}(X_j)\bigr) + \tilde I_{\theta,\bar\eta_n} h \to 0$ in $P_{\theta,\eta_1,\eta_2,\ldots}$-probability
(2.14) $n^{-1}\sum_{j=1}^n \tilde l_{\theta_n,\bar\eta_n} \tilde l_{\theta_n,\bar\eta_n}'(X_j) - \tilde I_{\theta,\bar\eta_n} \to 0$ in $P_{\theta,\eta_1,\eta_2,\ldots}$-probability.
To see this, one can first note that (2.7)-(2.11) imply analogous statements for corresponding
mixed quantities:
(2.7')
(2.8')
(2.9')
(2.10') every $\varepsilon > 0$
(2.11')
(van der Vaart (1988), Section 5.8.1). Next (2.9')-(2.10') also hold with $\dot l_{\theta,\eta}$ replaced by
the corresponding $\tilde l_{\theta,\eta}$ (van der Vaart (1988), Lemma 5.20) and (2.11') implies
(2.11'') $n^{-1}\sum_{j=1}^n \int \bigl|\tilde l_{\theta_n,\bar\eta_n} p_{\theta_n,\eta_j}^{1/2} - \tilde l_{\theta,\bar\eta_n} p_{\theta,\eta_j}^{1/2}\bigr|^2 d\mu \to 0$,
(van der Vaart (1988), p168-169). Finally, (2.7')-(2.9') imply local asymptotic normality,
(2.13) follows from Proposition A.10 in van der Vaart (1988) and (2.14) is a version of the
law of large numbers.
By (2.4) the map $\gamma \to Q_{\theta,\gamma}(y)$ is continuous for every $y$. Thus the map $(y, \gamma) \to
Q_{\theta,\gamma}(y)$ is measurable (cf. Chapter 13 of Pfanzagl and Wefelmeyer (1985)) and $T_n$ well-defined.
By (2.1)-(2.2) and because $\hat\eta_n(\theta)$ depends on $\psi_\theta(X_1), \ldots, \psi_\theta(X_n)$ only
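(2.15) $E_{\theta_n}\Bigl[\bigl|n^{-1/2}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\hat\eta_n(\theta_n)}(X_j) - \tilde l_{\theta_n,\bar\eta_n}(X_j)\bigr)\bigr|^2 \Bigm| \psi_{\theta_n}(X_1), \ldots, \psi_{\theta_n}(X_n)\Bigr]$
$\le n^{-1}\sum_{j=1}^n \bigl|Q_{\theta_n,\hat\eta_n(\theta_n)}(\psi_{\theta_n}(X_j)) - Q_{\theta_n,\bar\eta_n}(\psi_{\theta_n}(X_j))\bigr|^2 \beta_{\theta_n}^2(\psi_{\theta_n}(X_j))$.
By (2.3) and (2.6), for every open neighbourhood $U$ of $\eta$, this can be dominated by
Under $P_{\theta_n,\eta_1,\eta_2,\ldots}$ the expectation of the first term is
$\int \sup_{\gamma,\gamma' \in U} |Q_{\theta_n,\gamma} - Q_{\theta_n,\gamma'}|^2 \beta_{\theta_n}^2 g_{\theta_n,\bar\eta_n}\, d\nu$.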
The latter expression converges to zero as $U$ decreases to $\eta$ and $n \to \infty$, by (2.4) and
(2.5). It follows that both sides of (2.15) converge to zero in probability. From this and
(2.13)-(2.14) it can be seen that
(2.16) $n^{-1/2}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\hat\eta_n(\theta_n)}(X_j) - \tilde l_{\theta_n,\bar\eta_n}(X_j)\bigr) \to 0$ in $P_{\theta_n,\eta_1,\eta_2,\ldots}$-probability
(2.17) $n^{-1}\sum_{j=1}^n \bigl(\tilde l_{\theta_n,\hat\eta_n(\theta_n)} \tilde l_{\theta_n,\hat\eta_n(\theta_n)}'(X_j) - \tilde l_{\theta_n,\bar\eta_n} \tilde l_{\theta_n,\bar\eta_n}'(X_j)\bigr) \to 0$ in $P_{\theta_n,\eta_1,\eta_2,\ldots}$-probability.
The rest of the proof is standard, using (2.13)-(2.14), (2.16)-(2.17), (2.12) and the
discretization of $\hat\theta_n$.
3. q-topology; consistency of (restricted) MLE's.
In this section a family of topologies on the set of mixing distributions is introduced,
which in applications can play the role of the topology needed in (2.3)-(2.6). Furthermore,
it is shown that a (restricted) maximum likelihood estimator is consistent in this topology.
It is assumed that $\mathcal{Z}$ is an open subset of $\mathbb{R}^m$ (or more generally a locally compact,
Hausdorff space with a countable base and countable at infinity). For each $n = 1, 2, \ldots$
$Y_{n1}, \ldots, Y_{nn}$ are measurable elements in a measurable space $(\mathcal{Y}, \mathcal{A})$, and $Y_{nj}$ has density
(3.1) $g_{\theta_n,\eta_j}(y) = \int g_{\theta_n}(y, z)\, d\eta_j(z)$,
with respect to a $\sigma$-finite measure $\nu$. Here $\eta_1, \ldots, \eta_n$ are unknown probability measures
on the Borel $\sigma$-field $\mathcal{C}$ of $\mathcal{Z}$, and the sequence $\theta_n$ is treated as known.
The problem is to estimate $\bar\eta_n = n^{-1}\sum_{j=1}^n \eta_{nj}$, based on $Y_{n1}, \ldots, Y_{nn}$.
3.1. q-topology. Let $q$ be a continuous, positive function from $\mathcal{Z}$ into $\mathbb{R}$. Let $\mathcal{H}_s$ be the
set of all sub-probability distributions $\eta$ on $(\mathcal{Z}, \mathcal{C})$ with $\int q\, d\eta < \infty$. Define a (metrizable)
topology, called the q-topology, on $\mathcal{H}_s$ by saying that
(3.2) $\eta_n \xrightarrow{q} \eta$ if and only if $\int c\,q\, d\eta_n \to \int c\,q\, d\eta$, all $c \in C_0(\mathcal{Z})$.
Here $C_0(\mathcal{Z})$ is the set of all continuous, real functions on $\mathcal{Z}$ which vanish at infinity (i.e.
the closure in the uniform norm of the set of continuous functions with compact support).
The convergence concept (3.2) indeed corresponds to a topology on $\mathcal{H}_s$.
Lemma 3.1. The convergence concept (3.2) corresponds to a topology on $\mathcal{H}_s$ with the
properties:
(i). For every $M > 0$, $\mathcal{H}^M = \{\eta \in \mathcal{H}_s : \int q\, d\eta \le M\}$ is q-compact.
(ii). The q-topology is metrizable by a metric $\delta_q$.
Proof. Embed $\mathcal{H}^M$ in the set $\mathcal{T}^M$ of positive measures $\tau$ on $(\mathcal{Z}, \mathcal{C})$ with total mass not
exceeding $M$, through
$\eta \mapsto \tau$, $\tau(C) = \int_C q\, d\eta$ $(C \in \mathcal{C})$.
This embedding extends to an embedding of $\mathcal{H}_s = \cup_{M>0} \mathcal{H}^M$ into the set $\mathcal{T}$ of all positive
measures on $(\mathcal{Z}, \mathcal{C})$. Clearly $\eta_n \xrightarrow{q} \eta$ if and only if $\tau_n \to \tau$ in the vague topology on $\mathcal{T}$.
Thus the q-topology is the relative vague topology of $\mathcal{T}$ under the above embedding. Now
(ii) follows immediately from metrizability of the vague topology on $\mathcal{T}$ (cf. Bauer (1981),
p243). Next, since $\mathcal{T}^M$ is vaguely compact, it suffices for (i) to show that $\mathcal{H}^M$ is closed as
a subset of $\mathcal{T}^M$, i.e. if for $\{\eta_n\} \subset \mathcal{H}^M$,
$\int c\,q\, d\eta_n \to \int c\, d\tau$, all $c \in C_0(\mathcal{Z})$,
then we must show that $d\tau = q\, d\eta$ for some $\eta \in \mathcal{H}_s$. For $m = 1, 2, \ldots$ let $\chi_m \in C_0(\mathcal{Z})$
have compact support and be such that $0 \le \chi_m \uparrow 1$ as $m \to \infty$. Then
$\int \chi_m q^{-1}\, d\tau = \lim_{n\to\infty} \int \chi_m q^{-1} q\, d\eta_n \le 1$.
By monotone convergence we conclude $\int q^{-1}\, d\tau \le 1$. Thus we can set $d\eta = q^{-1}\, d\tau$.
For $q(z) = 1$ the q-topology induced on the set of probability measures is precisely the
weak topology. However, if $q(z) \to \infty$ as $z$ converges to the point at infinity (the boundary
of $\mathcal{Z}$), then the q-topology is stronger than the weak topology. For instance if $\mathcal{Z} = \mathbb{R}^+$
and $q(z) = z^{-2} \vee z^2$, then $\eta_n \xrightarrow{q} \eta$ if and only if
$\int f\, d\eta_n \to \int f\, d\eta$,
for all continuous functions $f$ with $f = o(q)$ as $z \to \infty$ or $z \downarrow 0$, i.e. $f(z) = o(z^2)$ for
$z \to \infty$ and $f(z) = o(z^{-2})$ if $z \downarrow 0$. For this topology the subset of probability measures is
also closed in $\mathcal{H}_s$.
3.2. Restricted MLE's. Pfanzagl (1987) shows that in the structural version of the
model the unrestricted MLE is typically consistent in the weak topology. To enforce consistency
in a general q-topology, we use a simple device: restrict the maximization of the
likelihood to the mixing distributions with expectation of $q$ bounded from above. Of course
we don't want to assume that the true q-moment of $\eta$ is known. Hence we must either let
the bound increase to infinity as $n \to \infty$, or use a bound based on an (over)estimate of
the true expectation of $q$. In our examples the second possibility turns out to be feasible,
and the more convenient one. Therefore we restrict ourselves to this device.
Apart from this, it may be useful or necessary for the actual computation of the
maximum likelihood estimate, to restrict the maximization to a still smaller subset of
mixing distributions (for instance a finite dimensional one). Of course this subset will
depend on the number of observations and increase as $n \to \infty$.
Let $\mathcal{H}$ be the set of all probability measures $\eta$ on $(\mathcal{Z}, \mathcal{C})$ with $\int q\, d\eta < \infty$. Next,
given a 'random upperbound' $Q_n$ for $\int q\, d\eta$ and a subset $\mathcal{H}_n$ of $\mathcal{H}$, let $\hat\eta_n$ be an element of
$\hat{\mathcal{H}}_n := \mathcal{H}_n \cap \{\eta \in \mathcal{H} : \int q\, d\eta \le Q_n\}$ such that
(3.2) $\prod_{j=1}^n g_{\theta_n,\hat\eta_n}(Y_{nj}) \ge c \sup_{\eta \in \hat{\mathcal{H}}_n} \prod_{j=1}^n g_{\theta_n,\eta}(Y_{nj})$
for some $c > 0$. The choice $c = 1$ yields the 'full' restricted MLE. By Lemma 3.1(i) this
certainly exists if $\eta \to g_{\theta_n,\eta}(y)$ is continuous for every $y$ and $\mathcal{H}_n$ is closed as a subset of
$\mathcal{H}_s$, both with respect to the q-topology.
It is shown below that $\hat\eta_n$ thus defined is consistent in the q-topology provided that
the union of the $\mathcal{H}_n$ satisfies a denseness condition. This is satisfied in particular when
$\mathcal{H}_n = \mathcal{H}$ for every $n$. A consequence of this strong result is that the present asymptotics
give little guidance concerning the choice of sieves $\mathcal{H}_n$. Any reasonable sequence of sieves
will have the denseness property. Of course, one expects that decreasing the size of $\mathcal{H}_n$ will
improve the performance of the estimator provided the true (average) mixing distribution
is contained in $\mathcal{H}_n$, but will make it worse in the converse case. Then, if there is no reason
to assume that the true mixing distribution has a certain parametric form, it seems safest
to choose the sieve as large as is computationally feasible.
For computing an unrestricted MLE several algorithms have been suggested in the
literature (cf. Laird (1978), Heckman and Singer (1984), Jewell (1982), Lindsay (1983a)).
Typically the unrestricted MLE can be chosen discrete with at most $n$ support points
(Lindsay (1983a)). The algorithms suggested by the above authors are more or less based
on this property. We don't know whether the discreteness property is retained in a restricted
maximization problem as above and have not studied algorithms for computing
a restricted MLE in any detail. Of course, if the $\mathcal{H}_n$ are chosen finite dimensional, then
computation is possible by a variety of algorithms, at least in principle.
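As a rough illustration of the finite-dimensional case, the sketch below maximizes the mixture likelihood over mixing distributions supported on a fixed grid, using plain EM updates of the weights. Everything specific here is an assumption made for the example and not taken from the paper: the fixed grid, the simulated data, and the kernel $z^2 s e^{-zs}$ (the density of the sufficient statistic in the paired exponential model of Section 4). The moment bound $\int q\, d\eta \le Q_n$ is not enforced, so this is a sieve MLE without the restriction.

```python
import math
import random


def gamma2_density(s: float, z: float) -> float:
    """Kernel z^2 * s * exp(-z * s): density of a Gamma(2, rate z) variable."""
    return z * z * s * math.exp(-z * s)


def log_likelihood(weights, grid, data):
    """Log-likelihood of the data under the discrete mixing distribution."""
    return sum(
        math.log(sum(w * gamma2_density(s, z) for w, z in zip(weights, grid)))
        for s in data
    )


def em_mixing_weights(data, grid, n_iter=200):
    """EM updates of the mixing weights on a fixed grid of support points."""
    k = len(grid)
    weights = [1.0 / k] * k  # start from the uniform distribution on the grid
    for _ in range(n_iter):
        new = [0.0] * k
        for s in data:
            comp = [w * gamma2_density(s, z) for w, z in zip(weights, grid)]
            total = sum(comp)
            for i in range(k):
                new[i] += comp[i] / total  # posterior weight of grid point i
        weights = [v / len(data) for v in new]
    return weights


random.seed(0)
# Simulated data (assumption): hazards 0.5 or 2.0, then S ~ Gamma(2, rate z).
hazards = [random.choice([0.5, 2.0]) for _ in range(300)]
data = [random.gammavariate(2.0, 1.0 / z) for z in hazards]
grid = [0.25, 0.5, 1.0, 2.0, 4.0]

weights = em_mixing_weights(data, grid)
uniform = [1.0 / len(grid)] * len(grid)
print([round(w, 3) for w in weights])
print(log_likelihood(weights, grid, data) >= log_likelihood(uniform, grid, data))
```

Each EM step cannot decrease the likelihood, which is why the fitted weights are compared with the uniform starting point. Enforcing the moment bound would turn the update into a constrained problem, which this sketch does not attempt.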
The following regularity conditions are assumed to hold. For any subprobability measures
$\gamma$, $\gamma_n$, $n = 1, 2, \ldots$
(3.3) for every $y$, every $\gamma_n \xrightarrow{q} \gamma$, $\theta_n \to \theta$.
(3.4) $\bar\eta_n \xrightarrow{q} \eta$.
Call $\eta \in \mathcal{H}$ identifiable if there exists no $\gamma \ne \eta$ in $\mathcal{H}$ such that
(3.5) $\int_{\{g_{\theta,\gamma} = g_{\theta,\eta}\}} g_{\theta,\gamma}\, d\nu = 1$.
Identifiability in the case of mixtures over an exponential family is discussed in Pfanzagl
(1987).
Next let $Q_n = q_n(Y_{n1}, \ldots, Y_{nn}, \theta_n)$ be estimators such that
(3.6) $Q_n \in \mathbb{N}$ a.s., $Q_n = O_{P_{\theta_n,\eta_1,\eta_2,\ldots}}(1)$, $P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(Q_n > \int q\, d\eta\bigr) \to 1$,
as $n \to \infty$.
Finally let $\mathcal{H}_n$ be an increasing sequence of convex subsets of $\mathcal{H}$, satisfying
(3.7) $\cup_{n=1}^\infty \mathcal{H}_n \cap \{\gamma \in \mathcal{H}_s : \int q\, d\gamma \le M\}$ is q-dense in $\{\gamma \in \mathcal{H}_s : \int q\, d\gamma \le M\}$, every $M$.
Theorem 3.2. Let (3.2)-(3.7) hold and $\eta$ be identifiable. Then $\delta_q(\hat\eta_n, \eta) \to 0$ in outer
$P_{\theta_n,\eta_1,\eta_2,\ldots}$-probability.
Theorem 3.2 is formulated in terms of outer measure, because it is hard to say in
general whether $\delta_q(\hat\eta_n, \eta)$ is measurable. However, when the restricted maximum likelihood
estimator is unique and $\mathcal{H}_n$ is compact, then for every closed $F$
Under (3.3) the latter set is measurable, so that $\hat\eta_n$ is measurable in the Borel $\sigma$-field of
$\mathcal{H}$. We shall ignore the measurability issue in the examples in Sections 4-6.
Proof. Given a constant $M > \int q\, d\eta$, let $\mathcal{H}^M$ be all sub-probability measures $\gamma$ with
$\int q\, d\gamma \le M$. Choose a sequence $\{\eta_n\}$ in $\mathcal{H}^M$ with $\eta_n \in \mathcal{H}_n$ and $\eta_n \xrightarrow{q} \eta$.
Fix $a \in (0, 1)$ and $\gamma \in \mathcal{H}^M$. By convexity of $u \to u \log(1 + a(u - 1))$, identifiability
of $\eta$ and Jensen's inequality
(3.8) $\int_{\{g_{\theta,\gamma} > 0\}} \log\Bigl[1 + a\Bigl(\frac{g_{\theta,\eta}}{g_{\theta,\gamma}} - 1\Bigr)\Bigr] g_{\theta,\eta}\, d\nu > 0$.
For $U \subset \mathcal{H}^M$ write $\bar g_{\theta,U}(y) = \sup_{\gamma' \in U} g_{\theta,\gamma'}(y)$. Set $U_m = \{\gamma' \in \mathcal{H}^M : \delta_q(\gamma', \gamma) <
m^{-1}\}$. By (3.4) for every $m = m_n \to \infty$
(3.9)
every $y$, as $n \to \infty$.
By (3.3)-(3.4), (3.8)-(3.9) and Fatou's Lemma there exists a constant $M_\gamma$ such that
$\liminf_{n\to\infty} \int \log\Bigl[1 + a\Bigl(\frac{g_{\theta_n,\eta_n}}{\bar g_{\theta_n,U_m}} - 1\Bigr)\Bigr] \wedge M_\gamma\; g_{\theta_n,\bar\eta_n}\, d\nu > 0$.
(Note that $\log(1 + a(u - 1)) \ge \log(1 - a)$ if $u \ge 0$). Thus for every $\gamma \in \mathcal{H}^M$ there exists a
q-open neighbourhood $U_\gamma$ and a constant $M_\gamma$ such that, with
$Z_{nj}(\gamma) = \log\Bigl[1 + a\Bigl(\frac{g_{\theta_n,\eta_n}}{\bar g_{\theta_n,U_\gamma}}(Y_{nj}) - 1\Bigr)\Bigr]$,
(3.10) $\liminf_{n\to\infty} E_{\theta_n,\eta_1,\eta_2,\ldots}\, n^{-1}\sum_{j=1}^n \bigl(Z_{nj}(\gamma) \wedge M_\gamma\bigr) > 0$.
On $\{Q_n = M\}$ $\hat\eta_n$ and $\eta_n$ are both contained in $\hat{\mathcal{H}}_n$. By (3.2) and convexity of $\mathcal{H}_n$
Rewrite this as
$n^{-1}\sum_{j=1}^n \log \frac{g_{\theta_n,a\eta_n + (1-a)\hat\eta_n}}{g_{\theta_n,\hat\eta_n}}(Y_{nj}) \le -n^{-1}\log c$,
(3.11) $n^{-1}\sum_{j=1}^n \log\Bigl[1 + a\Bigl(\frac{g_{\theta_n,\eta_n}}{g_{\theta_n,\hat\eta_n}}(Y_{nj}) - 1\Bigr)\Bigr] \le -n^{-1}\log c$.
Fix $\varepsilon > 0$. The set $A = \mathcal{H}^M - \{\gamma \in \mathcal{H}^M : \delta_q(\gamma, \eta) < \varepsilon\}$ is q-compact by Lemma 3.1(i).
From the cover $\{U_\gamma : \gamma \in A\}$ extract a finite subcover $U_{\gamma_1}, \ldots, U_{\gamma_k}$. By (3.11)
$\{\delta_q(\hat\eta_n, \eta) \ge \varepsilon \wedge Q_n = M\}$
$\subset \cup_{i=1}^k \Bigl\{ n^{-1}\sum_{j=1}^n \log\Bigl[1 + a\Bigl(\frac{g_{\theta_n,\eta_n}}{\bar g_{\theta_n,U_{\gamma_i}}}(Y_{nj}) - 1\Bigr)\Bigr] \le -n^{-1}\log c \Bigr\}$
$\subset \cup_{i=1}^k \Bigl\{ n^{-1}\sum_{j=1}^n \bigl(Z_{nj}(\gamma_i) \wedge M_{\gamma_i}\bigr) \le -n^{-1}\log c \Bigr\}$.
By (3.10) and the law of large numbers the $P_{\theta_n,\eta_1,\eta_2,\ldots}$-probability of the last set converges
to zero.
Finally
$P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(\delta_q(\hat\eta_n, \eta) \ge \varepsilon\bigr)$
$\le \sum_{M=[\int q\, d\eta]+1}^{M'} P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(\delta_q(\hat\eta_n, \eta) \ge \varepsilon \wedge Q_n = M\bigr) + P_{\theta_n,\eta_1,\eta_2,\ldots}\bigl(Q_n \notin (\int q\, d\eta, M']\bigr)$.
Here the second term can be made arbitrarily small by (3.6) and the first term converges
to zero for every fixed $M' > 0$, by the above argument.
4. Paired exponential model.
Write the observations as pairs $(X, Y)$ and let
$p_\theta(x, y, z) = z e^{-zx}\, \theta z e^{-\theta z y}$
($z \in \mathcal{Z} = \mathbb{R}^+$, $(x, y) \in (\mathbb{R}^2)^+$, $\theta \in \Theta = \mathbb{R}^+$). Thus in the incidental version of the model,
the problem is to estimate $\theta$, the ratio of the hazard rates within pairs of exponentially
distributed variables, where the baseline hazards $z_j$ are unknown and may differ over pairs.
With $\psi_\theta(X, Y) = X + \theta Y$ it follows that
$\tilde l_{\theta,\eta}(x, y) = \frac{x - \theta y}{2\theta(x + \theta y)} + \frac{g_\eta'}{g_\eta}(x + \theta y)\, \frac{\theta y - x}{2\theta}$,
where $g_\eta(s) = \int_0^\infty z^2 s\, e^{-zs}\, d\eta(z)$ for $s > 0$. Given $X + \theta Y = s$, $X - \theta Y$ has a uniform
distribution on $[-s, s]$. Thus $\beta_\theta^2(s) = s^2/(12\theta^2)$. Assume that
(4.1) $\bar\eta_n \to \eta$ in the weak topology
(4.2) l-: + z')d~n(z) ~ j(z-' + z')d~(z) < 00.<br />
11
Let ?<strong>in</strong> be an <strong>in</strong>creas<strong>in</strong>g sequence of subsets of the set 'H of all probability measures on<br />
m+ ea tis.fy<strong>in</strong>g (3.7) with q(z) = z', (0 < c:s h fixed). Set<br />
(4.3) ?<strong>in</strong> = 'H." n {-y E 1/ ; f z' d-y(z) :s eln2:i~l XT'}<br />
Theorem 4.1. Let (4.1)-(4.3) hold, let $\hat\theta_n$ be a discretized, $\sqrt{n}$-consistent estimator for
$\theta$ and let $\hat\eta_n(\theta)$ maximize $\prod_{j=1}^n p_{\theta,\eta}(X_j, Y_j)$ over $\hat{\mathcal{H}}_n$. Then $T_n$ defined by (1.7) satisfies
(1.8).
As for the regularity conditions, it is known from van der Vaart (1988), Theorem
5.17, that (4.2) is unnecessary for the existence of an estimator sequence $T_n$ satisfying
(1.8). Thus one might hope that Theorem 4.1 can be slightly improved.
It is well-known that the unique solution of
(4.4) $\sum_{j=1}^n \frac{X_j - \theta Y_j}{X_j + \theta Y_j} = 0$
is a $\sqrt{n}$-consistent estimator for $\theta$. In fact, it is asymptotically normal for every sequence
$\{\eta_j\}$, because the distribution of the $X_j/Y_j$ is independent of $\eta_j$.
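Because each summand in (4.4) is strictly decreasing in $\theta$ for positive observations, the root can be located by bisection. The sketch below is one possible way to compute this preliminary estimator; the function names, the bracketing interval, the tolerance and the simulated hazards are all assumptions made for illustration, not part of the paper.

```python
import math
import random


def estimating_function(theta, pairs):
    """Left-hand side of (4.4) for positive observations (x, y)."""
    return sum((x - theta * y) / (x + theta * y) for x, y in pairs)


def solve_theta(pairs, lo=1e-8, hi=1e8, tol=1e-10):
    """Bisection on log(theta), using that (4.4) is decreasing in theta."""
    a, b = math.log(lo), math.log(hi)
    while b - a > tol:
        mid = 0.5 * (a + b)
        if estimating_function(math.exp(mid), pairs) > 0.0:
            a = mid  # root lies to the right of mid
        else:
            b = mid  # root lies at or to the left of mid
    return math.exp(0.5 * (a + b))


random.seed(1)
theta_true = 2.0
pairs = []
for _ in range(2000):
    z = random.uniform(0.5, 3.0)               # baseline hazard, differs per pair
    x = random.expovariate(z)                  # X has hazard z
    y = random.expovariate(theta_true * z)     # Y has hazard theta * z
    pairs.append((x, y))

theta_hat = solve_theta(pairs)
print(theta_hat)
```

For the single pair $(2, 1)$ the equation reads $(2 - \theta)/(2 + \theta) = 0$, so the solver should return a value very close to 2. This gives a quick sanity check that does not depend on simulation.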
Proof. The theorem is a corollary of Theorems 2.1 and 3.2 applied with the q-topology of
$q(z) = z^c$. It is tedious, but straightforward to check (2.7)-(2.12) (cf. van der Vaart (1988),
p156-159). From the other assumptions only (2.5) needs comment. First
(4.5)<br />
The set of functions .z: ---+ (s.z:)f.-ce-.u on m+, (0 < s < 1), is uniformly bounded <strong>and</strong><br />
equi-cont<strong>in</strong>uous. Hence it is pre-compact <strong>in</strong> Co(Z) <strong>and</strong> if ""( is sufficiently close to "I <strong>in</strong><br />
the as-topology, then !(.u)"'-Ce-U ZC d""(.z:) is uniformly close to !(sz)f-Ce-u.z:c d71(z) .<br />
Therefore, the right h<strong>and</strong> side of (4.5) can for s < 1 <strong>and</strong>-r sufficiently close to "I be bounded<br />
by<br />
(4.6)<br />
Let 0 < a < b < 00 be such that p = 71( a, b) > O. Then for ""( sufficiently close to 71, we have<br />
""(a,b) > ~p. But then the right h<strong>and</strong> side of (4.5) can for such rr <strong>and</strong> s > 1 be bounded<br />
12
y<br />
(4.7)<br />
s: S'Z4.-.. d7(Z)]<br />
2+2(3bs)'+2 ",<br />
[ fa sz2e-u d7(Z)<br />
g,.(s)<br />
< [2 + 18b 2s2 +2(sp~a2e-b')-ls3e-2b' 1~ Z4e-:tJ:d7(Z)] gi7n(s)<br />
::; [2 + 18b's' + 4p- 1a-'.-'·s-'64 ] g,.(8).<br />
Combination of the bounds (4.5)-(4.7) shows that (2.5) is satisfied, if for sufficiently<br />
large $N$ the set $\{(s^2 + s^{c-2})\, g_{\bar\eta_n}(s) : n > N\}$ is equi-integrable. Now by (4.1)-(4.2)<br />
$\int (s^2 + s^{c-2})\, g_{\bar\eta_n}(s)\,ds = \int \bigl(6 z^{-2} + \Gamma(c) z^{2-c}\bigr)\,d\bar\eta_n(z)$<br />
$\to \int \bigl(6 z^{-2} + \Gamma(c) z^{2-c}\bigr)\,d\eta(z) = \int (s^2 + s^{c-2})\, g_\eta(s)\,ds.$<br />
Then (cf. Theorem 13.47 in Hewitt and Stromberg (1965))<br />
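The moment identity used in the preceding display can be checked by a direct Gamma-integral computation. The sketch below assumes that the mixture density of the sufficient statistic has the Gamma(2, $z$) form $g_\eta(s) = \int z^2 s\, e^{-sz}\,d\eta(z)$; this form is an assumption here, since the definition of $g_\eta$ lies outside this section.<br />

```latex
\int_0^\infty s^{k}\,\bigl(z^{2} s\, e^{-sz}\bigr)\,ds
  = z^{2}\,\frac{\Gamma(k+2)}{z^{k+2}}
  = \Gamma(k+2)\, z^{-k}, \qquad k > -2 .
% k = 2     gives \Gamma(4)\, z^{-2} = 6 z^{-2}
% k = c - 2 gives \Gamma(c)\, z^{2-c}   (this needs c > 0)
```

Integrating both cases over $z$ with respect to the mixing distribution (the integrand is nonnegative, so the order of integration may be exchanged) yields the two terms $6z^{-2}$ and $\Gamma(c) z^{2-c}$.<br />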
5. Errors-in-variables.<br />
Write the observations as pairs $(X, Y)$. The incidental version of the model is given<br />
by<br />
$X_j = z_j + e_j$<br />
$Y_j = \alpha + \beta z_j + f_j,$<br />
where $(e_1, f_1)', (e_2, f_2)', \ldots$ are i.i.d. unobservable $N(0, \Sigma^{-1})$ distributed vectors, and $z_j$ unknown<br />
numbers in $Z = \mathbb{R}$. Set $\theta = (\alpha, \beta, \Sigma)$. To make this parameter identifiable in the structural<br />
version of the model one can put restrictions on either $\mathcal{H}$ or $\Sigma$. Indeed, it suffices that $\mathcal{H}$<br />
does not contain normal distributions (where pointmasses are considered normal); alternatively<br />
it can be assumed that $\Sigma$ is a scalar multiple of $\Sigma_0$, where $\Sigma_0$ is known. Identifiability is obviously<br />
crucial for the existence of a $\sqrt{n}$-consistent estimator sequence for $\theta$. However, it does<br />
not play a role in the validity of Theorem 2.1 on the improvement of such an estimator.<br />
Therefore, we do not discuss the matter here, but refer to the rather large literature on the<br />
model. See Anderson (1984) and Bickel and Ritov (1987) and the references cited there.<br />
The assumptions imposed do affect the dimension of $\theta$, though. Below we give the formulas<br />
for the case that $\Sigma$ is a free positive definite matrix $\begin{pmatrix}\sigma & \rho\\ \rho & \tau\end{pmatrix}$ and write $\theta = (\alpha, \beta, \sigma, \tau, \rho)$.<br />
A sufficient statistic in this model is $\psi_\theta(X, Y) = \begin{pmatrix}1\\ \beta\end{pmatrix}'\Sigma\begin{pmatrix}X\\ Y - \alpha\end{pmatrix}$. Its distribution is a<br />
mixture of normal distributions with density<br />
where $\sigma_\theta^2 = \begin{pmatrix}1\\ \beta\end{pmatrix}'\Sigma\begin{pmatrix}1\\ \beta\end{pmatrix}$.<br />
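For a fixed value $z$ of the incidental parameter, the model equations imply that the sufficient statistic is normal with mean $z\sigma_\theta^2$ and variance $\sigma_\theta^2$, where $\Sigma$ is the precision matrix of the errors. The simulation below is a hedged sketch of this fact only; the numerical values of $\alpha$, $\beta$, $\Sigma$ and $z$, and all function names, are hypothetical.<br />

```python
import math
import random


def sigma_theta_sq(beta, S):
    """sigma_theta^2 = (1, beta) S (1, beta)' for a symmetric 2x2 matrix S."""
    return S[0][0] + 2.0 * beta * S[0][1] + beta * beta * S[1][1]


def psi(x, y, alpha, beta, S):
    """Sufficient statistic psi_theta(x, y) = (1, beta) S (x, y - alpha)'."""
    return (S[0][0] + beta * S[1][0]) * x + (S[0][1] + beta * S[1][1]) * (y - alpha)


def sample_errors(S, rng):
    """Draw (e, f) ~ N(0, S^{-1}) using a Cholesky factor of the covariance S^{-1}."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    c11, c12, c22 = S[1][1] / det, -S[0][1] / det, S[0][0] / det
    l11 = math.sqrt(c11)
    l21 = c12 / l11
    l22 = math.sqrt(c22 - l21 * l21)
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * g1, l21 * g1 + l22 * g2


def simulate_psi(n, z, alpha, beta, S, rng):
    """Simulate psi_theta(X, Y) with X = z + e and Y = alpha + beta*z + f."""
    out = []
    for _ in range(n):
        e, f = sample_errors(S, rng)
        out.append(psi(z + e, alpha + beta * z + f, alpha, beta, S))
    return out


alpha, beta, z = 1.0, 0.5, 1.5          # hypothetical parameter values
S = [[2.0, 0.3], [0.3, 1.0]]            # hypothetical precision matrix Sigma
rng = random.Random(0)
values = simulate_psi(20000, z, alpha, beta, S, rng)
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
s2 = sigma_theta_sq(beta, S)
print(mean, z * s2)   # sample mean against z * sigma_theta^2
print(var, s2)        # sample variance against sigma_theta^2
```

Mixing these normal distributions over $z$ gives the normal mixture referred to in the text.<br />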
Set $M_\theta = I - \sigma_\theta^{-2}\begin{pmatrix}1\\ \beta\end{pmatrix}\begin{pmatrix}1\\ \beta\end{pmatrix}'\Sigma$. Then<br />
o 0<br />
P T<br />
;j,.(z,y) = 1 0 M, ( z ).<br />
y-Q<br />
o IJ<br />
{3 1<br />
Furthermore, 1.,.(z,y) = F.,.(Z,y)-' J1.(z ,v. z) PB(z, y, z) d'7(z), where<br />
1.(z,y,z) =<br />
(:)'M.(.:o)<br />
z(:)'M.(.:o)<br />
-t [
treating $\theta$ as known, automatically satisfies the moment condition in (5.5) (Lemma 5.1,<br />
below). Thus one can carry out the construction (1.7)-(1.8) with no restriction at all when<br />
maximizing the likelihood (i.e. $\hat{\mathcal{H}}_n = \mathcal{H}$).<br />
Proof. It is very tedious, but straightforward to check (2.7)-(2.11). Moreover Ie,,,,. -.1 8 ,,, ,<br />
so that (2.12) follows from (5.4). For fixed $\theta$ the mixing distribution $\eta$ is identifiable in<br />
$g_{\theta\eta}$ by Proposition 6.2 of Pfanzagl (1987). By (5.1)-(5.2) $\bar\eta_n \to \eta$ in the $q$-topology of<br />
$q(z) = 1 \vee z^2$. Theorem 5.1 follows from Theorems 2.1 and 3.1, applied with this $q$-topology.<br />
The only condition that needs comment is (2.5).<br />
For any random variable $V$ and decreasing function $b$ on $\mathbb{R}$ it holds that $\operatorname{Cov}(V, b(V))$<br />
$\le 0$ (for an independent copy $V'$ of $V$, $2\operatorname{Cov}(V, b(V)) = E(V - V')(b(V) - b(V')) \le 0$). In consequence $E|V|\phi(V) = E|V|\phi(|V|) \le E|V|\, E\phi(V)$. Therefore<br />
Ig"7(')1 = If -
Multiplying with $q(z_i)$ and summing over $i$ gives the 'self-consistency equations'<br />
(5.6) $\int q(z)\,d\hat\eta_n(z) = n^{-1}\sum_{j=1}^{n} E_{\hat\eta_n}\bigl(q(Z) \mid T = t_j\bigr),$<br />
where (T, Z) has distribution (I - z) d'l(z) under 'I.<br />
Next perturbation of the v's yields the stationary equations<br />
t h,(Ii-Zi)(Ii-Z,)_O<br />
i=1 L;:':, h,(Ii - Z,) - •<br />
$i = 1, \ldots, m$.<br />
Multiplying with $r(z_i)$ and summing over $i$ gives<br />
(5.7) $n^{-1}\sum_{j=1}^{n} E_{\hat\eta_n}\bigl(r(Z) Z \mid T = t_j\bigr) = n^{-1}\sum_{j=1}^{n} E_{\hat\eta_n}\bigl(r(Z) \mid T = t_j\bigr)\, t_j.$<br />
Combination of (5.6)-(5.7) with $q(z) = z$ and $r(z) = 1$ yields the first assertion of the<br />
lemma. Next combination with $q(z) = z^2$ and $r(z) = z$ gives<br />
$\square$<br />
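The self-consistency equations (5.6) can be illustrated numerically. The sketch below is not the unrestricted maximum likelihood estimator of the lemma: it swaps in an EM iteration over the weights of a fixed grid of support points, and it assumes a standard normal kernel, so that $T$ given $Z = z$ is $N(z, 1)$. At a fixed point of this iteration the weights satisfy (5.6) for every $q$; the stationary equations (5.7) would in addition require the support points to be optimized, which is not done here. The grid, the sample and all names are hypothetical.<br />

```python
import math
import random


def phi(u):
    """Standard normal density (assumed kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def em_step(weights, kernel):
    """One EM update of the mixing weights; also returns the posterior matrix."""
    n = len(kernel)
    new = [0.0] * len(weights)
    post = []
    for row in kernel:
        mix = sum(p * f for p, f in zip(weights, row))
        pr = [p * f / mix for p, f in zip(weights, row)]  # P(Z = z_i | T = t_j)
        post.append(pr)
        for i, v in enumerate(pr):
            new[i] += v / n
    return new, post


def fit(ts, grid, iters=300):
    """Run EM from uniform weights; return weights and their posteriors."""
    kernel = [[phi(t - z) for z in grid] for t in ts]
    w = [1.0 / len(grid)] * len(grid)
    for _ in range(iters):
        w, _ = em_step(w, kernel)
    _, post = em_step(w, kernel)  # posteriors computed under the returned weights
    return w, post


def self_consistency_gap(q, w, post, grid):
    """Left-hand side of (5.6) minus its right-hand side."""
    lhs = sum(p * q(z) for p, z in zip(w, grid))
    rhs = sum(sum(v * q(z) for v, z in zip(pr, grid)) for pr in post) / len(post)
    return lhs - rhs


rng = random.Random(1)
ts = [rng.choice([-1.0, 1.0]) + rng.gauss(0.0, 1.0) for _ in range(150)]
grid = [-4.0 + 0.4 * i for i in range(21)]
w, post = fit(ts, grid)
print(self_consistency_gap(lambda z: z, w, post, grid))
print(self_consistency_gap(lambda z: z * z, w, post, grid))
```

The printed gaps shrink towards zero as the number of EM iterations grows, since the gap is exactly the change in $\int q\,d\eta$ produced by one further EM step.<br />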
6. Normal scale mixture.<br />
The incidental version of the model is given by<br />
$X_j = \theta + z_j^{-1} e_j,$<br />
where $e_1, e_2, \ldots$ are unobservable, independent standard normal variables, $\theta \in \Theta = \mathbb{R}$<br />
and $z_j \in Z = \mathbb{R}^+$. Of course $X_1, X_2, \ldots$ are sampled from distributions which are symmetric<br />
about $\theta$, and one may estimate $\theta$ with the estimators of Stone (1975), Bickel and<br />
Klaassen (1986), or van der Vaart (1988), Section 5.7.4, which are fully adaptive. However,<br />
these estimators do not take the normality of the error terms into account, whereas the<br />
MLE-based estimator (1.7)-(1.8) does.<br />
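With $\psi_\theta(X) = |X - \theta|$ it follows that<br />
$\dot\ell_{\theta\eta}(x) = \tilde\ell_{\theta\eta}(x) = -\dfrac{g_\eta'}{g_\eta}\bigl(|x - \theta|\bigr)\,\operatorname{sgn}(x - \theta),$<br />
where $g_\eta(s) = 2\int_0^\infty z\,\phi(zs)\,d\eta(z)$ for $s > 0$. Clearly (3~ ,,1.<br />
Assume that<br />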
(6.1) $\bar\eta_n \to \eta$ in the weak topology<br />
(6.2) $\int_0^\infty (z^2 + z^{-2})\,d\bar\eta_n(z) \to \int_0^\infty (z^2 + z^{-2})\,d\eta(z) < \infty$<br />
(6.3) $\int_{\varepsilon\sqrt{n}}^\infty z^2\,d\bar\eta_n(z) \to 0$, every $\varepsilon > 0$.<br />
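The score of the normal scale mixture, $-(g_\eta'/g_\eta)(|x - \theta|)\,\operatorname{sgn}(x - \theta)$ with $g_\eta(s) = 2\int_0^\infty z\,\phi(zs)\,d\eta(z)$, can be evaluated in closed form when the mixing distribution is discrete. The sketch below assumes a hypothetical discrete $\eta$ with support points `zs` and weights `ws`, and compares the formula with a numerical derivative of the log density $\log \tfrac12 g_\eta(|x - \theta|)$.<br />

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def g(s, zs, ws):
    """g_eta(s) = 2 * sum_k w_k z_k phi(z_k s) for a discrete mixing distribution."""
    return 2.0 * sum(w * z * phi(z * s) for z, w in zip(zs, ws))


def g_prime(s, zs, ws):
    """Derivative of g_eta: d/ds [z phi(z s)] = -z^3 s phi(z s)."""
    return -2.0 * sum(w * z ** 3 * s * phi(z * s) for z, w in zip(zs, ws))


def score(x, theta, zs, ws):
    """Score for theta: -(g'/g)(|x - theta|) * sgn(x - theta)."""
    d = x - theta
    sign = (d > 0) - (d < 0)
    s = abs(d)
    return -g_prime(s, zs, ws) / g(s, zs, ws) * sign


def log_density(x, theta, zs, ws):
    """Log density of X: log of (1/2) g_eta(|x - theta|)."""
    return math.log(0.5 * g(abs(x - theta), zs, ws))


zs, ws = [0.5, 2.0], [0.3, 0.7]   # hypothetical discrete mixing distribution
x, theta, h = -0.4, 0.1, 1e-5
numeric = (log_density(x, theta + h, zs, ws) - log_density(x, theta - h, zs, ws)) / (2 * h)
print(score(x, theta, zs, ws), numeric)
```

For a unit mass at a single point $z_0$ the score reduces to $z_0^2 (x - \theta)$, the score of the $N(\theta, z_0^{-2})$ location model.<br />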
Fix ~ < c < 2. Let $\mathcal{H}$ be the set of probability measures on $\mathbb{R}^+$ with $\int z^c\,d\eta(z) < \infty$, let<br />
$\mathcal{H}_n$ be a sequence of subsets satisfying (3.7) with $q(z) = 1 \vee z^c$, and set<br />
(6.4)<br />
where $Q_n = O_{P_{\theta,\eta_1,\eta_2,\ldots}}(1)$ and $P_{\theta,\eta_1,\eta_2,\ldots}\bigl(Q_n > \int z^c\,d\bar\eta_n(z)\bigr) \to 0$.<br />
Theorem 6.1. Let (6.1)-(6.4) hold, let $\theta_n$ be a discretized $\sqrt{n}$-consistent estimator for<br />
$\theta$ and let $\hat\eta_n(\theta)$ maximize $\prod_{j=1}^n p_{\theta,\eta}(X_j, Y_j)$ over $\hat{\mathcal{H}}_n$. Then $T_n$ defined by (1.7) satisfies<br />
(1.8).<br />
Construction of a suitable sequence $Q_n$ is not entirely trivial. An example is<br />
Proof. The theorem is a corollary of Theorems 2.1 and 3.1 applied with the $q$-topology<br />
of $q(z) = 1 \vee z^c$. As in the previous sections the main problem is to check (2.5). First by<br />
uniform boundedness and equi-continuity of the functions $z \mapsto (sz)^{3-c}\phi(sz)$, $(0 < s < 1)$,<br />
we have for $0 < s < 1$ and $\gamma$ sufficiently close to $\eta$ that<br />
(6.5)<br />
This set of functions is equi-integrable over $(0,1)$ by (6.2)-(6.3). Next, with the same<br />
notation as in the proof of Theorem 4.1, the left hand side of (6.5) can for $s > 1$ be<br />
bounded by<br />
[16b 4 s 2 +(ap)-ls2e-~·21l235] gil..(s).<br />
Equi-integrability of these functions follows from equi-integrability of $\{s^2 g_{\bar\eta_n}(s) : n = 1, 2, \ldots\}$,<br />
which follows from (6.1)-(6.2).<br />
$\square$<br />
Acknowledgement.<br />
I thank Y. Ritov for permission to include Lemma 5.1 in this paper.<br />
References<br />
Anderson, T.W., (1984). Estimating linear statistical relationships. Annals Statist. 12,<br />
1-45.<br />
Bauer, H., (1981). Probability Theory and Elements of Measure Theory, Academic<br />
Press, London.<br />
Begun, J.M., Hall, W.J., Huang, W.M. and Wellner, J.A., (1983). Information and asymptotic<br />
efficiency in parametric-nonparametric models. Annals Statist. 11, 432-452.<br />
Bickel, P.J., Klaassen, C.A.J., (1986). Empirical Bayes estimation in functional and structural<br />
models, and uniform adaptive estimation of location. Adv. Appl. Math. 7,<br />
55-69.<br />
Bickel, P.J., Ritov, Y., (1987). Efficient estimation in the errors in variables model. Annals<br />
Statist. 15, 513-540.<br />
Hajek, J., (1970). A characterization of limiting distributions of regular estimators. Z.<br />
Wahrsch. Th. Verw. Gebiete 14, 323-330.<br />
Hajek, J., (1972). Local asymptotic minimax and admissibility in estimation. Proc. Sixth<br />
Berkeley Symp. Math. Statist. Probab. 1, University of California Press, Berkeley,<br />
175-194.<br />
Hasminskii, R.Z., Nussbaum, M., (1984). An asymptotic minimax bound in a regression<br />
model with an increasing number of nuisance parameters. P. Mandl, M. Huskova<br />
(eds.). Proc. Third Prague Symp. As. Statistics, Elsevier, Amsterdam, 275-283.<br />
Heckman, J., Singer, B., (1984). A method for minimizing the impact of distributional<br />
assumptions in economic studies for duration data. Econometrica 52, 271-320.<br />
Hewitt, E., Stromberg, K., (1965). Real and Abstract Analysis, Springer Verlag,<br />
Berlin.<br />
Jewell, N.P., (1982). Mixtures of exponential distributions. Annals Statist. 10, 479-484.<br />
Kiefer, J., Wolfowitz, J., (1956). Consistency of the maximum likelihood estimator in<br />
the presence of infinitely many nuisance parameters. Ann. Math. Statist. 27,<br />
887-906.<br />
Laird, N., (1978). Nonparametric maximum likelihood estimation of a mixing distribution.<br />
J. Amer. Statist. Assoc. 73, 805-811.<br />
Lindsay, B.G., (1983a). The geometry of mixture likelihoods, I and II. Annals Statist. 11,<br />
86-94 and 783-792.<br />
Lindsay, B.G., (1983b). Efficiency of the conditional score in a mixture setting. Annals<br />
Statist. 11, 486-497.<br />
Lindsay, B.G., (1985). Using empirical partially Bayes inference for increased efficiency.<br />
Annals Statist. 13, 914-931.<br />
Nussbaum, M., (1984). An asymptotic minimax risk bound for estimation of a linear<br />
functional relationship. J. Multivariate Anal. 14, 300-314.<br />
Pfanzagl, J., Wefelmeyer, W., (1982). Contributions to a General Asymptotic Statistical<br />
Theory. Lecture Notes in Statistics 13, Springer Verlag, New York.<br />
Pfanzagl, J., Wefelmeyer, W., (1985). Asymptotic Expansions for General Statistical<br />
Models. Lecture Notes in Statistics 31, Springer Verlag, New York.<br />
Pfanzagl, J., (1987). Consistency of Maximum Likelihood Estimators for Certain Nonparametric<br />
Families, in particular: Mixtures. Preprint 110, University of Cologne.<br />
Stone, C., (1975). Adaptive maximum likelihood estimation of a location parameter.<br />
Annals Statist. 3, 267-284.<br />
Vaart, A.W. van der, (1988). Statistical Estimation in Large Parameter Spaces.<br />
CWI Tract 44, Centrum voor Wiskunde en Informatica, Amsterdam.<br />
Vaart, A.W. van der, (1988a). Estimating a real parameter in a class of semi-parametric<br />
models. Annals Statist. 16, to appear.<br />