On Spatial Processes and Asymptotic Inference under Near-Epoch Dependence
Nazgul Jenish 1 and Ingmar R. Prucha 2
October 21, 2011
1 Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Tel.: 212-998-3891, Email: nazgul.jenish@nyu.edu
2 Department of Economics, University of Maryland, College Park, MD 20742. Tel.: 301-405-3499, Email: prucha@econ.umd.edu
Abstract
The development of a general inferential theory for nonlinear models with cross-sectionally or spatially dependent data has been hampered by a lack of appropriate limit theorems. To facilitate a general asymptotic inference theory relevant to economic applications, this paper first extends the notion of near-epoch dependent (NED) processes used in the time series literature to random fields. The class of processes that is NED on, say, an $\alpha$-mixing process, is shown to be closed under infinite transformations, and thus accommodates models with spatial dynamics. This would generally not be the case for the smaller class of $\alpha$-mixing processes. The paper then derives a central limit theorem and law of large numbers for NED random fields. These limit theorems allow for fairly general forms of heterogeneity including asymptotically unbounded moments, and accommodate arrays of random fields on unevenly spaced lattices. The limit theorems are employed to establish consistency and asymptotic normality of GMM estimators. These results provide a basis for inference in a wide range of models with cross-sectional or spatial dependence.
JEL Classification: C10, C21, C31
Key words: Random fields, near-epoch dependent processes, central limit theorem, law of large numbers, GMM estimator
1 Introduction 1
Models with cross-sectionally or spatially dependent data have recently attracted considerable attention in various fields of economics including labor and public economics, IO, political economy, macroeconomics, international economics, regional science and urban economics. In these models cross-sectional dependence typically stems from agents' strategic interaction, neighborhood effects, shared resources and common shocks. Furthermore, economic agents are typically heterogeneous, e.g., vary in size. Mathematically, agents' observations may thus be viewed as a realization of a dependent heterogeneous process indexed by a point in $\mathbb{R}^d$, $d > 1$, i.e., a random field. 2
The aim of this paper is to define a class of random fields that is sufficiently general to accommodate many applications of interest, and to establish corresponding limit theorems that can be used for asymptotic inference. In particular, we apply these limit theorems to prove consistency and asymptotic normality of generalized method of moments (GMM) estimators for a general class of nonlinear spatial models.
There have been different approaches to modeling cross-sectional dependence in the econometrics literature. Two large strands of the literature have focused on (i) linear spatial autoregressive models, also known as Cliff-Ord (1981) type models 3, and (ii) common factor models 4. In both cases, progress was facilitated, loosely speaking, by imposing specific structural conditions on the data generating process, and/or by exploiting some underlying (conditional) independence assumptions. Another common approach to modeling dependence is through mixing conditions. Various mixing concepts developed for time series processes have been extended to random fields. However, the respective limit theorems for random fields have not been sufficiently general to accommodate many of the processes encountered in economics. This hampered the development of a general asymptotic inference theory for nonlinear models with cross-sectional dependence. Towards filling this gap, Jenish and Prucha (2009) have recently introduced a set of limit theorems (CLT, ULLN, LLN) for $\alpha$-mixing random fields on unevenly spaced lattices that allow for nonstationary processes with trending moments.
Yet some dependent spatial processes do not satisfy mixing conditions. As with time series, spatial processes may be generated as functions of infinitely many elements of an input process, with a spatial moving average process being a leading example. As is well known, the mixing property is not necessarily preserved under transformations involving an infinite number of lags. For instance, Andrews (1984) shows that a simple time series AR(1) process fails to be $\alpha$-mixing even though its input process is independent. Thus, it is important to develop an asymptotic theory for a generalized class of random fields that is “closed with respect to infinite transformations.”
1 We thank the participants of the Cowles Foundation Conference, Yale, June 2009, as well as the seminar participants at Columbia University for helpful discussions. This research benefitted from a University of Maryland Ann G. Wylie Dissertation Fellowship for the first author and from financial support from the National Institute of Health through SBIR grant 1 R43 AG027622 for the second author.
2 The space and metric are not restricted to physical space and distance.
3 For recent contributions see, e.g., Robinson (2009, 2007), Yu, de Jong and Lee (2008), Kelejian and Prucha (2007a,b, 2004), Lee (2007a,b, 2004), and Chen and Conley (2001).
4 For recent contributions see, e.g., Andrews (2005), Bai and Ng (2004), Bai (2003), Phillips and Sul (2003).
To tackle this problem, the paper first extends the concept of near-epoch dependent (NED) processes used in the time series literature to spatial processes. The notion dates back to Ibragimov (1962) and Billingsley (1968). The NED concept, or variants thereof, has been used extensively in the time series literature by McLeish (1975a,b), Bierens (1981), Wooldridge (1986), Gallant (1987), Gallant and White (1988), Andrews (1988), Pötscher and Prucha (1991, 1997), Davidson (1992, 1993, 1994) and de Jong (1997), among others. Doukhan and Louhichi (1999) and Ango Nze and Doukhan (2004) introduced an alternative class of dependent processes called “$\psi$-weakly dependent.” We provide various examples suggesting that the class of NED spatial processes, which subsumes mixing processes, is sufficiently broad to cover many applications of interest. For instance, we show that it covers Cliff-Ord type processes, autoregressive and moving average random fields, and, more generally, nonlinear spatial Bernoulli shifts.
The paper then derives a CLT and an LLN for spatial processes that are near-epoch dependent on an $\alpha$-mixing input process. These limit theorems allow for fairly general forms of heterogeneity including asymptotically unbounded moments, and accommodate arrays of random fields on unevenly spaced lattices. The LLN can be combined with the generic ULLN in Jenish and Prucha (2009) to obtain a ULLN for NED spatial processes. In the time series literature, CLTs for NED processes were derived by Wooldridge (1986), Davidson (1992, 1993), and de Jong (1997). Interestingly, our CLT contains as a special case the CLT of Wooldridge (1986), Theorem 3.13 and Corollary 4.4.
In addition, we give conditions under which the NED property is preserved under transformations. These results play a key role in verifying the NED property in applications. Thus, the NED property is compatible with considerable heterogeneity and dependence, invariant under transformations, and leads to a CLT and LLN under fairly general conditions. All these features make it a convenient tool for modeling spatial dependence.
As an application, we establish consistency and asymptotic normality of spatial GMM estimators. These results provide a fundamental basis for constructing confidence intervals and testing hypotheses for GMM estimators in nonlinear spatial models. Our results also expand on Conley (1999), who established the asymptotic properties of GMM estimators assuming that the data generating process and the moment functions are stationary and $\alpha$-mixing. 5
The rest of the paper is organized as follows. Section 2 introduces the concept of NED spatial processes and gives some mild conditions that ensure preservation of the NED property under transformations. Section 3 provides examples of NED spatial processes. Section 4 contains the CLT and LLN for NED spatial processes. Section 5 establishes the asymptotic properties of spatial GMM estimators. All proofs are collected in the appendices.
5 This important early contribution employs Bolthausen's (1982) CLT for stationary $\alpha$-mixing random fields on the regular lattice $\mathbb{Z}^2$. However, the mixing and stationarity assumptions may not hold in many applications. The present paper relaxes these critical assumptions.
2 Definition of NED Spatial Processes
Let $D \subset \mathbb{R}^d$, $d \geq 1$, be a (possibly) unevenly spaced lattice of locations in $\mathbb{R}^d$, and let $Z = \{Z_{i,n}, i \in D_n, n \geq 1\}$ and $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \geq 1\}$ be triangular arrays of random fields defined on a probability space $(\Omega, \mathfrak{F}, P)$ with $D_n \subseteq T_n \subseteq D$. The space $\mathbb{R}^d$ is equipped with the metric $\rho(i,j) = \max_{1 \leq l \leq d} |j_l - i_l|$, where $i_l$ is the $l$-th component of $i$. The distance between any subsets $U, V \subseteq D$ is defined as $\rho(U,V) = \inf\{\rho(i,j) : i \in U \text{ and } j \in V\}$. Furthermore, let $|U|$ denote the cardinality of a finite subset $U \subset D$.
The random variables $Z_{i,n}$ and $\varepsilon_{i,n}$ are possibly vector-valued, taking their values in $\mathbb{R}^{p_z}$ and $\mathbb{R}^{p_\varepsilon}$. We assume that $\mathbb{R}^{p_z}$ and $\mathbb{R}^{p_\varepsilon}$ are normed metric spaces equipped with the Euclidean norm, which we denote (in an obvious misuse of notation) as $|\cdot|$. For any random vector $Y$, let $\|Y\|_p = [E|Y|^p]^{1/p}$, $p \geq 1$, denote its $L_p$-norm. Finally, let $\mathfrak{F}_{i,n}(s) = \sigma(\varepsilon_{j,n}; j \in T_n : \rho(i,j) \leq s)$ be the $\sigma$-field generated by the random vectors $\varepsilon_{j,n}$ located in the $s$-neighborhood of location $i$.
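The metric and the set distance introduced above are easy to compute for finite sets of locations. The following is a minimal sketch, assuming locations are given as coordinate tuples; the function names `rho`, `rho_sets` and `neighborhood` and the example points are illustrative and do not come from the paper.

```python
import numpy as np

def rho(i, j):
    """Max-norm distance between two locations in R^d: max_l |j_l - i_l|."""
    i, j = np.asarray(i, dtype=float), np.asarray(j, dtype=float)
    return float(np.max(np.abs(j - i)))

def rho_sets(U, V):
    """Distance between two finite sets of locations.

    For finite sets the infimum in the definition is attained, so a minimum
    over all pairs suffices.
    """
    return min(rho(i, j) for i in U for j in V)

def neighborhood(i, T, s):
    """Locations j in T with rho(i, j) <= s.

    These are the indices of the inputs that generate the sigma-field of the
    s-neighborhood of location i.
    """
    return [j for j in T if rho(i, j) <= s]

# Hypothetical, unevenly spaced locations in R^2
U = [(0.0, 0.0), (1.5, 0.0)]
V = [(4.0, 1.0), (3.0, 5.0)]

print(rho((0.0, 0.0), (4.0, 1.0)))        # distance between two points
print(rho_sets(U, V))                     # distance between the two sets
print(neighborhood((0.0, 0.0), U + V, 2)) # inputs within distance 2 of the origin
```

Because the metric is the maximum of the coordinate-wise differences, the $s$-neighborhood of a location is a cube with side length $2s$ rather than a Euclidean ball.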
Throughout the paper, we maintain the above notational conventions as well as the following assumption concerning $D$.
Assumption 1 The lattice $D \subset \mathbb{R}^d$, $d \geq 1$, is infinite countable. All elements in $D$ are located at distances of at least $\rho_0 > 0$ from each other, i.e., for all $i, j \in D$: $\rho(i,j) \geq \rho_0$; w.l.o.g. we assume that $\rho_0 > 1$.
The assumption of a minimum distance has also been used by Conley (1999) and Jenish and Prucha (2009). It ensures the growth of the sample size as the sample regions $D_n$ and $T_n$ expand. The setup is thus geared towards what is referred to in the spatial literature as increasing domain asymptotics.
We now introduce the notion of near-epoch dependent (NED) random fields.
Definition 1 Let $Z = \{Z_{i,n}, i \in D_n, n \geq 1\}$ be a random field with $\|Z_{i,n}\|_p < \infty$, $p \geq 1$, let $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \geq 1\}$ be a random field, where $|T_n| \to \infty$ as $n \to \infty$, and let $d = \{d_{i,n}, i \in D_n, n \geq 1\}$ be an array of finite positive constants. Then the random field $Z$ is said to be $L_p(d)$-near-epoch dependent on the random field $\varepsilon$ if
$$\|Z_{i,n} - E(Z_{i,n} \mid \mathfrak{F}_{i,n}(s))\|_p \leq d_{i,n}\,\psi(s) \tag{1}$$
for some sequence $\psi(s) \geq 0$ with $\lim_{s \to \infty} \psi(s) = 0$. The $\psi(s)$, which are w.l.o.g. assumed to be non-increasing, are called the NED coefficients, and the $d_{i,n}$ are called NED scaling factors. $Z$ is said to be $L_p$-NED on $\varepsilon$ of size $-\lambda$ if $\psi(s) = O(s^{-\mu})$ for some $\mu > \lambda > 0$. Furthermore, if $\sup_n \sup_{i \in D_n} d_{i,n} < \infty$, then $Z$ is said to be uniformly $L_p$-NED on $\varepsilon$.
Recall that $D_n \subseteq T_n$. Typically, $T_n$ will be an infinite subset of $D$, and often $T_n = D$. However, to cover Cliff-Ord type processes, see Section 3.3, $T_n$ is allowed to depend on $n$ and to be finite provided that it increases in size with $n$.
The role of the scaling factors $\{d_{i,n}\}$ is to allow for the possibility of “unbounded moments”, i.e., $\sup_n \sup_{i \in D_n} d_{i,n} = \infty$. Unbounded moments may reflect trends in the moments in certain directions, in which case we may also use, as in the time series literature, the terminology of “trending moments”. The NED property is thus compatible with a considerable amount of heterogeneity. In establishing limit theorems for NED processes, we will have to impose restrictions on the scaling factors $d_{i,n}$. In this respect, observe that
$$\|Z_{i,n} - E(Z_{i,n} \mid \mathfrak{F}_{i,n}(s))\|_p \leq \|Z_{i,n}\|_p + \|E(Z_{i,n} \mid \mathfrak{F}_{i,n}(s))\|_p \leq 2\|Z_{i,n}\|_p$$
by the Minkowski and the conditional Jensen inequalities. Given this, we may choose $d_{i,n} \leq 2\|Z_{i,n}\|_p$, and consequently w.l.o.g. $0 \leq \psi(s) \leq 1$; see, e.g., Davidson (1994), p. 262, for a corresponding discussion within the context of time series processes. Note that by the Lyapunov inequality, if $Z_{i,n}$ is $L_p$-NED, then it is also $L_q$-NED with the same coefficients $\{d_{i,n}\}$ and $\{\psi(s)\}$ for any $q \leq p$.
Our definition of NED for spatial processes is adapted from the definition of NED for time series processes. In the time series literature, the NED concept first appeared in the works of Ibragimov (1962) and Billingsley (1968), although they did not use the present term. The concept of time series NED processes was later formalized by McLeish (1975a,b), Wooldridge (1986), Gallant (1987), and Gallant and White (1988). These authors considered only $L_2$-NED processes. Andrews (1988) generalized it to $L_p$-NED processes for $p \geq 1$. Davidson (1992, 1993, 1994) and de Jong (1997) further extended it to allow for trending time series processes.
An important motivation for considering NED processes is that mixing is not necessarily preserved under transformations involving infinitely many arguments; see, e.g., Andrews (1984) for simple examples in a time series context. However, as illustrated below, for a wide range of models the output process is obtained as a function of infinitely many input variables. In those situations mixing of the input process does not necessarily carry over to the output process, and thus limit theorems for averages of the output process cannot simply be established from limit theorems for mixing processes. Nevertheless, as with time series processes, we show below that limit theorems can be extended to spatial processes that are NED on a mixing input process, provided the approximation error declines “sufficiently fast” as the conditioning set of input variables expands.
A further attractive feature of NED processes is that the NED property is preserved under transformations. Econometric estimators are usually defined either explicitly as functions of some underlying data generating processes or implicitly as optimizers of a function of the data generating process. Thus, if the data generating process is NED on some input process, the question arises under what conditions functions of random fields are also NED on the same input process.
Various conditions that ensure preservation of the NED property under transformations have been established in the time series literature by Gallant and White (1988) and Davidson (1994). In fact, these results extend to random fields. In particular, the NED property is preserved under summation and multiplication, and carries over from a random vector to its components and vice versa. For future reference, we now state some results for generalized classes of nonlinear functions. Their proofs are analogous to those in the time series literature, and are therefore omitted.
Consider transformations of $Z_{i,n}$ given by a family of functions
$$g_{i,n} : \mathbb{R}^{p_z} \to \mathbb{R}.$$
The functions $g_{i,n}$ are assumed Borel-measurable for all $n$ and $i \in D$. They are furthermore assumed to satisfy the following Lipschitz condition: For all $(z, z^{\bullet}) \in \mathbb{R}^{p_z} \times \mathbb{R}^{p_z}$ and all $i \in D_n$ and $n \geq 1$:
$$|g_{i,n}(z) - g_{i,n}(z^{\bullet})| \leq B_{i,n}(z, z^{\bullet})\,|z - z^{\bullet}| \tag{2}$$
where
$$B_{i,n}(z, z^{\bullet}) : \mathbb{R}^{p_z} \times \mathbb{R}^{p_z} \to \mathbb{R}_{+}$$
is Borel-measurable. Of course, this condition would be devoid of meaning without further restrictions on $B_{i,n}(z, z^{\bullet})$, which are given in the next propositions.
Proposition 1 Suppose $g_{i,n}(\cdot)$ satisfies Lipschitz condition (2) with
$$|B_{i,n}(z, z^{\bullet})| \leq C < \infty$$
for all $(z, z^{\bullet}) \in \mathbb{R}^{p_z} \times \mathbb{R}^{p_z}$ and all $i$ and $n$. If for $p \geq 1$ the $\{Z_{i,n}\}$ are $L_p$-NED of size $-\lambda$ on $\{\varepsilon_{i,n}\}$ with scaling factors $\{d_{i,n}\}$, then $g_{i,n}(Z_{i,n})$ is also $L_p$-NED of size $-\lambda$ on $\{\varepsilon_{i,n}\}$ with scaling factors $\{2C d_{i,n}\}$.
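The factor $2C$ in the scaling factors of Proposition 1 can be traced in a few lines. What follows is only a sketch along the lines of the standard time series argument, not necessarily the omitted proof. The shorthand is ours: $g = g_{i,n}$, $Z = Z_{i,n}$, $\mathfrak{F} = \mathfrak{F}_{i,n}(s)$, $\tilde{Z} = E[Z \mid \mathfrak{F}]$, and $\psi(s)$ denotes the NED coefficients of $Z$.

```latex
\begin{align*}
\bigl\| g(Z) - E[g(Z) \mid \mathfrak{F}] \bigr\|_p
  &\le \bigl\| g(Z) - g(\tilde{Z}) \bigr\|_p
     + \bigl\| g(\tilde{Z}) - E[g(Z) \mid \mathfrak{F}] \bigr\|_p
     && \text{(Minkowski)} \\
  &= \bigl\| g(Z) - g(\tilde{Z}) \bigr\|_p
     + \bigl\| E[\, g(\tilde{Z}) - g(Z) \mid \mathfrak{F} \,] \bigr\|_p
     && \text{($g(\tilde{Z})$ is $\mathfrak{F}$-measurable)} \\
  &\le 2 \bigl\| g(Z) - g(\tilde{Z}) \bigr\|_p
     && \text{(conditional Jensen)} \\
  &\le 2C \bigl\| Z - \tilde{Z} \bigr\|_p
     \le 2C\, d_{i,n}\, \psi(s).
     && \text{(Lipschitz bound, NED of $Z$)}
\end{align*}
```

The NED coefficients are unchanged in this chain, which is why the size is preserved and only the scaling factors are inflated.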
Proposition 2 Suppose $g_{i,n}(\cdot)$ satisfies Lipschitz condition (2) with
$$\sup_s \bigl\| B_{i,n}^{(s)} \bigr\|_2 < \infty \quad \text{and} \quad \sup_s \bigl\| B_{i,n}^{(s)} \,\bigl| Z_{i,n} - \tilde{Z}_{i,n}^{s} \bigr| \bigr\|_r < \infty \tag{3}$$
for some $r > 2$, where $B_{i,n}^{(s)} = B_{i,n}(Z_{i,n}, \tilde{Z}_{i,n}^{s})$ and $\tilde{Z}_{i,n}^{s} = E[Z_{i,n} \mid \mathfrak{F}_{i,n}(s)]$. If $\|g_{i,n}(Z_{i,n})\|_2 < \infty$ and $Z_{i,n}$ is $L_2$-NED of size $-\lambda$ on $\{\varepsilon_{i,n}\}$ with scaling factors $\{d_{i,n}\}$, then $g_{i,n}(Z_{i,n})$ is $L_2$-NED of size $-\lambda(r-2)/(2r-2)$ on $\{\varepsilon_{i,n}\}$ with scaling factors
$$d'_{i,n} = d_{i,n}^{(r-2)/(2r-2)} \sup_s \Bigl[ \bigl\| B_{i,n}^{(s)} \bigr\|_2^{(r-2)/(2r-2)} \bigl\| B_{i,n}^{(s)} \,\bigl| Z_{i,n} - \tilde{Z}_{i,n}^{s} \bigr| \bigr\|_r^{r/(2r-2)} \Bigr].$$
Thus, the NED property is hereditary under reasonably weak conditions. These conditions facilitate verification of the NED property in practical applications. In particular, we will use them in the proof of asymptotic normality of spatial GMM estimators in Section 5.
3 Examples of NED Spatial Processes
In this section, we give examples of some important classes of spatial processes, which are widely used in applications, and show that they satisfy the NED property. Of course, to establish limit theorems the NED property would be accompanied by mixing assumptions on the input process.
3.1 Moving Average Random Fields
Consider the infinite moving average (or linear) random field on $\mathbb{Z}^d$, say, $Y = \{Y_i, i \in \mathbb{Z}^d\}$ defined as
$$Y_i = \sum_{j \in \mathbb{Z}^d} g_{ij}\,\varepsilon_j, \tag{4}$$
where $\varepsilon = \{\varepsilon_i, i \in \mathbb{Z}^d\}$ is a zero mean vector-valued random field on $\mathbb{Z}^d$ and the $g_{ij}$ are some real numbers. We assume furthermore that the coefficients $g_{ij}$ and the innovations $\varepsilon_i$ satisfy, respectively,
$$\lim_{s \to \infty} \sup_{i \in \mathbb{Z}^d} \sum_{j \in \mathbb{Z}^d :\, \rho(i,j) > s} |g_{ij}| = 0 \tag{5}$$
and
$$\sup_{i \in \mathbb{Z}^d} \|\varepsilon_i\|_p < \infty \quad \text{where } p \geq 1. \tag{6}$$
Linear stationary random fields on $\mathbb{Z}^2$ have been explored by Whittle (1954) in a paper that significantly influenced the subsequent modeling of spatial dependencies. More recently, linear random fields on $\mathbb{Z}^d$, $d \geq 1$, have been studied by Doukhan and Guyon (1991), see also Doukhan (1994). Analogous to the time series literature, $Y_i$ is defined as the limit in $L_p$ as $s \to \infty$ of the partial sums $Y_i^s = \sum_{j \in \mathbb{Z}^d :\, \rho(i,j) \leq s} g_{ij}\,\varepsilon_j$. The following existence result is proven by Doukhan (1994), pp. 75-81. 6
6 In fact, Doukhan (1994) proves a more general result that also covers the case $p < 1$.
Proposition 3 (Doukhan, 1994) Under conditions (5)-(6) the distribution of the linear random field $Y$ in (4) is well defined. In particular, the finite dimensional distributions of $Y$ are limits in $L_p$ as $s \to \infty$ of those of the random fields $Y^s = \{Y_i^s, i \in \mathbb{Z}^d\}$.
We next give a result that shows that under the same conditions, the linear field $Y$ is $L_p$-NED on the field $\varepsilon$.
Proposition 4 Under conditions (5)-(6) the linear random field $Y$ in (4) is uniformly $L_p$-NED on $\varepsilon$.
The nice feature of this result is that it does not impose any additional conditions over and above those employed in Proposition 3 to establish the existence of the linear field.
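To see why no extra conditions are needed, it helps to write out the approximation error. The lines below are a heuristic sketch rather than the authors' proof. It takes the interchange of the conditional expectation with the infinite sum for granted, and writes $\mathfrak{F}_i(s)$ for the $\sigma$-field generated by the $\varepsilon_j$ with $\rho(i,j) \leq s$, where $\rho$ is the metric of Section 2. The labels $d_i$ and $\psi(s)$ in the final step are our choice of scaling factors and NED coefficients.

```latex
\begin{align*}
Y_i - E[Y_i \mid \mathfrak{F}_i(s)]
  &= \sum_{j:\, \rho(i,j) > s} g_{ij}
     \bigl( \varepsilon_j - E[\varepsilon_j \mid \mathfrak{F}_i(s)] \bigr),
  && \text{since } \varepsilon_j \text{ is } \mathfrak{F}_i(s)\text{-measurable for } \rho(i,j) \le s, \\
\bigl\| Y_i - E[Y_i \mid \mathfrak{F}_i(s)] \bigr\|_p
  &\le \sum_{j:\, \rho(i,j) > s} |g_{ij}| \cdot 2 \sup_{k} \|\varepsilon_k\|_p
  && \text{(Minkowski, conditional Jensen)} \\
  &\le \underbrace{2 \sup_{k} \|\varepsilon_k\|_p}_{d_i}
     \cdot
     \underbrace{\sup_{i} \sum_{j:\, \rho(i,j) > s} |g_{ij}|}_{\psi(s)} .
\end{align*}
```

Condition (6) makes the candidate scaling factor $d_i$ finite and free of $i$, which gives uniformity, and condition (5) is exactly the statement that $\psi(s) \to 0$.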
3.2 Autoregressive Random Fields
The linear random fields of the previous example may arise as solutions of autoregressive spatial models. For example, Whittle (1954) considered the following simple autoregressive model on $\mathbb{Z}^2$:
$$Y_{i_1,i_2} = \alpha Y_{i_1+1,i_2} + \beta Y_{i_1,i_2+1} + \varepsilon_{i_1,i_2} \tag{7}$$
where $|\alpha| + |\beta| < 1$ and $\varepsilon = \{\varepsilon_{i_1,i_2}, (i_1,i_2) \in \mathbb{Z}^2\}$ are i.i.d. random variables with zero mean and finite variance. Whittle (1954), p. 447, gives the following stationary solution of (7):
$$Y_{i_1,i_2} = \sum_{j_1=0}^{\infty} \sum_{j_2=0}^{\infty} \binom{j_1+j_2}{j_1} \alpha^{j_1} \beta^{j_2}\, \varepsilon_{i_1+j_1,\,i_2+j_2} = \sum_{j_1=i_1}^{\infty} \sum_{j_2=i_2}^{\infty} \binom{j_1-i_1+j_2-i_2}{j_1-i_1} \alpha^{j_1-i_1} \beta^{j_2-i_2}\, \varepsilon_{j_1,j_2}. \tag{8}$$
Thus, (8) is a linear random field with
$$g_{ij} = \binom{j_1-i_1+j_2-i_2}{j_1-i_1} \alpha^{j_1-i_1} \beta^{j_2-i_2}$$
for $j = (j_1,j_2) \geq i = (i_1,i_2)$ and zero otherwise. Furthermore, note that the assumptions of Proposition 4 are satisfied for $p = 2$, since
$$\sum_{j \geq i} |g_{ij}| = \sum_{k=0}^{\infty} \sum_{l=0}^{k} \binom{k}{l} |\alpha|^{l} |\beta|^{k-l} = \sum_{k=0}^{\infty} (|\alpha| + |\beta|)^{k} < \infty,$$
and hence
$$0 \leq \lim_{s \to \infty} \sup_{i \in \mathbb{Z}^2} \sum_{j \geq i,\; \rho(i,j) > s} |g_{ij}| \leq \lim_{s \to \infty} \sum_{k=s}^{\infty} (|\alpha| + |\beta|)^{k} = 0,$$
observing that $|\alpha| + |\beta| < 1$.
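The tail bound in this example can be checked numerically. The sketch below assumes illustrative coefficient values `a = 0.3` and `b = 0.4` for the two autoregressive parameters and truncates the infinite sums at a finite `cutoff`; the function names and the values are ours, not the paper's. It also checks that the binomial coefficients of the solution satisfy the recursion implied by the autoregression.

```python
from math import comb, isclose

def g(i, j, a, b):
    """Coefficient on the innovation at j in the solution for location i.

    It is zero unless j >= i componentwise, mirroring the one-sided
    dependence of the stationary solution.
    """
    k1, k2 = j[0] - i[0], j[1] - i[1]
    if k1 < 0 or k2 < 0:
        return 0.0
    return comb(k1 + k2, k1) * a**k1 * b**k2

def tail_sum(s, a, b, cutoff=60):
    """Sum of |g_0j| over offsets j with max(j1, j2) > s, truncated at `cutoff`."""
    total = 0.0
    for k1 in range(cutoff):
        for k2 in range(cutoff):
            if max(k1, k2) > s:
                total += abs(g((0, 0), (k1, k2), a, b))
    return total

def geometric_tail(s, a, b):
    """Closed form of sum_{k >= s} (|a| + |b|)^k, valid for |a| + |b| < 1."""
    c = abs(a) + abs(b)
    return c**s / (1.0 - c)

a, b = 0.3, 0.4  # hypothetical parameters with |a| + |b| < 1

# Recursion implied by the model: g(j1, j2) = a * g(j1 - 1, j2) + b * g(j1, j2 - 1)
lhs = g((0, 0), (2, 1), a, b)
rhs = a * g((0, 0), (1, 1), a, b) + b * g((0, 0), (2, 0), a, b)
print(isclose(lhs, rhs))

for s in (0, 1, 5, 10):
    print(s, tail_sum(s, a, b), geometric_tail(s, a, b))
```

The truncation only removes nonnegative terms, so the truncated tail sum is a lower bound for the true tail and the comparison with the geometric bound remains valid.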
3.3 Cliff-Ord Type Spatial Processes
Consider the following Cliff-Ord type model:
$$Y_n = \lambda W_n Y_n + X_n \beta + u_n, \qquad u_n = \rho M_n u_n + v_n, \tag{9}$$
where $Y_n = (Y_{1,n}, \ldots, Y_{n,n})'$ is an $n \times 1$ vector of endogenous variables, $X_n = (X_{ij,n})$ is an $n \times k$ matrix of regressors, $W_n = (w_{ij,n})$ and $M_n = (m_{ij,n})$ are $n \times n$ nonstochastic weight matrices, $v_n = (v_{1,n}, \ldots, v_{n,n})'$ is an $n \times 1$ vector of disturbances, $\lambda$ and $\rho$ are scalar parameters typically referred to as spatial autoregressive parameters, and $\beta$ is a $k \times 1$ parameter vector. To simplify presentation, we assume in the following that $k = 1$ and use the notation $X_n = (X_{1,n}, \ldots, X_{n,n})'$. The above model is frequently referred to as a spatial ARAR(1,1) model, and $W_n Y_n$ and $M_n u_n$ are referred to as spatial lags. For recent contributions on estimation strategies for this model see, e.g., Robinson (2009, 2007), Kelejian and Prucha (2007a,b, 2004), and Lee (2007a,b, 2004).
The spatial weights $m_{ij,n}$ and $w_{ij,n}$ are typically thought of as depending on some measure of distance, say $d_{ij}$, between units $i$ and $j$, where the weights decline as the distance increases. Thus although in Cliff-Ord models observations are indexed by natural numbers, i.e., $i = 1, \ldots, n$, a leading case would be where $i$ corresponds to some location, say $s_i \in \mathbb{R}^d$, and the distance between $i$ and $j$ is given by $d_{ij} = \rho(s_i, s_j)$. In the following, we assume that $D_n = \{s_1, \ldots, s_n\} \subset D$, where the lattice $D$ satisfies Assumption 1. However, to simplify notation we will use $\rho(i,j)$ instead of $\rho(s_i, s_j)$. We assume furthermore that for all $n$
$$I_n - \lambda W_n \text{ and } I_n - \rho M_n \text{ are nonsingular.} \tag{10}$$
The reduced form of the model is then given by $Y_n = A_n X_n + B_n v_n$ with $A_n = (a_{ij,n}) = \beta (I_n - \lambda W_n)^{-1}$ and $B_n = (b_{ij,n}) = (I_n - \lambda W_n)^{-1} (I_n - \rho M_n)^{-1}$. In scalar notation we have for $i = 1, \ldots, n$:
$$Y_{i,n} = \sum_{j=1}^{n} a_{ij,n} X_{j,n} + \sum_{j=1}^{n} b_{ij,n} v_{j,n}.$$
Although for fixed $n$, the output process $Y_{i,n}$ depends on only a finite number of elements of the input process $\varepsilon_{i,n} = (X_{i,n}, v_{i,n})'$, the mixing property of $\varepsilon_{i,n}$ may not carry over to $Y_{i,n}$. The reason is that the number of elements composing the spatial lags grows unboundedly with the sample size so that the mixing property can break down in the limit. This is especially important when analyzing the asymptotic properties of Cliff-Ord type processes.
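A small numerical sketch may make the reduced form concrete. It assumes a scalar regressor, hypothetical parameter values (`lam`, `rho_u`, `beta`), and a row-normalized nearest-neighbor weight matrix on a line; none of these choices come from the paper. The code builds the two reduced-form matrices, simulates the output, and confirms that the output satisfies the structural equations.

```python
import numpy as np

def line_neighbor_weights(n):
    """Row-normalized weights linking each unit to its neighbors on a line."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                W[i, j] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def reduced_form(W, M, lam, rho_u, beta):
    """Return A = beta (I - lam W)^{-1} and B = (I - lam W)^{-1} (I - rho_u M)^{-1}."""
    identity = np.eye(W.shape[0])
    inv_w = np.linalg.inv(identity - lam * W)
    A = beta * inv_w
    B = inv_w @ np.linalg.inv(identity - rho_u * M)
    return A, B

rng = np.random.default_rng(0)
n = 6
lam, rho_u, beta = 0.4, 0.3, 1.5  # hypothetical parameter values
W = line_neighbor_weights(n)
M = W.copy()                      # same weights for the disturbance process
X = rng.normal(size=n)
v = rng.normal(size=n)

A, B = reduced_form(W, M, lam, rho_u, beta)
Y = A @ X + B @ v

# Check the structural form: Y = lam W Y + beta X + u with u = rho_u M u + v
u = np.linalg.solve(np.eye(n) - rho_u * M, v)
residual = Y - (lam * W @ Y + beta * X + u)
print(np.max(np.abs(residual)))
print(np.all(B > 0))
```

In this example every entry of `B` is strictly positive even though each unit has at most two direct neighbors, so each output depends on every input, which is the mechanism behind the growing number of elements in the spatial lags described above.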
Towards establishing that $Y = \{Y_{i,n}, s_i \in D_n, n \geq 1\}$ is NED on $\varepsilon = \{\varepsilon_{i,n}, s_i \in D_n, n \geq 1\}$, we maintain the following assumptions:
$$\lim_{s \to \infty} \sup_n \sup_{1 \leq i \leq n} \sum_{1 \leq j \leq n :\, \rho(i,j) > s} \bigl( |a_{ij,n}| + |b_{ij,n}| \bigr) = 0 \tag{11}$$
and
$$\sup_n \sup_{1 \leq i \leq n} \|\varepsilon_{i,n}\|_p < \infty \quad \text{for some } p \geq 1. \tag{12}$$
We have the following result.
Proposition 5 Under conditions (10)-(12), the process $Y = \{Y_{i,n}, s_i \in D_n, n \geq 1\}$, where $Y_{i,n}$ is defined by (9), is uniformly $L_p$-NED on the process $\varepsilon = \{\varepsilon_{i,n}, s_i \in D_n, n \geq 1\}$, where $\varepsilon_{i,n} = (X_{i,n}, v_{i,n})'$.
It is readily seen that a sufficient condition for (11) is that for some $\alpha > 0$,
$$\sup_n \sup_{1 \leq i \leq n} \sum_{j=1}^{n} \bigl( |a_{ij,n}| + |b_{ij,n}| \bigr)\, \rho(i,j)^{\alpha} < \infty \tag{13}$$
since
$$\sup_n \sup_{1 \leq i \leq n} \sum_{1 \leq j \leq n :\, \rho(i,j) > s} \bigl( |a_{ij,n}| + |b_{ij,n}| \bigr) \leq \sup_n \sup_{1 \leq i \leq n} \sum_{1 \leq j \leq n :\, \rho(i,j) > s} \bigl( |a_{ij,n}| + |b_{ij,n}| \bigr) \left( \frac{\rho(i,j)}{s} \right)^{\alpha} \leq \frac{1}{s^{\alpha}} \sup_n \sup_{1 \leq i \leq n} \sum_{j=1}^{n} \bigl( |a_{ij,n}| + |b_{ij,n}| \bigr)\, \rho(i,j)^{\alpha} \to 0 \text{ as } s \to \infty.$$
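The inequality behind this sufficient condition can be verified numerically for a fixed sample size. In the sketch below, the matrix `C` plays the role of the combined absolute reduced-form coefficients and `dist` the role of the pairwise distances. The geometric decay, the location spacing and the exponent are hypothetical choices made only for illustration.

```python
import numpy as np

def tail_sum(C, dist, s):
    """Largest row sum of |C[i, j]| restricted to pairs with dist[i, j] > s."""
    return float(np.max(np.sum(np.abs(C) * (dist > s), axis=1)))

def moment_bound(C, dist, s, alpha):
    """s^(-alpha) times the largest row sum of |C[i, j]| * dist[i, j]^alpha."""
    weighted = np.max(np.sum(np.abs(C) * dist**alpha, axis=1))
    return float(s ** (-alpha) * weighted)

n = 50
locations = 1.5 * np.arange(n, dtype=float)          # hypothetical locations on a line
dist = np.abs(locations[:, None] - locations[None, :])
C = 0.5**dist                                        # hypothetical coefficient decay
alpha = 1.0                                          # hypothetical exponent

for s in (1.0, 2.0, 5.0, 10.0):
    print(s, tail_sum(C, dist, s), moment_bound(C, dist, s, alpha))
```

For every pair with distance above `s`, the ratio of distance to `s` raised to a positive power is at least one, so the tail sum can never exceed the weighted bound, and the bound itself shrinks at rate `s**(-alpha)`.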
A condition analogous to (13) has been used recently by Kelejian and Prucha (2007a), and should be satisfied in a wide range of applications. It is slightly stronger than the typical assumption in the Cliff-Ord literature, which maintains assumptions that imply that the row and column sums of the absolute elements of the matrices $A_n$ and $B_n$ are uniformly bounded; compare, e.g., Kelejian and Prucha (2007b) and Lee (2007a,b, 2004).
3.4 Spatial Bernoulli Shifts
Consider a real-valued random field $Y = \{Y_i, i \in \mathbb{Z}^d\}$ defined as:
$$Y_i = f_i(\varepsilon_{i+j}, j \in \mathbb{Z}^d) \tag{14}$$
where $\varepsilon = \{\varepsilon_i, i \in \mathbb{Z}^d\}$ is a real-valued random field and the $f_i : \mathbb{R}^{\mathbb{Z}^d} \to \mathbb{R}$ are measurable functions. The process (14) is a generalization of one-parameter Bernoulli shifts considered by Doukhan and Louhichi (1999). A simple example of a spatial Bernoulli shift is the infinite moving average random field of the first example. More generally, the $f_i$ may be nonlinear functions. We assume that the $f_i$ satisfy the following Lipschitz-type regularity condition:
$$\bigl| f_i(x_j, j \in \mathbb{Z}^d) - f_i(y_j, j \in \mathbb{Z}^d) \bigr| \leq \sum_{j \in \mathbb{Z}^d} w_{ij}\, |x_j - y_j| \tag{15}$$
for some nonnegative constants $w_{ij}$ such that
$$\lim_{s \to \infty} \sup_{i \in \mathbb{Z}^d} \sum_{j \in \mathbb{Z}^d :\, \rho(i,j) > s} w_{ij} = 0. \tag{16}$$
Finally, we assume that the input process $\varepsilon = \{\varepsilon_i, i \in \mathbb{Z}^d\}$ has uniformly bounded second moments, i.e.,
$$\sup_{i \in \mathbb{Z}^d} \|\varepsilon_i\|_2 < \infty. \tag{17}$$
Then, one can establish the following result.
Proposition 6 Under conditions (15)-(17), the random field $Y = \{Y_i, i \in \mathbb{Z}^d\}$ given by (14) is well-defined and uniformly $L_2$-NED on the random field $\varepsilon = \{\varepsilon_i, i \in \mathbb{Z}^d\}$.
Conditions (15)-(17) are analogous to those used by Doukhan and Louhichi (1999) for time-series Bernoulli shifts. Condition (15) is fulfilled if the functions $f_i$ have bounded partial derivatives, i.e., $|\partial f_i / \partial x_j| \leq w_{ij}$.
Thus, the class of NED spatial processes covers fairly large classes of linear and nonlinear transformations of random fields.
4 Limit Theorems
4.1 Central Limit Theorem
In the following, we introduce our CLT for real-valued random fields $Y = \{Y_{i,n}, i \in D_n, n \geq 1\}$, where $Y$ is assumed to be $L_2$-NED on some vector-valued $\alpha$-mixing random field $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \geq 1\}$ with the NED coefficients $\{\psi(s)\}$, where $D_n \subseteq T_n \subseteq D$ and the lattice $D$ satisfies Assumption 1. In the sequel, we will use the following notation:
$$S_n = \sum_{i \in D_n} Y_{i,n}; \qquad \sigma_n^2 = \mathrm{var}(S_n).$$
For ease of reference, we state below the definition of the $\alpha$-mixing coefficients employed in the paper.
Definition 2 Let $\mathcal{A}$ and $\mathcal{B}$ be two $\sigma$-algebras of $\mathfrak{F}$, and let
$$\alpha(\mathcal{A}, \mathcal{B}) = \sup\bigl( |P(AB) - P(A)P(B)|,\; A \in \mathcal{A},\, B \in \mathcal{B} \bigr).$$
For $U \subseteq D_n$ and $V \subseteq D_n$, let $\sigma_n(U) = \sigma(\varepsilon_{i,n}; i \in U)$ and $\alpha_n(U,V) = \alpha(\sigma_n(U), \sigma_n(V))$. Then, the $\alpha$-mixing coefficients for the random field $\varepsilon$ are defined as:
$$\bar{\alpha}(u,v,r) = \sup_n \sup_{U,V} \bigl( \alpha_n(U,V),\; |U| \leq u,\, |V| \leq v,\, \rho(U,V) \geq r \bigr).$$
Dobrushin (1968) showed that weak dependence conditions based on the above mixing coefficients are satisfied by broad classes of random fields including Markov fields. In contrast to standard mixing numbers for time-series processes, the mixing coefficients for random fields depend not only on the distance between two datasets but also on their sizes. To explicitly account for such dependence, it is furthermore assumed that

$$\bar{\alpha}(u, v, r) \le \varphi(u, v)\,\widehat{\alpha}(r), \qquad (18)$$

where the function $\varphi(u, v)$ is nondecreasing in each argument, and $\widehat{\alpha}(r) \to 0$ as $r \to \infty$. The idea is to account separately for the two different aspects of dependence: (i) decay of dependence with the distance, and (ii) accumulation of dependence as the sample region expands. The two common choices of $\varphi(u, v)$ in the random fields literature are

$$\varphi(u, v) = (u + v)^{\tau}, \quad \tau \ge 0, \qquad (19)$$

and

$$\varphi(u, v) = \min\{u, v\}. \qquad (20)$$
The above mixing conditions have been used extensively in the random fields literature, including Neaderhouser (1978), Takahata (1983), Nahapetian (1987), Bulinskii (1989), Bulinskii and Doukhan (1990), Tran (1990) and Bradley (1993). They are satisfied by fairly large classes of random fields. Bradley (1993) provides examples of random fields satisfying conditions (18)-(19) with $u = v$ and $\tau = 1$. Furthermore, Bulinskii (1989) constructs infinite moving average random fields satisfying the same conditions with $\tau = 1$ for any given decay rate of coefficients $\widehat{\alpha}(r)$. Clearly, standard mixing coefficients in the time series literature are covered by conditions (18)-(19) when $\tau = 0$.
Following the literature, we employ the above mixing conditions for the input random field, and impose the following restriction on the decay rates of the mixing coefficients.

Assumption 2 The $\alpha$-mixing coefficients of $\varepsilon$ satisfy (18) and (19) for some $\tau \ge 0$ and $\widehat{\alpha}(r)$, such that for some $\delta > 0$

$$\sum_{r=1}^{\infty} r^{d(\tau_* + 1) - 1}\, \widehat{\alpha}^{\delta/(2(2+\delta))}(r) < \infty, \qquad (21)$$

where $\tau_* = \delta\tau/(2 + \delta)$.

Note that if $\tau = 1$ this assumption also covers the case where $\varphi(u, v)$ is given by (20), since $\min\{u, v\} \le u + v$. Furthermore, the CLT relies on the following conditions regarding moments, the NED coefficients and the NED scaling factors.
Assumption 3 (a) (Uniform $L_{2+\delta}$ integrability)

$$\lim_{k \to \infty} \sup_n \sup_{i \in D_n} E\left[|Y_{i,n}|^{2+\delta}\, 1(|Y_{i,n}| > k)\right] = 0,$$

where $1(\cdot)$ is the indicator function and $\delta > 0$ is as in Assumption 2.

(b) $\inf_n |D_n|^{-1} \sigma_n^2 > 0$, where $\{D_n\}$ is a sequence of finite subsets of $D$ such that $|D_n| \to \infty$ as $n \to \infty$.

(c) NED coefficients satisfy $\sum_{r=1}^{\infty} r^{d-1} \psi(r) < \infty$.

We can now state the following CLT for $L_2$-NED random fields.
Theorem 1 Suppose $\{D_n\}$ is a sequence of finite subsets such that $|D_n| \to \infty$ as $n \to \infty$ and $\{T_n\}$ is a sequence of subsets such that $D_n \subseteq T_n \subseteq D$ of the lattice $D$ satisfying Assumption 1. Let $Y = \{Y_{i,n}, i \in D_n, n \ge 1\}$ be a real valued zero-mean random field that is $L_2$-NED on a vector-valued $\alpha$-mixing random field $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \ge 1\}$. Suppose Assumptions 2 and 3 hold, then

$$\sigma_n^{-1} S_n \Longrightarrow N(0, 1).$$
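As a purely illustrative check of what the CLT asserts, the following sketch simulates the standardized sum $\sigma_n^{-1} S_n$ for a toy field that is not taken from the paper: $Y_i$ is the average of i.i.d. Uniform(-1/2, 1/2) inputs over the 3x3 block around location $i$ on a square grid in $\mathbb{Z}^2$. This finite moving average is trivially NED on the (independent, hence mixing) input field. The function name, grid size and number of replications are assumptions made for the example.

```python
import numpy as np

def simulate_standardized_sums(n=30, reps=2000, seed=0):
    """Simulate sigma_n^{-1} S_n for a toy 3x3 moving-average field on an n x n grid."""
    rng = np.random.default_rng(seed)
    # c[k] is the total weight of input eps_k in S_n = sum_i Y_i, where
    # Y_i = (1/9) * (sum of eps over the 3x3 block around location i).
    c = np.zeros((n + 2, n + 2))
    for di in range(3):
        for dj in range(3):
            c[di:di + n, dj:dj + n] += 1.0 / 9.0
    var_eps = 1.0 / 12.0                      # variance of Uniform(-1/2, 1/2)
    sigma_n = np.sqrt(var_eps * np.sum(c ** 2))  # exact sd of S_n
    out = np.empty(reps)
    for rep in range(reps):
        eps = rng.uniform(-0.5, 0.5, size=(n + 2, n + 2))
        # Build Y on the n x n grid as the average of each 3x3 neighbourhood.
        y = np.zeros((n, n))
        for di in range(3):
            for dj in range(3):
                y += eps[di:di + n, dj:dj + n] / 9.0
        out[rep] = y.sum() / sigma_n
    return out

z = simulate_standardized_sums()
print(round(float(z.mean()), 2), round(float(z.var()), 2))
```

Across replications the standardized sums should have mean close to 0 and variance close to 1; a histogram of `z` can be compared with the standard normal density.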
The above CLT contains the CLT for $\alpha$-mixing random fields given in Jenish and Prucha (2009) as a special case. As shown earlier, the class of NED random fields is an important extension of the class of $\alpha$-mixing random fields, in that the class includes important and widely considered fields, e.g., spatial autoregressive processes, that do not generally satisfy mixing conditions. It is for that reason that an estimation theory limited to the class of $\alpha$-mixing random fields is not fully satisfactory; see, e.g., Gallant and White (1988) and Pötscher and Prucha (1997) for an analogous discussion in the time series literature. We note that Theorem 1 also contains as a special case the CLT for time series NED processes of Wooldridge (1986), see Theorem 3.13 and Corollary 4.4.
Assumption 2 restricts the dependence structure of the input process $\varepsilon$. Assumptions 3(a),(b) are standard in the limit theory of mixing processes; see, e.g., Wooldridge (1986), Davidson (1992), de Jong (1997), and Jenish and Prucha (2009).

Assumption 3(a) is satisfied if the $Y_{i,n}$ are uniformly $L_p$-bounded for some $p > 2 + \delta$, i.e., $\sup_n \sup_{i \in D_n} \|Y_{i,n}\|_p < \infty$. Assumption 3(b) is an asymptotic negligibility condition that ensures that no single summand disproportionately influences the entire sum. Assumption 3(c) controls the size of the NED coefficients, which measure the error in the approximation of $Y_{i,n}$ by $\varepsilon$. Intuitively, the approximation errors have to decline sufficiently fast with each successive approximation. Assumption 3(c) is satisfied if $\psi(r) = O(r^{-d-\gamma})$ for some $\gamma > 0$; i.e., $\psi(r)$ is of size $-d$.
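For polynomially decaying coefficients the summability conditions reduce to simple inequalities, which the sketch below makes explicit. It assumes that condition (21) reads $\sum_r r^{d(\tau_*+1)-1}\,\widehat{\alpha}^{\delta/(2(2+\delta))}(r) < \infty$ with $\tau_* = \delta\tau/(2+\delta)$, and that $\widehat{\alpha}(r) = r^{-a}$ and $\psi(r) = r^{-b}$; the function names are hypothetical. A series $\sum_r r^{c}$ converges exactly when $c < -1$, which gives the thresholds.

```python
def tau_star(tau, delta):
    """tau_* = delta * tau / (2 + delta), as read from Assumption 2."""
    return delta * tau / (2.0 + delta)

def min_mixing_decay(d, tau, delta):
    """Threshold a* such that alpha_hat(r) = r**(-a) satisfies (21) iff a > a*."""
    exponent = delta / (2.0 * (2.0 + delta))  # power applied to alpha_hat(r) in (21)
    # sum_r r**(d*(tau_*+1) - 1 - a*exponent) converges iff the power is below -1,
    # i.e. iff a * exponent > d * (tau_* + 1).
    return d * (tau_star(tau, delta) + 1.0) / exponent

def ned_rate_ok(d, b):
    """True if psi(r) = r**(-b) satisfies sum_r r**(d-1) * psi(r) < infinity."""
    return b > d

print(min_mixing_decay(d=2, tau=1, delta=2))  # 12.0
print(ned_rate_ok(d=2, b=2.5))                # True
```

For example, on a two-dimensional lattice with $\tau = 1$ and $\delta = 2$, the mixing coefficients would need to decay faster than $r^{-12}$ under this reading, while the NED coefficients only need to decay faster than $r^{-2}$.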
Theorem 1 can be easily extended to vector-valued fields using the standard Cramér-Wold device.

Corollary 1 Suppose $\{D_n\}$ is a sequence of finite subsets such that $|D_n| \to \infty$ as $n \to \infty$ and $\{T_n\}$ is a sequence of subsets such that $D_n \subseteq T_n \subseteq D$ of the lattice $D$ satisfying Assumption 1. Let $Y = \{Y_{i,n}, i \in D_n, n \ge 1\}$ with $Y_{i,n} \in \mathbb{R}^k$ be a zero-mean random field that is $L_2$-NED on a vector-valued $\alpha$-mixing random field $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \ge 1\}$. Suppose Assumptions 2 and 3 hold with $|Y_{i,n}|$ denoting the Euclidean norm of $Y_{i,n}$ and $\sigma_n^2$ replaced by $\lambda_{\min}(\Sigma_n)$, where $\Sigma_n = \mathrm{Var}(S_n)$ and $\lambda_{\min}(\cdot)$ is the smallest eigenvalue, then

$$\Sigma_n^{-1/2} S_n \Longrightarrow N(0, I_k).$$

Furthermore, $\sup_n |D_n|^{-1} \lambda_{\max}(\Sigma_n) < \infty$, where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue.
4.2 Law of Large Numbers

To prove consistency of spatial estimators, we also derive a weak LLN for random fields which are $L_1$-NED on some mixing field. This LLN can then be used to establish uniform convergence of random functions by combining it with the generic ULLN given in Jenish and Prucha (2009), which transforms pointwise LLNs (at a given parameter value) into ULLNs. The LLN maintains the following moment and mixing assumptions.

Assumption 4 (a) $\sup_n \sup_{i \in D_n} E|Y_{i,n}|^p < \infty$ for some $p > 1$.

(b) The $\alpha$-mixing coefficients of the input field $\varepsilon$ satisfy (18) for some function $\varphi(u, v)$ which is nondecreasing in each argument, and some $\widehat{\alpha}(r)$ such that $\sum_{r=1}^{\infty} r^{d-1}\, \widehat{\alpha}(r) < \infty$.

We now have the following weak LLN.
Theorem 2 Let $\{D_n\}$ be a sequence of arbitrary finite subsets of $D$ such that $|D_n| \to \infty$ as $n \to \infty$, where $D \subset \mathbb{R}^d$, $d \ge 1$, is as in Assumption 1, and let $T_n$ be a sequence of subsets of $D$ such that $D_n \subseteq T_n$. Suppose further that $Y = \{Y_{i,n}, i \in D_n, n \ge 1\}$ is $L_1$-NED on $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \ge 1\}$. If $Y$ and $\varepsilon$ satisfy Assumption 4, then

$$\frac{1}{|D_n|} \sum_{i \in D_n} \left(Y_{i,n} - E Y_{i,n}\right) \overset{L_1}{\longrightarrow} 0.$$

We note that $Y$ and $\varepsilon$ can be vector-valued random fields. In this case, $|Y_{i,n}|$ should be understood as the Euclidean norm in the respective vector space.
Assumption 4(a) is a standard moment condition employed in weak LLNs for dependent processes. It requires existence of moments of order slightly greater than 1. Assumption 4(b) is weaker than the mixing conditions used for the CLT. Furthermore, the LLN does not require any restrictions on the NED coefficients.

In the time series literature, weak LLNs for NED processes have been obtained by Andrews (1988) and Davidson (1993), among others. Andrews (1988) derives an $L_1$-law for triangular arrays of $L_1$-mixingales. He then shows that NED processes are $L_1$-mixingales, and hence, satisfy his LLN. Davidson (1993) extends the latter result to processes with trending moments.
5 Large Sample Properties of Spatial GMM Estimators

In this section, we apply the above developed limit theorems to establish the large sample properties of spatial GMM estimators under a reasonably general set of assumptions that should cover a wide range of empirical problems. More specifically, our consistency and asymptotic normality results (i) maintain only that the spatial data process is NED on an $\alpha$-mixing basis process to accommodate spatial lags in the data process as discussed above, (ii) allow for the data process to be located on an unevenly spaced grid, and (iii) allow for the data process to be non-stationary, which will frequently be the case in empirical applications. We also give our results under a set of primitive sufficient conditions for easier interpretation by the applied researcher. 7
We continue with the basic set-up of Section 2. Consider the moment function $q_{i,n}: \mathbb{R}^{p_z} \times \Theta \to \mathbb{R}^{p_q}$, where $\Theta$ denotes the parameter space, and let $\theta_n^0 \in \Theta$ denote the parameter vector of interest (which we allow to depend on $n$ for reasons of generality). Suppose the following moment conditions hold

$$E q_{i,n}(Z_{i,n}, \theta_n^0) = 0, \qquad (22)$$

then the corresponding spatial GMM estimator is defined as

$$\widehat{\theta}_n = \arg\min_{\theta \in \Theta} Q_n(\omega, \theta), \qquad (23)$$

where $Q_n: \Omega \times \Theta \to \mathbb{R}$,

$$Q_n(\omega, \theta) = R_n(\theta)' P_n R_n(\theta), \qquad R_n(\theta) = |D_n|^{-1} \sum_{i \in D_n} q_{i,n}(Z_{i,n}, \theta),$$

where the $P_n$ are positive definite weighting matrices. To prove that $\widehat{\theta}_n$ is a consistent estimator for $\theta_n^0$, consider the following non-stochastic analogue of $Q_n$, say

$$\bar{Q}_n(\theta) = [E R_n(\theta)]' P\, [E R_n(\theta)], \qquad (24)$$

where $P$ denotes the probability limit of $P_n$. Then given the moment condition (22) clearly $E R_n(\theta_n^0) = 0$, and the functions $\bar{Q}_n$ are minimized at $\theta_n^0$ with $\bar{Q}_n(\theta_n^0) = 0$. In proving consistency, we follow the classical approach; see, e.g., Gallant and White (1988) or Pötscher and Prucha (1997) for more recent expositions. In particular, given identifiable uniqueness of $\theta_n^0$ we establish, loosely speaking, convergence of the minimizers $\widehat{\theta}_n$ to the minimizers $\theta_n^0$ by establishing convergence of the objective function $Q_n(\omega, \theta)$ to its non-stochastic analogue $\bar{Q}_n(\theta)$ uniformly over the parameter space.

7 In an important contribution, Conley (1999) gives a first set of results regarding the asymptotic properties of GMM estimators under the assumption that the data process is stationary and $\alpha$-mixing. Conley also maintains some high-level assumptions such as first moment continuity of the moment function, which in turn immediately implies uniform convergence; see, e.g., Pötscher and Prucha (1989) for a discussion. Our results extend Conley (1999) in several important directions, as indicated above. We establish uniform convergence from primitive sufficient conditions via the generic uniform law of large numbers given in Jenish and Prucha (2009) and the law of large numbers given as Theorem 2 above.
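The estimator just defined can be sketched in a few lines. The example below is only an illustration under simplifying assumptions that are not part of the paper: the data are i.i.d. draws standing in for the spatial process $Z_{i,n}$, the moment function `moments` is a hypothetical two-moment specification for a scalar parameter, the weighting matrix is the identity, and the compact parameter space is represented by a finite grid.

```python
import numpy as np

def moments(z, theta):
    """Hypothetical moment function q(z, theta): mean and second-moment restrictions."""
    return np.column_stack([z - theta, z ** 2 - theta ** 2 - 1.0])

def gmm_objective(z, theta, P):
    """Q_n(theta) = R_n(theta)' P R_n(theta), with R_n the sample mean of the moments."""
    R = moments(z, theta).mean(axis=0)
    return float(R @ P @ R)

def gmm_grid_estimate(z, P, grid):
    """Minimise Q_n over a finite grid standing in for the compact parameter space."""
    values = [gmm_objective(z, th, P) for th in grid]
    return float(grid[int(np.argmin(values))])

rng = np.random.default_rng(1)
theta0 = 0.7
z = theta0 + rng.standard_normal(5000)  # i.i.d. stand-in for Z_{i,n}; unit variance
grid = np.linspace(-2.0, 2.0, 401)      # compact parameter space [-2, 2]
theta_hat = gmm_grid_estimate(z, np.eye(2), grid)
print(theta_hat)
```

Both moment restrictions hold at `theta0`, and the first one rules out `-theta0`, so the minimizer of the sample objective should lie close to 0.7 for a sample of this size.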
Throughout the sequel, we maintain the following assumptions regarding the parameter space, the GMM objective function and the unknown parameters $\theta_n^0$.

Assumption 5 (a) The parameter space $\Theta$ is a compact metric space with metric $\nu$.

(b) The functions $q_{i,n}: \mathbb{R}^{p_z} \times \Theta \to \mathbb{R}^{p_q}$ are $\mathcal{B}^{p_z}/\mathcal{B}^{p_q}$-measurable for each $\theta \in \Theta$, and continuous on $\Theta$ for each $z \in \mathbb{R}^{p_z}$.

(c) The elements of the $p_q \times p_q$ real matrices $P_n$ are $\mathcal{B}$-measurable, and $P_n$ is positive definite a.s. Furthermore $P = \mathrm{p\,lim}\, P_n$ exists and $P$ is positive definite.

(d) The minimizers $\theta_n^0$ are identifiably unique in the sense that for every $\varepsilon > 0$,

$$\liminf_{n \to \infty}\ \inf_{\theta \in \Theta:\ \nu(\theta, \theta_n^0) \ge \varepsilon} [E R_n(\theta)]' [E R_n(\theta)] > 0.$$
Compactness of the parameter space as maintained in Assumption 5(a) is typical for the GMM literature. Assumptions 5(b),(c) imply that $Q_n(\cdot, \theta)$ is measurable for all $\theta \in \Theta$, and $Q_n(\omega, \cdot)$ is continuous on $\Theta$. Given those assumptions the existence of measurable functions $\widehat{\theta}_n$ that solve (23) follows, e.g., from Lemma A3 of Pötscher and Prucha (1997).

Since $P$ is positive definite, it is readily seen that Assumption 5(d) implies that for every $\varepsilon > 0$:

$$\liminf_{n \to \infty}\ \inf_{\theta \in \Theta:\ \nu(\theta, \theta_n^0) \ge \varepsilon} \left[\bar{Q}_n(\theta) - \bar{Q}_n(\theta_n^0)\right] > 0,$$

observing that $\bar{Q}_n(\theta_n^0) = 0$. Thus under Assumption 5(d) the minimizers $\theta_n^0$ are identifiably unique; compare, e.g., Gallant and White (1988), p. 19. For interpretation consider the important special case where $\theta_n^0 = \theta^0$, $E R_n(\theta) = R(\theta)$ and $R(\theta)$ is continuous. In this case identifiable uniqueness of $\theta^0$ is equivalent to the assumption that $\theta^0$ is the unique solution of the moment conditions, i.e., $R(\theta) \ne 0$ for all $\theta \ne \theta^0$; compare, e.g., Pötscher and Prucha (1997), p. 16.
5.1 Consistency

Given the minimizers $\theta_n^0$ are identifiably unique, $\widehat{\theta}_n$ is a consistent estimator for $\theta_n^0$ if $Q_n$ converges uniformly to $\bar{Q}_n$, i.e., if $\sup_{\theta \in \Theta} |Q_n(\omega, \theta) - \bar{Q}_n(\theta)| \overset{p}{\to} 0$ as $n \to \infty$; this follows immediately from, e.g., Pötscher and Prucha (1997), Lemma 3.1.

We now proceed by giving a set of primitive domination and Lipschitz type conditions for the moment functions that ensure uniform convergence of $Q_n$ to $\bar{Q}_n$. The conditions are in line with those maintained in the general literature on M-estimation, e.g., Andrews (1987), Gallant and White (1988), and Pötscher and Prucha (1989, 1994).
Definition 3 Let $f_{i,n}: \mathbb{R}^{p_z} \times \Theta \to \mathbb{R}^{p_q}$ be $\mathcal{B}^{p_z}/\mathcal{B}^{p_q}$-measurable functions for each $\theta \in \Theta$, then:

(a) The random functions $f_{i,n}(Z_{i,n}, \theta)$ are said to be $p$-dominated on $\Theta$ for some $p > 1$ if $\sup_n \sup_{i \in D_n} E \sup_{\theta \in \Theta} |f_{i,n}(Z_{i,n}, \theta)|^p < \infty$.

(b) The random functions $f_{i,n}(Z_{i,n}, \theta)$ are said to be Lipschitz in the parameter $\theta$ on $\Theta$ if

$$|f_{i,n}(Z_{i,n}, \theta) - f_{i,n}(Z_{i,n}, \theta^{\bullet})| \le L_{i,n}(Z_{i,n})\, h(\nu(\theta, \theta^{\bullet})) \quad \text{a.s.}, \qquad (25)$$

for all $\theta, \theta^{\bullet} \in \Theta$ and $i \in D_n$, $n \ge 1$, where $h$ is a nonrandom function with $h(x) \downarrow 0$ as $x \downarrow 0$, and $L_{i,n}$ are random variables with $\limsup_{n \to \infty} |D_n|^{-1} \sum_{i \in D_n} E L_{i,n}^{\eta} < \infty$ for some $\eta > 0$.
Towards establishing consistency of $\widehat{\theta}_n$ we furthermore maintain the following moment and mixing assumptions.

Assumption 6 The moment functions $q_{i,n}(Z_{i,n}, \theta)$ have the following properties:

(a) They are $p$-dominated on $\Theta$ for $p = 2$.

(b) They are uniformly $L_1$-NED on $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \ge 1\}$, where $D_n \subseteq T_n \subseteq D$, and $\varepsilon$ is $\alpha$-mixing with $\alpha$-mixing coefficients satisfying the conditions stated in Assumption 4(b).

(c) They are Lipschitz in the parameter $\theta$ on $\Theta$.

Assumption 6(a) implies that $\sup_{n, i \in D_n} E|q_{i,n}(Z_{i,n}, \theta)|^p < \infty$ for each $\theta \in \Theta$. Assumption 6(b) then allows us to apply the LLN given as Theorem 2 above to the sample moments $R_n(\theta) = |D_n|^{-1} \sum_{i \in D_n} q_{i,n}(Z_{i,n}, \theta)$.

To verify Assumption 6(b) one can use either Proposition 1 or Proposition 2 to deduce this condition from the lower level assumption that the data $Z_{i,n}$ are $L_1$-NED. For example, the $q_{i,n}$ are $L_1$-NED if the $Z_{i,n}$ are $L_1$-NED and the $q_{i,n}$ satisfy the Lipschitz condition of Proposition 1. Note that no restrictions on the sizes of the NED coefficients are required.
Assumption 6(c) ensures stochastic equicontinuity of $q_{i,n}$ w.r.t. $\theta$. Stochastic equicontinuity jointly with Assumption 6(a) and the pointwise LLN enable us to invoke the ULLN of Jenish and Prucha (2009) to prove uniform convergence of the sample moments, which in turn is used to establish that $Q_n$ converges uniformly to $\bar{Q}_n$. A sufficient condition for Assumption 6(c) is existence of integrable partial derivatives of $q_{i,n}$ w.r.t. $\theta$ if $\Theta \subseteq \mathbb{R}^k$.

Our consistency result for the spatial GMM estimator given by (23) is summarized by the next theorem.

Theorem 3 (Consistency) Suppose $\{D_n\}$ is a sequence of finite subsets of $D$ such that $|D_n| \to \infty$ as $n \to \infty$, where $D \subset \mathbb{R}^d$, $d \ge 1$, is as in Assumption 1. Suppose further that Assumptions 5 and 6 hold. Then

$$\nu(\widehat{\theta}_n, \theta_n^0) \overset{p}{\to} 0 \quad \text{as } n \to \infty,$$

and $\bar{Q}_n(\theta)$ is uniformly equicontinuous on $\Theta$.
5.2 Asymptotic Normality

We next establish that the spatial GMM estimator defined by (23) is asymptotically normally distributed. For that purpose, we need a stronger set of assumptions than for consistency, including differentiability of the moment functions in $\theta$. It proves helpful to adopt the notation $\nabla_\theta$ in place of $\partial/\partial\theta$. 8

Assumption 7 (a) The minimizers $\theta_n^0$ lie uniformly in the interior of $\Theta$ with $\Theta \subseteq \mathbb{R}^k$. Furthermore $E[R_n(\theta_n^0)] = 0$.

(b) The functions $q_{i,n}: \mathbb{R}^{p_z} \times \Theta \to \mathbb{R}^{p_q}$ are continuously differentiable w.r.t. $\theta$ in the interior of $\Theta$ for each $z \in \mathbb{R}^{p_z}$.

(c) The functions $q_{i,n}(Z_{i,n}, \theta_n^0)$ are uniformly $L_2$-NED on $\varepsilon$ of size $-d$, and $\sup_{n, i \in D_n} E|q_{i,n}(Z_{i,n}, \theta_n^0)|^{2+\delta'} < \infty$ for some $\delta' > 0$. The functions $\nabla_\theta q_{i,n}(Z_{i,n}, \theta)$ are uniformly $L_1$-NED on $\varepsilon$.

(d) The input process $\varepsilon = \{\varepsilon_{i,n}, i \in T_n, n \ge 1\}$, where $D_n \subseteq T_n \subseteq D$, is $\alpha$-mixing and the mixing coefficients satisfy Assumption 2 for some $\delta < \delta'$, where $\delta'$ is the same as in Assumption 7(c).

(e) The functions $\nabla_\theta q_{i,n}$ are $p$-dominated on $\Theta$ for some $p > 1$.

(f) The functions $\nabla_\theta q_{i,n}$ are Lipschitz in $\theta$ on $\Theta$.

(g) $\inf_n \lambda_{\min}(|D_n|^{-1} \Sigma_n) > 0$, where $\Sigma_n = \mathrm{Var}\left[\sum_{i \in D_n} q_{i,n}(Z_{i,n}, \theta_n^0)\right]$.

(h) $\inf_n \lambda_{\min}\left[E \nabla_\theta R_n(\theta_n^0)'\, \nabla_\theta R_n(\theta_n^0)\right] > 0$.

8 To ensure that the derivatives are defined on the border of $\Theta$, we assume in the following that the moment functions are defined on an open set containing $\Theta$, and that the $q_{i,n}$ and $\nabla_\theta q_{i,n}$ are restrictions to $\Theta$.
The first part of Assumption 7(a) is needed to ensure that the estimator $\widehat{\theta}_n$ lies in the interior of $\Theta$ with probability tending to one, and facilitates the application of the mean value theorem to $R_n(\widehat{\theta}_n)$ around $\theta_n^0$. The second part states in essence that the moment conditions are correctly specified. Its violation will generally invalidate the limiting distribution result.

Assumptions 7(c),(d),(g) enable us to apply the CLT for vector-valued NED processes given above as Corollary 1 to $R_n(\theta_n^0)$. Some low level sufficient conditions for Assumption 7(c) are given below. To establish asymptotic normality, we also need uniform convergence of $\nabla_\theta R_n$ on $\Theta$, which is implied via Assumptions 7(c),(d),(e),(f).

Given the above assumptions, we have the following asymptotic normality result for the spatial GMM estimator defined by (23).
Theorem 4 Suppose $\{D_n\}$ is a sequence of finite subsets of $D$ such that $|D_n| \to \infty$ as $n \to \infty$, where $D \subset \mathbb{R}^d$, $d \ge 1$, is as in Assumption 1. Suppose further that Assumptions 5-7 hold. Then

$$\left(A_n^{-1} B_n B_n' A_n^{-1\prime}\right)^{-1/2} |D_n|^{1/2} \left(\widehat{\theta}_n - \theta_n^0\right) \Longrightarrow N(0, I_k),$$

where

$$A_n = [E \nabla_\theta R_n(\theta_n^0)]' P\, [E \nabla_\theta R_n(\theta_n^0)] \quad \text{and} \quad B_n = [E \nabla_\theta R_n(\theta_n^0)]' P \left[|D_n|^{-1} \Sigma_n\right]^{1/2}.$$

Moreover, $|A_n| = O(1)$, $|A_n^{-1}| = O(1)$, $|B_n| = O(1)$, $|(B_n B_n')^{-1}| = O(1)$ and hence, $\widehat{\theta}_n$ is $|D_n|^{1/2}$-consistent for $\theta_n^0$.
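The sandwich form of the asymptotic variance can be made concrete with a short numerical sketch. Here `G` stands in for $E\nabla_\theta R_n(\theta_n^0)$, `P` for the limiting weighting matrix and `Omega` for $|D_n|^{-1}\Sigma_n$; the numbers are made up for illustration and the function name is hypothetical. Since $B_n B_n' = G' P\, \Omega\, P\, G$ for symmetric $P$, the matrix square root never has to be computed explicitly.

```python
import numpy as np

def gmm_sandwich(G, P, Omega):
    """Return A^{-1} (B B') A^{-1}' with A = G' P G and B B' = G' P Omega P G."""
    A = G.T @ P @ G
    BBt = G.T @ P @ Omega @ P @ G   # equals B B' for B = G' P Omega^{1/2}
    A_inv = np.linalg.inv(A)
    return A_inv @ BBt @ A_inv.T

# Illustrative inputs: 3 moment conditions, 2 parameters.
G = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])
Omega = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])

V_identity = gmm_sandwich(G, np.eye(3), Omega)
V_optimal = gmm_sandwich(G, np.linalg.inv(Omega), Omega)
print(V_identity)
print(V_optimal)
```

With the weighting matrix `P = inv(Omega)` the sandwich collapses to `inv(G' inv(Omega) G)`, the familiar efficient-GMM variance, and `V_identity - V_optimal` is positive semidefinite.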
As remarked above, relative to the existing literature Theorem 4 allows for nonstationary processes and only assumes that $q_{i,n}$ and $\nabla_\theta q_{i,n}$ are NED on an $\alpha$-mixing input process, rather than postulating that $q_{i,n}$ and $\nabla_\theta q_{i,n}$ are $\alpha$-mixing. The latter assumption is restrictive in that the $\alpha$-mixing property is not preserved under transformation. Since the NED property is preserved under a wide range of transformations it accommodates a much larger class of dependent spatial processes, including infinite moving average and spatial autoregressive models. As such, Theorem 4 should provide a basis for constructing confidence intervals and hypothesis testing in a wider range of spatial models.

Using Proposition 2, we now give some sufficient conditions for Assumption 7(c).
Assumption 8 The process $\{Z_{i,n}, i \in D_n \subseteq T_n, n \ge 1\}$ is uniformly $L_2$-NED on $\{\varepsilon_{i,n}, i \in T_n, n \ge 1\}$ of size $-2d(r-1)/(r-2)$ for some $r > 2$.

Assumption 9 For every sequence $\{\theta_n\}$ on $\Theta$, the functions $q_{i,n}(Z_{i,n}, \theta_n)$ and $\nabla_\theta q_{i,n}(Z_{i,n}, \theta_n)$ satisfy Lipschitz condition (2) in $z$, that is, for $g_{i,n} = q_{i,n}$ or $\nabla_\theta q_{i,n}$:

$$|g_{i,n}(z, \theta_n) - g_{i,n}(z^{\bullet}, \theta_n)| \le B_{i,n}(z, z^{\bullet})\, |z - z^{\bullet}|.$$

Furthermore, for the $r > 2$ as specified in Assumption 8,

$$\sup_{n, i \in D_n} \sup_s \left\|B_{i,n}^{(s)}\right\|_2 < \infty \quad \text{and} \quad \sup_{n, i \in D_n} \sup_s \left\|B_{i,n}^{(s)} \left|Z_{i,n} - \widetilde{Z}_{i,n}^s\right|\right\|_r < \infty,$$

where $B_{i,n}^{(s)} = B_{i,n}(Z_{i,n}, \widetilde{Z}_{i,n}^s)$ with $\widetilde{Z}_{i,n}^s = E[Z_{i,n} | \mathcal{F}_{i,n}(s)]$.

6 Conclusion
The paper develops an asymptotic inference theory for a class of dependent nonstationary random fields that could be used in a wide range of econometric models with cross-sectional or spatial dependence, e.g., social interaction models. More specifically, the paper extends the notion of near-epoch dependent (NED) processes used in the time series literature to spatial processes. This allows us to accommodate larger classes of dependent processes than mixing random fields. The class of NED random fields is "closed with respect to infinite transformations" and thus should be sufficiently broad for many applications of interest. In particular, it covers autoregressive and infinite moving average random fields as well as nonlinear spatial Bernoulli shifts. The NED property is also compatible with considerable heterogeneity and is preserved under transformations under fairly mild conditions. Furthermore, a CLT and an LLN are derived for spatial processes that are NED on an $\alpha$-mixing process. Apart from covering a larger class of dependent processes, these limit theorems also allow for arrays of nonstationary random fields on unevenly spaced lattices. Building on these limit results, the paper develops an asymptotic theory of spatial GMM estimators, which provides a basis for inference in a broad range of models with cross-sectional or spatial dependence.

Much of the random fields literature assumes that the process resides on an equally spaced grid. In contrast, and as in Jenish and Prucha (2009), we allow for locations to be unequally spaced. The implicit assumption of fixed locations seems reasonable for a large class of applications, especially in the short run. Still, an important direction for future work would be to extend the asymptotic theory to spatial processes with endogenous locations, while maintaining a set of assumptions that are reasonably easy to interpret. 9 One possible approach may be to augment the contributions of the present paper with theory from point processes.
9 Pinkse et al. (2007) made an interesting contribution in this direction. Their catalogue of assumptions is at the level of Bernstein blocks. Without further sufficient conditions, verification of those assumptions would typically be challenging in practical situations.

A Appendix: Proofs for Section 3

Proof of Proposition 4: Let $\mathcal{F}_i(s) = \sigma(\varepsilon_j;\ j \in \mathbb{Z}^d: \rho(i, j) \le s)$. By the Minkowski inequality for infinite sums, the conditional Jensen inequality, and (6), we have

$$\|Y_i - E[Y_i | \mathcal{F}_i(s)]\|_p = \Big\| \sum_{j \in \mathbb{Z}^d:\ \rho(i,j) > s} g_{ij} \{\varepsilon_j - E[\varepsilon_j | \mathcal{F}_i(s)]\} \Big\|_p \le \sum_{j \in \mathbb{Z}^d:\ \rho(i,j) > s} |g_{ij}|\, \|\varepsilon_j - E[\varepsilon_j | \mathcal{F}_i(s)]\|_p \le K \psi(s),$$

with $K = 2 \sup_{j \in \mathbb{Z}^d} \|\varepsilon_j\|_p < \infty$ and $\psi(s) = \sup_{i \in \mathbb{Z}^d} \sum_{j \in \mathbb{Z}^d:\ \rho(i,j) > s} |g_{ij}|$. Since $\psi(s) \to 0$ as $s \to \infty$ by condition (5) we have $\|Y_i - E[Y_i | \mathcal{F}_i(s)]\|_p \to 0$ as $s \to \infty$, which proves the claim. Q.E.D.
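To see how fast the bound $\psi(s)$ can decay in a concrete case, the sketch below evaluates it for geometric weights $g_{ij} = \lambda^{\rho(i,j)}$ on $\mathbb{Z}^d$. This example is not from the paper: it assumes that $\rho$ is the maximum-coordinate distance, so that exactly $(2k+1)^d - (2k-1)^d$ lattice points lie at distance $k \ge 1$ from any location, and it truncates the infinite sum at a hypothetical cut-off `k_max`. Because the weights depend only on $\rho(i,j)$, the supremum over $i$ is trivial.

```python
def shell_size(k, d):
    """Number of points of Z^d at max-norm distance exactly k >= 1 from a given point."""
    return (2 * k + 1) ** d - (2 * k - 1) ** d

def psi(s, lam, d, k_max=2000):
    """psi(s) = sum over rho(i,j) > s of lam**rho(i,j), truncated at distance k_max."""
    return sum(shell_size(k, d) * lam ** k for k in range(s + 1, k_max + 1))

for s in (0, 1, 2, 5, 10):
    print(s, psi(s, lam=0.5, d=1), psi(s, lam=0.5, d=2))
```

In one dimension the sum has the closed form $\psi(s) = 2\lambda^{s+1}/(1-\lambda)$, which equals $2 \cdot 0.5^{s}$ for $\lambda = 0.5$; in higher dimensions the decay remains geometric up to a polynomial factor in $s$.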
Proof of Proposition 5: Let $\mathcal{F}_{i,n}(s) = \sigma(\varepsilon_{j,n};\ 1 \le j \le n: \rho(i, j) \le s)$. By the Minkowski inequality, the conditional Jensen inequality, and (12), we have

$$\|Y_{i,n} - E[Y_{i,n} | \mathcal{F}_{i,n}(s)]\|_p \le \sum_{1 \le j \le n,\ \rho(i,j) > s} |a_{ij,n}|\, \|X_{j,n} - E[X_{j,n} | \mathcal{F}_{i,n}(s)]\|_p + \sum_{1 \le j \le n,\ \rho(i,j) > s} |b_{ij,n}|\, \|v_{j,n} - E[v_{j,n} | \mathcal{F}_{i,n}(s)]\|_p \le K \psi(s),$$

with $K = \max\{2\|X_{j,n}\|_p,\ 2\|v_{j,n}\|_p\} < \infty$ and $\psi(s) = \sup_{n,\ 1 \le i \le n} \sum_{1 \le j \le n,\ \rho(i,j) > s} (|a_{ij,n}| + |b_{ij,n}|)$. Since $\psi(s) \to 0$ as $s \to \infty$ by condition (11) we have $\|Y_{i,n} - E[Y_{i,n} | \mathcal{F}_{i,n}(s)]\|_p \to 0$ as $s \to \infty$, which proves the claim. Q.E.D.
Proof of Proposition 6: We first show that $Y_i$ is well defined as the limit in $L_2$ as $s \to \infty$ of the random variables

$$Y_i^s = f_i(\varepsilon_{i+j}^{(s)},\ j \in \mathbb{Z}^d),$$

where

$$\varepsilon_{i+j}^{(s)} = \begin{cases} \varepsilon_{i+j} & \text{for } \rho(i, j) \le s, \\ 0 & \text{for } \rho(i, j) > s. \end{cases}$$

In light of condition (15) we have for any $s, m \in \mathbb{N}$:

$$\|Y_i^{s+m} - Y_i^s\|_2 = \|f_i(\varepsilon_{i+j}^{(s+m)}) - f_i(\varepsilon_{i+j}^{(s)})\|_2 \le \sum_{j \in \mathbb{Z}^d:\ s < \rho(i,j) \le s+m} w_{ij}\, \|\varepsilon_{i+j}\|_2 \le \sup_{i \in \mathbb{Z}^d} \|\varepsilon_i\|_2 \sum_{j \in \mathbb{Z}^d:\ s < \rho(i,j) \le s+m} w_{ij} \le K \sup_{i \in \mathbb{Z}^d} \sum_{j \in \mathbb{Z}^d:\ \rho(i,j) > s} w_{ij},$$

with $K = \sup_{i \in \mathbb{Z}^d} \|\varepsilon_i\|_2$. In light of (16)-(17) the r.h.s. converges monotonically to zero as $s \to \infty$. Thus clearly $Y_i^s$ is a Cauchy sequence in $L_2$, and hence $Y_i$ is well defined since the $L_2$ space is a Banach space and as such complete.

Now, let $\mathcal{F}_i(s) = \sigma(\varepsilon_{i+j};\ j \in \mathbb{Z}^d: \rho(i, j) \le s)$. By the minimum mean-squared error property of the conditional expectation, we have

$$\|Y_i - E[Y_i | \mathcal{F}_i(s)]\|_2 \le \|f_i(\varepsilon_{i+j},\ j \in \mathbb{Z}^d) - f_i(\varepsilon_{i+j}^{(s)},\ j \in \mathbb{Z}^d)\|_2 \le K \sum_{j \in \mathbb{Z}^d:\ \rho(i,j) > s} w_{ij} \le K \psi(s),$$

with

$$\psi(s) = \sup_{i \in \mathbb{Z}^d} \sum_{j \in \mathbb{Z}^d:\ \rho(i,j) > s} w_{ij} \to 0$$

as $s \to \infty$ by condition (16). This completes the proof of the proposition. Q.E.D.
B Appendix: Proofs for Section 4

Throughout this appendix let $\mathcal{F}_{i,n}(s) = \sigma(\varepsilon_{j,n};\ j \in T_n: \rho(i, j) \le s)$ be the $\sigma$-field generated by the random vectors $\varepsilon_{j,n}$ located in the $s$-neighborhood of location $i$. Furthermore, $C$ denotes a generic constant that does not depend on $n$ and may be different from line to line.

The proof of the CLT builds on Ibragimov and Linnik (1971), pp. 352-355, and makes use of the following lemmata:
Lemma B.1 (Brockwell and Davis (1991), Proposition 6.3.9). Let $Y_n$, $n = 1, 2, \ldots$ and $V_{ns}$, $s = 1, 2, \ldots$, $n = 1, 2, \ldots$, be random vectors such that

(i) $V_{ns} \Longrightarrow V_s$ as $n \to \infty$ for each $s = 1, 2, \ldots$

(ii) $V_s \Longrightarrow V$ as $s \to \infty$, and

(iii) $\lim_{s \to \infty} \limsup_{n \to \infty} P(|Y_n - V_{ns}| > \delta) = 0$ for every $\delta > 0$.

Then $Y_n \Longrightarrow V$ as $n \to \infty$.

Lemma B.2 (Ibragimov and Linnik (1971)) Let $L_p(\mathcal{F}_1)$ and $L_p(\mathcal{F}_2)$ denote, respectively, the class of $\mathcal{F}_1$-measurable and $\mathcal{F}_2$-measurable random variables $\xi$ satisfying $\|\xi\|_p < \infty$. Let $X \in L_p(\mathcal{F}_1)$ and $Y \in L_q(\mathcal{F}_2)$. Then, for any $1 \le p, q, r < \infty$ such that $p^{-1} + q^{-1} + r^{-1} = 1$,

$$|\mathrm{Cov}(X, Y)| < 4\, \alpha^{1/r}(\mathcal{F}_1, \mathcal{F}_2)\, \|X\|_p\, \|Y\|_q,$$

where $\alpha(\mathcal{F}_1, \mathcal{F}_2) = \sup_{A \in \mathcal{F}_1, B \in \mathcal{F}_2} (|P(AB) - P(A)P(B)|)$.
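The proof of Lemma B.3 below applies this covariance inequality with two particular choices of exponents. As a quick check, both choices satisfy the requirement $p^{-1} + q^{-1} + r^{-1} = 1$, and they produce the powers of the mixing coefficient that appear in parts (a) and (b) of that lemma:

```latex
% Choice used for part (a): p = q = 2 + \delta, r = (2 + \delta)/\delta
\[
\frac{1}{2+\delta} + \frac{1}{2+\delta} + \frac{\delta}{2+\delta} = 1,
\qquad
|\mathrm{Cov}(X, Y)| < 4\, \alpha^{\delta/(2+\delta)}(\mathcal{F}_1, \mathcal{F}_2)\,
\|X\|_{2+\delta}\, \|Y\|_{2+\delta}.
\]

% Choice used for part (b): p = 2 + \delta, q = 2, r = 2(2 + \delta)/\delta
\[
\frac{1}{2+\delta} + \frac{1}{2} + \frac{\delta}{2(2+\delta)}
= \frac{2 + (2+\delta) + \delta}{2(2+\delta)} = 1,
\qquad
|\mathrm{Cov}(X, Y)| < 4\, \alpha^{\delta/(4+2\delta)}(\mathcal{F}_1, \mathcal{F}_2)\,
\|X\|_{2+\delta}\, \|Y\|_{2}.
\]
```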
To prove the CLT for NED random fields, we first establish some moment inequalities and a slightly modified version of the CLT for mixing fields developed in Jenish and Prucha (2009). It is helpful to introduce the following notation. Let $X = \{X_{i,n}, i \in D_n, n \ge 1\}$ be a random field, then $\|X\|_q := \sup_{n, i \in D_n} \|X_{i,n}\|_q$ for $q \ge 1$.
Lemma B.3 Let $\{X_{i,n}\}$ be uniformly $L_2$-NED on a random field $\{\varepsilon_{i,n}\}$ with $\alpha$-mixing coefficients $\bar{\alpha}(u, v, r) \le (u + v)^{\tau}\, \widehat{\alpha}(r)$, $\tau \ge 0$. Let $S_n = \sum_{i \in D_n} X_{i,n}$ and suppose that the NED coefficients of $\{X_{i,n}\}$ satisfy $\sum_{r=1}^{\infty} r^{d-1} \psi(r) < \infty$ and $\|X\|_{2+\delta} < \infty$ for some $\delta > 0$. Then,

(a)

$$|\mathrm{Cov}(X_{i,n}, X_{j,n})| \le \|X\|_{2+\delta} \left\{ C_1 \|X\|_{2+\delta}\, [h/3]^{d\tau_*}\, \widehat{\alpha}^{\delta/(2+\delta)}([h/3]) + C_2\, \psi([h/3]) \right\},$$

where $h = \rho(i, j)$ and $\tau_* = \delta\tau/(2+\delta)$. If in addition $\sum_{r=1}^{\infty} r^{d(\tau_*+1)-1}\, \widehat{\alpha}^{\delta/(2+\delta)}(r) < \infty$, then for some constant $C < \infty$, not depending on $n$,

$$\mathrm{Var}(S_n) \le C\, |D_n|.$$

(b)

$$|\mathrm{Cov}(X_{i,n}, X_{j,n})| \le \|X\|_2 \left\{ C_3 \|X\|_{2+\delta}\, [h/3]^{d\tau_*}\, \widehat{\alpha}^{\delta/(4+2\delta)}([h/3]) + C_4\, \psi([h/3]) \right\},$$

where $h = \rho(i, j)$ and $\tau_* = \delta\tau/(4+2\delta)$. If, in addition, $\sum_{r=1}^{\infty} r^{d(\tau_*+1)-1}\, \widehat{\alpha}^{\delta/(4+2\delta)}(r) < \infty$ where $\tau_* = \delta\tau/(4+2\delta)$, then for some constant $C < \infty$, not depending on $n$,

$$\mathrm{Var}(S_n) \le C \|X\|_2\, |D_n|.$$
Proof of Lemma B.3: (a) For any $i \in D_n$ and $m > 0$, let

$$\xi_{i,n}^m = E(X_{i,n} | \mathcal{F}_{i,n}(m)), \qquad \eta_{i,n}^m = X_{i,n} - \xi_{i,n}^m.$$

By the Jensen and Lyapunov inequalities, we have for all $i \in D_n$, $n, m \in \mathbb{N}$ and any $1 \le q \le 2 + \delta$:

$$E|\xi_{i,n}^m|^q = E\{|E(X_{i,n} | \mathcal{F}_{i,n}(m))|^q\} \le E\{E(|X_{i,n}|^q | \mathcal{F}_{i,n}(m))\} = E|X_{i,n}|^q,$$

and thus

$$\|\xi_{i,n}^m\|_q \le \|X_{i,n}\|_q \le \|X\|_{2+\delta}, \qquad \|\eta_{i,n}^m\|_q \le 2\|X_{i,n}\|_q \le 2\|X\|_{2+\delta}.$$

Thus, both $\xi_{i,n}^m$ and $\eta_{i,n}^m$ are uniformly $L_{2+\delta}$ bounded. Also, note that

$$\sup_{n, i \in D_n} \|\eta_{i,n}^m\|_2 \le \psi(m),$$

given that $\{X_{i,n}\}$ is uniformly $L_2$-NED on $\{\varepsilon_{i,n}\}$ and thus the NED-scaling factors can be chosen w.l.o.g. to be one. Furthermore, let $\sigma(\xi_{i,n}^m)$ denote the $\sigma$-field generated by $\xi_{i,n}^m$. Since $\sigma(\xi_{i,n}^m) \subseteq \mathcal{F}_{i,n}(m)$, the mixing coefficients of $\xi_{i,n}^m$ satisfy

$$\bar{\alpha}_{\xi}(1, 1, r) \le \begin{cases} 1, & r \le 2m, \\ \bar{\alpha}(M m^d, M m^d, r - 2m), & r > 2m, \end{cases}$$

where $\bar{\alpha}(u, v, r)$ are the mixing coefficients of the input process $\varepsilon$, since the $m$-neighborhood of any point on $D$ contains at most $M m^d$ points of $D$ for some $M$ that does not depend on $m$, see the proof of Lemma A.1 of Jenish and Prucha (2009).
Now, decompose $X_{i,n}$ and $X_{j,n}$ as<br />
$$X_{i,n} = \xi_{i,n}^{[h/3]} + \eta_{i,n}^{[h/3]} \quad\text{and}\quad X_{j,n} = \xi_{j,n}^{[h/3]} + \eta_{j,n}^{[h/3]},$$<br />
where $h = \rho(i,j)$. Then,<br />
$$\begin{aligned}
|\mathrm{Cov}(X_{i,n}, X_{j,n})| &= \left|\mathrm{Cov}\left(\xi_{i,n}^{[h/3]} + \eta_{i,n}^{[h/3]},\; \xi_{j,n}^{[h/3]} + \eta_{j,n}^{[h/3]}\right)\right| \\
&\le \left|\mathrm{Cov}\left(\xi_{i,n}^{[h/3]}, \xi_{j,n}^{[h/3]}\right)\right| + \left|\mathrm{Cov}\left(\xi_{i,n}^{[h/3]}, \eta_{j,n}^{[h/3]}\right)\right| \\
&\quad + \left|\mathrm{Cov}\left(\eta_{i,n}^{[h/3]}, \xi_{j,n}^{[h/3]}\right)\right| + \left|\mathrm{Cov}\left(\eta_{i,n}^{[h/3]}, \eta_{j,n}^{[h/3]}\right)\right| .
\end{aligned} \tag{B.1}$$<br />
We will now bound separately each term on the r.h.s. of the last inequality.<br />
First, using Lemma B.2 with $p = q = 2+\delta$ and $r = (2+\delta)/\delta$ yields the<br />
following bound on the first term:<br />
$$\begin{aligned}
\left|\mathrm{Cov}\left(\xi_{i,n}^{[h/3]}, \xi_{j,n}^{[h/3]}\right)\right| &\le 4\,\|\xi_{i,n}^{[h/3]}\|_{2+\delta}\,\|\xi_{j,n}^{[h/3]}\|_{2+\delta}\;\bar{\alpha}_{\xi}^{\delta/(2+\delta)}(1,1,[h/3]) \\
&\le 4\,\|X\|_{2+\delta}^{2}\;\bar{\alpha}^{\delta/(2+\delta)}\left(M[h/3]^{d}, M[h/3]^{d}, h-2[h/3]\right) \\
&\le C_1\,\|X\|_{2+\delta}^{2}\,[h/3]^{d\tau_*}\,\widehat{\alpha}^{\delta/(2+\delta)}([h/3]),
\end{aligned} \tag{B.2}$$<br />
where $\tau_* = \delta\tau/(2+\delta)$.<br />
Second, the Cauchy-Schwarz inequality gives the following bound on the<br />
second and third terms:<br />
$$\left|\mathrm{Cov}\left(\xi_{i,n}^{[h/3]}, \eta_{j,n}^{[h/3]}\right)\right| \le 4\,\|\xi_{i,n}^{[h/3]}\|_{2}\,\|\eta_{j,n}^{[h/3]}\|_{2} \le 4\,\|X\|_{2}\,\psi([h/3]). \tag{B.3}$$<br />
Similarly, the fourth term can be bounded as:<br />
$$\left|\mathrm{Cov}\left(\eta_{i,n}^{[h/3]}, \eta_{j,n}^{[h/3]}\right)\right| \le 4\,\|\eta_{i,n}^{[h/3]}\|_{2}\,\|\eta_{j,n}^{[h/3]}\|_{2} \le 8\,\|X\|_{2}\,\psi([h/3]). \tag{B.4}$$<br />
Collecting (B.2)-(B.4), we have<br />
$$|\mathrm{Cov}(X_{i,n}, X_{j,n})| \le \|X\|_{2+\delta}\left\{ C_1\,\|X\|_{2+\delta}\,[h/3]^{d\tau_*}\,\widehat{\alpha}^{\delta/(2+\delta)}([h/3]) + C_2\,\psi([h/3]) \right\},$$<br />
which proves the first inequality.<br />
Using this inequality as well as the bounds on the sizes of the sets given in<br />
Lemma A.1 of Jenish and Prucha (2009), we have<br />
$$\begin{aligned}
\mathrm{Var}(S_n) &\le \sum_{i \in D_n} \mathrm{Var}(X_{i,n}) + \sum_{i,j \in D_n,\, i \ne j} |\mathrm{Cov}(X_{i,n}, X_{j,n})| \\
&\le 2|D_n|\,\|X\|_{2+\delta}^{2} + C_1\,\|X\|_{2+\delta}^{2} \sum_{i \in D_n} \sum_{r=1}^{\infty} \sum_{j \in D_n:\, \rho(i,j) \in [r,r+1)} [\rho(i,j)/3]^{d\tau_*}\,\widehat{\alpha}^{\delta/(2+\delta)}([\rho(i,j)/3]) \\
&\quad + C_2\,\|X\|_{2+\delta} \sum_{i \in D_n} \sum_{r=1}^{\infty} \sum_{j \in D_n:\, \rho(i,j) \in [r,r+1)} \psi([\rho(i,j)/3]) \\
&\le 2|D_n|\,\|X\|_{2+\delta}^{2} + C\,|D_n|\,\|X\|_{2+\delta} \left[ \sum_{r=1}^{\infty} r^{d(\tau_*+1)-1}\,\widehat{\alpha}^{\delta/(2+\delta)}(r) + \sum_{r=1}^{\infty} r^{d-1}\,\psi(r) \right]
\end{aligned}$$<br />
for some constant $C < \infty$, not depending on $n$.<br />
(b) To prove the second part of the lemma, apply Lemma B.2 with $p = 2+\delta$,<br />
$q = 2$, and $r = 2(2+\delta)/\delta$ to obtain the following bound on the first term:<br />
$$\left|\mathrm{Cov}\left(\xi_{i,n}^{[h/3]}, \xi_{j,n}^{[h/3]}\right)\right| \le C_3\,\|X\|_{2+\delta}\,\|X\|_{2}\,[h/3]^{d\tau_*}\,\widehat{\alpha}^{\delta/(4+2\delta)}([h/3]), \tag{B.5}$$<br />
where $\tau_* = \delta\tau/(4+2\delta)$. The other terms on the r.h.s. of (B.1) are bounded as<br />
in part (a). Collecting (B.3)-(B.5) gives<br />
$$|\mathrm{Cov}(X_{i,n}, X_{j,n})| \le \|X\|_{2}\left\{ C_3\,\|X\|_{2+\delta}\,[h/3]^{d\tau_*}\,\widehat{\alpha}^{\delta/(4+2\delta)}([h/3]) + C_4\,\psi([h/3]) \right\}$$<br />
as required. Finally, using similar arguments as in the proof of part (a), we can<br />
bound $\mathrm{Var}(S_n)$ as<br />
$$\mathrm{Var}(S_n) \le C\,\|X\|_{2}\,|D_n|$$<br />
for some constant $C < \infty$, not depending on $n$. Q.E.D.<br />
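The order-$|D_n|$ variance bound of Lemma B.3 can be checked numerically in a very special case. The sketch below is only an illustration under assumptions that are not in the text: a one-dimensional lattice ($d = 1$), i.i.d. unit-variance innovations, and a linear field $X_i = \sum_k w_k \varepsilon_{i-k}$ with hypothetical geometric weights. In that case $\mathrm{Var}(S_n)$ has a closed form and $\mathrm{Var}(S_n)/n$ stays below $(\sum_k |w_k|)^2$.<br />

```python
# Illustrative only: 1-D linear field X_i = sum_k w_k * eps_{i-k} with i.i.d.
# unit-variance innovations. This is an assumed toy model, not the general
# NED-on-mixing setting of Lemma B.3.

def geometric_weights(rho, K):
    """Hypothetical weights w_k = rho**|k| for k = -K..K."""
    return [rho ** abs(k) for k in range(-K, K + 1)]

def autocov(w, h):
    """gamma(h) = sum_k w_k * w_{k+h}; zero once |h| exceeds the support."""
    h = abs(h)
    if h >= len(w):
        return 0.0
    return sum(w[k] * w[k + h] for k in range(len(w) - h))

def var_of_sum(w, n):
    """Exact Var(S_n) = sum_{|h|<n} (n - |h|) * gamma(h), S_n = X_1 + ... + X_n."""
    return sum((n - abs(h)) * autocov(w, h) for h in range(-(n - 1), n))

w = geometric_weights(0.5, 20)
C = sum(abs(x) for x in w) ** 2          # candidate constant (sum_k |w_k|)^2
ratios = [var_of_sum(w, n) / n for n in (10, 100, 1000)]
print(ratios, C)                          # Var(S_n)/n stays below C
```

The constant here comes from summing absolute autocovariances, which plays the role that the summability conditions on $\widehat{\alpha}$ and $\psi$ play in the lemma.<br />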
Theorem B.1 Suppose $\{D_n\}$ is a sequence of finite subsets of $D$, satisfying<br />
Assumption 1, with $|D_n| \to \infty$ as $n \to \infty$. Suppose further that $\{\varepsilon_{i,n};\, i \in D_n,\, n \in \mathbb{N}\}$<br />
is an array of zero-mean random variables with $\alpha$-mixing coefficients $\bar{\alpha}(u,v,r) \le C(u+v)^{\tau}\,\widehat{\alpha}(r)$<br />
for some constants $C < \infty$ and $\tau \ge 0$. Suppose for some $\delta > 0$<br />
and $\gamma > 0$<br />
$$\lim_{k \to \infty} \sup_{n,\, i \in D_n} E\left[|\varepsilon_{i,n}|^{2+\delta}\,\mathbf{1}(|\varepsilon_{i,n}| > k)\right] = 0$$<br />
and<br />
$$\widehat{\alpha}(r) = O\left(r^{-d(2\tau_*+1)-\gamma}\right)$$<br />
with $\tau_* = \max\{\tau, 1/\delta\}$, and suppose $\liminf_{n \to \infty} |D_n|^{-1}\sigma_n^{2} > 0$, then<br />
$$\sigma_n^{-1} \sum_{i \in D_n} \varepsilon_{i,n} \Longrightarrow N(0,1),$$<br />
where $\sigma_n^{2} = \mathrm{Var}\left(\sum_{i \in D_n} \varepsilon_{i,n}\right)$.<br />
The above CLT is in essence a variant of the CLT for $\alpha$-mixing random fields<br />
given as Corollary 1 of Theorem 1 in Jenish and Prucha (2009), applied to<br />
mixing coefficients of the type $\bar{\alpha}(u,v,r) \le C(u+v)^{\tau}\,\widehat{\alpha}(r)$, $\tau \ge 0$.<br />
Proof of Theorem B.1: The proof of the theorem is largely the same as the<br />
proof of Theorem 1 in Jenish and Prucha (2009). We will first show that all<br />
assumptions of that theorem except Assumption 3(c) are satisfied. We will then<br />
show that the entire proof goes through if Assumption 3(c) is replaced by the<br />
condition $\widehat{\alpha}(r) = O(r^{-d(2\tau_*+1)-\gamma})$ with $\tau_* = \max\{\tau, 1/\delta\}$.<br />
Clearly, Assumptions 1, 2 and 5 of that theorem with $c_{i,n} = 1$ are satisfied.<br />
To verify Assumption 3(a) note that<br />
$$\sum_{r=1}^{\infty} \bar{\alpha}(1,1,r)\, r^{[d(2+\delta)/\delta]-1} \le C \sum_{r=1}^{\infty} r^{-2d(\tau_* - 1/\delta) - \gamma - 1} \le C \sum_{r=1}^{\infty} r^{-\gamma-1} < \infty,$$<br />
since $\tau_* = \max\{\tau, 1/\delta\}$. As shown in the proof of Corollary 1 in Jenish and<br />
Prucha (2009), the latter condition implies Assumption 3(a) of Theorem 1 in<br />
Jenish and Prucha (2009). Furthermore, it is easy to see that Assumption 3(b)<br />
is also satisfied. Indeed, for any $u + v \le 4$, we have<br />
$$\sum_{r=1}^{\infty} \bar{\alpha}(u,v,r)\, r^{d-1} \le 4^{\tau} C \sum_{r=1}^{\infty} r^{d-1}\,\widehat{\alpha}(r) \le C \sum_{r=1}^{\infty} r^{d-1}\, r^{-d(2\tau_*+1)-\gamma} < \infty.$$<br />
Thus, all assumptions, except Assumption 3(c), of Theorem 1 in Jenish and<br />
Prucha (2009) hold, and hence, all steps of its proof which do not rely on that<br />
assumption remain valid in our case. Assumption 3(c) is only used in step 5<br />
of that proof. Specifically, all arguments in that step continue to hold provided we<br />
show that there exists a sequence $m_n$ such that<br />
$$m_n^{d}\,|D_n|^{-1/2} \to 0 \quad \text{as } n \to \infty \tag{B.6}$$<br />
and<br />
$$\bar{\alpha}(1, |D_n|, m_n)\,|D_n|^{1/2} \to 0 \quad \text{as } n \to \infty. \tag{B.7}$$<br />
(Though $\bar{\alpha}_{1,\infty}$ is used instead of $\bar{\alpha}_{1,|D_n|}$ in the proof of Theorem 1 of Jenish and<br />
Prucha (2009), in fact the proof only relies on the coefficient $\bar{\alpha}_{1,|D_n|}$; see Step 9<br />
($|EA_{3,n}| \to 0$) of the proof in the working version of the paper.)<br />
The desired sequence $m_n$ can be chosen as<br />
$$m_n = \left[ |D_n|^{1/2} / \log |D_n| \right]^{1/d}.$$<br />
It is immediate that (B.6) holds:<br />
$$m_n^{d}\,|D_n|^{-1/2} = (\log |D_n|)^{-1} \to 0.$$<br />
To verify (B.7), observe<br />
$$\begin{aligned}
\bar{\alpha}(1, |D_n|, m_n)\,|D_n|^{1/2} &\le C\,|D_n|^{\tau+1/2}\, m_n^{-d(2\tau_*+1)-\gamma} \\
&\le C\,|D_n|^{\tau+1/2}\,|D_n|^{-\tau_*-1/2}\,|D_n|^{-\gamma/(2d)}\,[\log |D_n|]^{(2\tau_*+1)+\gamma/d} \\
&\le C\,|D_n|^{-\gamma/(2d)}\,[\log |D_n|]^{(2\tau_*+1)+\gamma/d} \to 0.
\end{aligned}$$<br />
The rest of the proof is the same, word-by-word, as the proof of Theorem 1 of<br />
Jenish and Prucha (2009). Q.E.D.<br />
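The two rate conditions (B.6) and (B.7) for the choice $m_n = [|D_n|^{1/2}/\log|D_n|]^{1/d}$ can be evaluated numerically. The following is a rough sketch with hypothetical parameter values ($d = 2$, $\tau = \tau_* = 1$, $\gamma = 1$) and the constant $C$ set to one; none of these values come from the paper. It suggests that the (B.7) bound does vanish, but only slowly, because the logarithmic factor dominates for moderate sample sizes.<br />

```python
import math

def block_radius(size, d):
    """m_n = (|D_n|^{1/2} / log|D_n|)^{1/d} for a sample of `size` sites."""
    return (math.sqrt(size) / math.log(size)) ** (1.0 / d)

def b6_term(size, d):
    """m_n^d * |D_n|^{-1/2}, which equals 1 / log|D_n| for this m_n."""
    return block_radius(size, d) ** d / math.sqrt(size)

def b7_bound(size, d, tau, tau_star, gamma):
    """Bound |D_n|^{tau+1/2} * m_n^{-d(2 tau_* + 1) - gamma}, taking C = 1."""
    m = block_radius(size, d)
    return size ** (tau + 0.5) * m ** (-(d * (2 * tau_star + 1) + gamma))

# Hypothetical sample sizes and parameters, chosen only for illustration.
sizes = [10.0 ** 8, 10.0 ** 12, 10.0 ** 16]
b6_values = [b6_term(s, d=2) for s in sizes]
b7_values = [b7_bound(s, d=2, tau=1.0, tau_star=1.0, gamma=1.0) for s in sizes]
print(b6_values)   # decreasing like 1 / log|D_n|
print(b7_values)   # decreasing, but slowly
```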
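Proof of Theorem 1: Since the proof is lengthy it is broken into steps.<br />
1. Decomposition of $Y_{i,n}$<br />
For any fixed $s > 0$, decompose $Y_{i,n}$ as<br />
$$Y_{i,n} = \xi_{i,n}^{s} + \eta_{i,n}^{s},$$<br />
where<br />
$$\xi_{i,n}^{s} = E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)), \qquad \eta_{i,n}^{s} = Y_{i,n} - \xi_{i,n}^{s}.$$<br />
Let<br />
$$S_{n,s} = \sum_{i \in D_n} \xi_{i,n}^{s}, \qquad \widetilde{S}_{n,s} = \sum_{i \in D_n} \eta_{i,n}^{s}, \qquad \sigma_{n,s}^{2} = \mathrm{Var}\left[S_{n,s}\right], \qquad \widetilde{\sigma}_{n,s}^{2} = \mathrm{Var}\left[\widetilde{S}_{n,s}\right].$$<br />
Repeated use of the Minkowski inequality yields:<br />
$$|\sigma_n - \sigma_{n,s}| \le \widetilde{\sigma}_{n,s}, \qquad |\sigma_n - \widetilde{\sigma}_{n,s}| \le \sigma_{n,s}. \tag{B.8}$$<br />
Observe that<br />
$$E\left[E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)) \mid \mathcal{F}_{i,n}(m)\right] = \begin{cases} E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)), & m \ge s, \\ E(Y_{i,n} \mid \mathcal{F}_{i,n}(m)), & m < s, \end{cases}$$<br />
and hence<br />
$$\begin{aligned}
\left\| \eta_{i,n}^{s} - E(\eta_{i,n}^{s} \mid \mathcal{F}_{i,n}(m)) \right\|_{2} &= \left\| Y_{i,n} - E[Y_{i,n} \mid \mathcal{F}_{i,n}(s)] - E[Y_{i,n} \mid \mathcal{F}_{i,n}(m)] + E\left[E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)) \mid \mathcal{F}_{i,n}(m)\right] \right\|_{2} \\
&= \begin{cases} \|Y_{i,n} - E(Y_{i,n} \mid \mathcal{F}_{i,n}(m))\|_{2} \le C\psi(m), & \text{if } m \ge s, \\ \|Y_{i,n} - E(Y_{i,n} \mid \mathcal{F}_{i,n}(s))\|_{2} \le C\psi(s) \le C\psi(m), & \text{if } m < s, \end{cases}
\end{aligned}$$<br />
since by definition the sequence $\psi(m)$ is non-increasing. Thus, for any fixed<br />
$s > 0$, $\eta_{i,n}^{s}$ is uniformly $L_2$-NED on $\varepsilon$ with the same NED coefficients $\psi(m)$<br />
as the random field $\{Y_{i,n}\}$. Furthermore, as shown in the proof of Lemma B.3,<br />
$\eta_{i,n}^{s}$ is also uniformly $L_{2+\delta}$ bounded.<br />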
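For intuition on the decomposition in Step 1, consider a toy case that is an assumption of this note and not part of the proof: a one-dimensional linear field $Y_i = \sum_{|k| \le K} w_k \varepsilon_{i-k}$ with i.i.d. zero-mean, unit-variance innovations and hypothetical geometric weights. Then $\xi_i^{s} = \sum_{|k| \le s} w_k \varepsilon_{i-k}$, the remainder is $\eta_i^{s} = \sum_{|k| > s} w_k \varepsilon_{i-k}$, and $\psi(s) = (\sum_{|k| > s} w_k^{2})^{1/2}$. The sketch computes $\psi$ and mirrors the two-case argument for the NED coefficients of $\eta^{s}$.<br />

```python
import math

def geometric_weights(rho, K):
    """Hypothetical weights w_k = rho**|k| for k = -K..K."""
    return [rho ** abs(k) for k in range(-K, K + 1)]

def ned_coefficient(w, s):
    """psi(s) = ||Y_i - E(Y_i | eps_j, |i-j| <= s)||_2 = sqrt(sum_{|k|>s} w_k^2)."""
    K = (len(w) - 1) // 2
    return math.sqrt(sum(w[k + K] ** 2 for k in range(-K, K + 1) if abs(k) > s))

def eta_ned_coefficient(w, s, m):
    """||eta^s - E(eta^s | F(m))||_2 in the toy model: psi(m) if m >= s, else psi(s)."""
    return ned_coefficient(w, max(s, m))

w = geometric_weights(0.5, 20)
psi = [ned_coefficient(w, s) for s in range(0, 21)]
print(psi[:5])   # non-increasing sequence, reaching 0.0 at s = K
```

In this toy model the remainder inherits NED coefficients no larger than $\psi(m)$, in line with the conclusion of Step 1.<br />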
2. Bounds for the Variances of $\sum Y_{i,n}$ and $\sum \eta_{i,n}^{s}$<br />
First note that there exists $0 < B < \infty$ such that for all $n$<br />
$$B|D_n| \le \sigma_n^{2}. \tag{B.9}$$<br />
Furthermore, in light of Assumption 2, and observing that $\delta\tau/(4+2\delta) \le \delta\tau/(2+\delta) = \tau_*$<br />
and $\widehat{\alpha}^{\delta/(2+\delta)}(r) \le \widehat{\alpha}^{\delta/(2(2+\delta))}(r)$, we have<br />
$$\sum_{r=1}^{\infty} r^{d(\tau_*+1)-1}\,\widehat{\alpha}^{\delta/(2+\delta)}(r) \le \sum_{r=1}^{\infty} r^{d(\tau_*+1)-1}\,\widehat{\alpha}^{\delta/(2(2+\delta))}(r) < \infty,$$<br />
$$\sum_{r=1}^{\infty} r^{d(\delta\tau/(4+2\delta)+1)-1}\,\widehat{\alpha}^{\delta/(4+2\delta)}(r) \le \sum_{r=1}^{\infty} r^{d(\tau_*+1)-1}\,\widehat{\alpha}^{\delta/(2(2+\delta))}(r) < \infty.$$<br />
Using part (a) of Lemma B.3 with $X_{i,n} = Y_{i,n}$, we have<br />
$$B|D_n| \le \sigma_n^{2} = \mathrm{Var}(S_n) \le C|D_n|$$<br />
for some $B > 0$. Using part (b) of Lemma B.3 with $X_{i,n} = \eta_{i,n}^{s}$ we have<br />
$$\widetilde{\sigma}_{n,s}^{2} = \mathrm{Var}\left(\widetilde{S}_{n,s}\right) \le C|D_n|\,\|\eta_{i,n}^{s}\|_{2} \le C|D_n|\,\psi(s). \tag{B.10}$$<br />
Hence,<br />
$$\lim_{s \to \infty} \limsup_{n \to \infty} \frac{\widetilde{\sigma}_{n,s}^{2}}{\sigma_n^{2}} \le C \lim_{s \to \infty} \psi(s) = 0. \tag{B.11}$$<br />
Furthermore, by (B.8) we have<br />
$$\lim_{s \to \infty} \limsup_{n \to \infty} \left| 1 - \frac{\sigma_{n,s}}{\sigma_n} \right| \le \lim_{s \to \infty} \limsup_{n \to \infty} \frac{\widetilde{\sigma}_{n,s}}{\sigma_n} = 0 \tag{B.12}$$<br />
and hence for all $s \ge 1$ and $n \ge 1$<br />
$$\frac{\sigma_{n,s}}{\sigma_n} \le C < \infty. \tag{B.13}$$<br />
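3. CLT for $\sum_{i \in D_n} \xi_{i,n}^{s}$<br />
We now show that for any fixed $s > 0$, $\xi_{i,n}^{s}$ satisfies the CLT given in Theorem<br />
B.1.<br />
First, since $\sup_{n,\, i \in D_n} E\left[|\xi_{i,n}^{s}|^{2+\delta}\right] < \infty$, the process is uniformly<br />
$L_{2+\delta'}$-integrable for $\delta' = \delta/2$, i.e.,<br />
$$\lim_{k \to \infty} \sup_{n,\, i \in D_n} E\left[|\xi_{i,n}^{s}|^{2+\delta/2}\,\mathbf{1}(|\xi_{i,n}^{s}| > k)\right] = 0.$$<br />
Second, note that since $\xi_{i,n}^{s}$ is a measurable function of $\varepsilon_{i,n}$, for any $u, v \in \mathbb{N}$<br />
and $r > 2s$<br />
$$\bar{\alpha}_{\xi}(u,v,r) \le \bar{\alpha}(uMs^{d}, vMs^{d}, r-2s) \le C(u+v)^{\tau}\,\widehat{\alpha}(r-2s).$$<br />
We next show that $\widehat{\alpha}(r) = O(r^{-d(2\tau'+1)-\gamma})$ for $\tau' = \max\{\tau, 2/\delta\}$ and some<br />
$\gamma > 0$, where $\tau'$ is the exponent parameter required by Theorem B.1 when $\delta$ is replaced by $\delta' = \delta/2$. By assumption,<br />
$$\sum_{r=1}^{\infty} r^{d(\tau_*+1)-1}\,\widehat{\alpha}^{\delta/(2(2+\delta))}(r) < \infty,$$<br />
where $\tau_* = \delta\tau/(2+\delta)$, which implies<br />
$$\widehat{\alpha}(r) = o\left(r^{-2d(2+\delta)(\tau_*+1)/\delta}\right) = o\left(r^{-d[2(\tau+2/\delta)+1]-d}\right) = o\left(r^{-d[2\tau'+1]-d}\right)$$<br />
since $\tau + 2/\delta \ge \tau'$ for $\tau' = \max\{\tau, 2/\delta\}$. Thus, $\widehat{\alpha}(r) = O(r^{-d(2\tau'+1)-\gamma})$ for $\gamma = d$.<br />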
We next show that for sufficiently large $s$,<br />
$$0 < \liminf_{n \to \infty} |D_n|^{-1}\sigma_{n,s}^{2}.$$<br />
By (B.9),<br />
$$B^{1/2} \le \inf_{n} |D_n|^{-1/2}\sigma_n.$$<br />
Since $\lim_{s \to \infty} \psi(s) = 0$, there exists $s_*$ such that in light of (B.10) for all $s \ge s_*$,<br />
$$|D_n|^{-1/2}\,\widetilde{\sigma}_{n,s} \le C^{1/2}\psi^{1/2}(s) \le B^{1/2}/2. \tag{B.14}$$<br />
Hence by (B.8) for all $s \ge s_*$,<br />
$$|D_n|^{-1/2}(\sigma_n - \widetilde{\sigma}_{n,s}) \le |D_n|^{-1/2}\sigma_{n,s}$$<br />
and thus $\inf_n |D_n|^{-1/2}\sigma_{n,s} \ge \inf_n |D_n|^{-1/2}\sigma_n - \sup_n |D_n|^{-1/2}\,\widetilde{\sigma}_{n,s}$. Using (??)<br />
and (B.14), we have<br />
$$\liminf_{n \to \infty} |D_n|^{-1/2}\sigma_{n,s} \ge B^{1/2} - \frac{B^{1/2}}{2} = \frac{B^{1/2}}{2} > 0.$$<br />
Thus, for all $s \ge s_*$,<br />
$$\sigma_{n,s}^{-1} \sum_{i \in D_n} \xi_{i,n}^{s} \Longrightarrow N(0,1) \quad \text{as } n \to \infty. \tag{B.15}$$<br />
Since the first $s_*$ terms do not affect the analysis below we take in the following<br />
$s_* = 1$.<br />
4. CLT for $\sigma_n^{-1} \sum_{i \in D_n} Y_{i,n}$<br />
Finally, using Lemma B.1 we now show that, given the maintained NED<br />
assumption, the just established CLT in (B.15) for the approximators $\xi_{i,n}^{s}$ can<br />
be carried over to the $Y_{i,n}$. Define<br />
$$W_n = \sigma_n^{-1} \sum_{i \in D_n} Y_{i,n}, \qquad V_{ns} = \sigma_n^{-1} \sum_{i \in D_n} \xi_{i,n}^{s}, \qquad W_n - V_{ns} = \sigma_n^{-1} \sum_{i \in D_n} \eta_{i,n}^{s},$$<br />
so that we can exploit Lemma B.1 to prove that<br />
$$W_n = \sigma_n^{-1} \sum_{i \in D_n} Y_{i,n} \Longrightarrow V \sim N(0,1).$$<br />
We first verify condition (iii) of Lemma B.1. By Markov's inequality and (B.11),<br />
for every $\epsilon > 0$ we have<br />
$$\begin{aligned}
\lim_{s \to \infty} \limsup_{n \to \infty} P(|W_n - V_{ns}| > \epsilon) &= \lim_{s \to \infty} \limsup_{n \to \infty} P\left( \left| \sigma_n^{-1} \sum_{i \in D_n} \eta_{i,n}^{s} \right| > \epsilon \right) \\
&\le \lim_{s \to \infty} \limsup_{n \to \infty} \frac{\widetilde{\sigma}_{n,s}^{2}}{\epsilon^{2}\sigma_n^{2}} = 0.
\end{aligned}$$<br />
Next observe that<br />
$$V_{ns} = \frac{\sigma_{n,s}}{\sigma_n} \left[ \sigma_{n,s}^{-1} \sum_{i \in D_n} \xi_{i,n}^{s} \right].$$<br />
We proceed to show $W_n \Longrightarrow V$ by contradiction. For that purpose let $\mathcal{M}$ be the<br />
set of all probability measures on $(\mathbb{R}, \mathcal{B})$, and observe that we can metrize $\mathcal{M}$ by,<br />
e.g., the Prokhorov distance $d(\cdot,\cdot)$. Let $\mu_n$ and $\mu$ be the probability measures<br />
corresponding to $W_n$ and $V$, respectively; then $W_n \Longrightarrow V$, or $\mu_n \Longrightarrow \mu$, iff<br />
$d(\mu_n, \mu) \to 0$ as $n \to \infty$. Now suppose $\mu_n$ does not converge to $\mu$. Then for<br />
some $\epsilon > 0$ there exists a subsequence $\{n(m)\}$ such that $d(\mu_{n(m)}, \mu) > \epsilon$ for<br />
all $n(m)$. By (B.13), we have $0 \le \sigma_{n,s}/\sigma_n \le C < \infty$ for all $s, n \ge 1$. Hence,<br />
$0 \le \sigma_{n(m),s}/\sigma_{n(m)} \le C < \infty$ for all $n(m)$. Consequently, for $s = 1$ there<br />
exists a subsubsequence $\{n(m(l_1))\}$ such that $\sigma_{n(m(l_1)),1}/\sigma_{n(m(l_1))} \to p(1)$ as<br />
$l_1 \to \infty$. For $s = 2$, there exists a subsubsubsequence $\{n(m(l_1(l_2)))\}$ such that<br />
$\sigma_{n(m(l_1(l_2))),2}/\sigma_{n(m(l_1(l_2)))} \to p(2)$ as $l_2 \to \infty$. The argument can be repeated<br />
for $s = 3, 4, \ldots$. Now construct a subsequence $\{n_l\}$ such that $n_1$ corresponds<br />
to the first element of $\{n(m(l_1))\}$, $n_2$ corresponds to the second element of<br />
$\{n(m(l_1(l_2)))\}$, and so on; then<br />
$$\lim_{l \to \infty} \frac{\sigma_{n_l,s}}{\sigma_{n_l}} = p(s) \tag{B.16}$$<br />
for $s = 1, 2, \ldots$ Given (B.15), it follows that as $l \to \infty$<br />
$$V_{n_l s} \Longrightarrow V_s \sim N(0, p^{2}(s)).$$<br />
Then, it follows from (B.12) that<br />
$$\lim_{s \to \infty} |p(s) - 1| \le \lim_{s \to \infty} \lim_{l \to \infty} \left| p(s) - \frac{\sigma_{n_l,s}}{\sigma_{n_l}} \right| + \lim_{s \to \infty} \sup_{n \ge 1} \left| \frac{\sigma_{n,s}}{\sigma_n} - 1 \right| = 0.$$<br />
Thus $V_s \Longrightarrow V$ and thus by Lemma B.1 $W_{n_l} \Longrightarrow V \sim N(0,1)$ as $l \to \infty$.<br />
Since $\{n_l\} \subseteq \{n(m)\}$ this contradicts the assumption that $d(\mu_{n(m)}, \mu) > \epsilon$ for<br />
all $n(m)$. This completes the proof of the CLT. Q.E.D.<br />
Proof of Corollary 1: To prove the corollary, we apply the Cramér-Wold<br />
device, and verify that for every $\lambda \in \mathbb{R}^{k}$ with $|\lambda| = 1$, $\sigma_n^{-1} \sum_{i \in D_n} V_{i,n} \Rightarrow N(0,1)$,<br />
where $V_{i,n} = \lambda' Y_{i,n}$. Observe that using the properties of norms, we have<br />
$$|V_{i,n}| = |\lambda' Y_{i,n}| \le |\lambda|\,|Y_{i,n}| = |Y_{i,n}|$$<br />
and<br />
$$\mathbf{1}(|V_{i,n}| > k) = \mathbf{1}(|\lambda' Y_{i,n}| > k) \le \mathbf{1}(|Y_{i,n}| > k),$$<br />
and thus $\lim_{k \to \infty} \sup_{n,\, i \in D_n} E[|V_{i,n}|^{2+\delta}\,\mathbf{1}(|V_{i,n}| > k)] = 0$. Furthermore, observe<br />
that<br />
$$\|V_{i,n} - E(V_{i,n} \mid \mathcal{F}_{i,n}(s))\|_{2} \le |\lambda|\,\|Y_{i,n} - E(Y_{i,n} \mid \mathcal{F}_{i,n}(s))\|_{2} \le d_{i,n}\,\psi(s)$$<br />
and that for $\sigma_n^{2} = \mathrm{Var}\left(\sum_{i \in D_n} V_{i,n}\right) = \lambda' \Sigma_n \lambda$ we have<br />
$$\inf_{n} |D_n|^{-1}\sigma_n^{2} = \inf_{n} |D_n|^{-1}\lambda' \Sigma_n \lambda \ge \inf_{n} |D_n|^{-1}\lambda_{\min}(\Sigma_n) > 0.$$<br />
From this we see that under the maintained assumptions $V_{i,n}$ satisfies all assumptions<br />
of the CLT for scalar-valued random fields (Theorem 1) and, therefore,<br />
$\sigma_n^{-1} \sum_{i \in D_n} V_{i,n} \Rightarrow N(0,1)$ as claimed.<br />
Next define $X_{i,n} = V_{i,n}$; then by analogous arguments as above<br />
$$|X_{i,n}| \le |Y_{i,n}|.$$<br />
From the maintained uniform $L_{2+\delta}$ integrability of $|Y_{i,n}|$ it then follows that<br />
$\|X\|_{2+\delta} \le \|Y\|_{2+\delta}$, which shows that the $2+\delta$ moments of $X_{i,n}$ can be bounded<br />
by a constant that does not depend on $\lambda$. Consequently it follows from the last<br />
inequality in the proof of part (a) of Lemma B.3 that<br />
$$\mathrm{Var}\left(\sum_{i \in D_n} X_{i,n}\right) = \lambda' \Sigma_n \lambda \le C|D_n|,$$<br />
where $C < \infty$ does not depend on $n$ and $\lambda$. Hence<br />
$$\sup_{n} |D_n|^{-1}\lambda_{\max}(\Sigma_n) = \sup_{n} |D_n|^{-1} \sup_{|\lambda|=1} \lambda' \Sigma_n \lambda \le C < \infty.$$<br />
This proves the second claim of the corollary. Q.E.D.<br />
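The two eigenvalue bounds used above, $\lambda' \Sigma_n \lambda \ge \lambda_{\min}(\Sigma_n)$ and $\sup_{|\lambda|=1} \lambda' \Sigma_n \lambda = \lambda_{\max}(\Sigma_n)$, can be illustrated for a small symmetric matrix. The sketch below uses a hypothetical $2 \times 2$ matrix standing in for $|D_n|^{-1}\Sigma_n$ and scans unit vectors on a one-degree grid; the numbers are assumptions chosen only for illustration.<br />

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigenvalues (min, max) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def quad_form(a, b, c, lam):
    """Quadratic form lam' S lam for lam = (x, y) and S = [[a, b], [b, c]]."""
    x, y = lam
    return a * x * x + 2.0 * b * x * y + c * y * y

# Hypothetical stand-in for |D_n|^{-1} Sigma_n.
a, b, c = 2.0, 0.6, 1.0
lam_min, lam_max = eig_sym_2x2(a, b, c)

# Unit vectors lambda = (cos t, sin t) on a one-degree grid.
angles = [2.0 * math.pi * t / 360 for t in range(360)]
values = [quad_form(a, b, c, (math.cos(t), math.sin(t))) for t in angles]
print(lam_min, min(values), max(values), lam_max)
```

Every scanned quadratic form lies between the two eigenvalues, which is what allows the scalar CLT to be applied uniformly over unit vectors $\lambda$.<br />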
Proof of Theorem 2: To prove the theorem, it suffices to show that<br />
$|D_n|^{-1} \sum_{i \in D_n} (Y_{i,n} - EY_{i,n}) \xrightarrow{L_1} 0$.<br />
First we show that for each given $s > 0$, the conditional mean $V_{i,n}^{s} = E(Y_{i,n} \mid \mathcal{F}_{i,n}(s))$<br />
satisfies the assumptions of the $L_1$-norm LLN of Jenish and<br />
Prucha (2009, Theorem 3). Using the Jensen and Lyapunov inequalities gives<br />
for all $s > 0$, $i \in D_n$, $n \ge 1$:<br />
$$E|V_{i,n}^{s}|^{p} \le E\{E(|Y_{i,n}|^{p} \mid \mathcal{F}_{i,n}(s))\} \le \sup_{n,\, i \in D_n} E|Y_{i,n}|^{p} < \infty.$$<br />
So, $V_{i,n}^{s}$ is uniformly $L_p$-bounded for $p > 1$ and hence uniformly integrable.<br />
For each fixed $s$, $V_{i,n}^{s}$ is a measurable function of $\{\varepsilon_{j,n};\, j \in T_n : \rho(i,j) \le s\}$.<br />
Observe that under Assumption 1 there exists a finite constant $C$ such that the<br />
cardinality of the set $\{j \in T_n : \rho(i,j) \le s\}$ is bounded by $Cs^{d}$; compare Lemma<br />
A.1 in Jenish and Prucha (2009). Hence,<br />
$$\bar{\alpha}_{V^{s}}(1,1,r) \le \begin{cases} 1, & r \le 2s, \\ \bar{\alpha}(Cs^{d}, Cs^{d}, r-2s), & r > 2s, \end{cases}$$<br />
and thus in light of Assumption 6(b)<br />
$$\sum_{r=1}^{\infty} r^{d-1}\,\bar{\alpha}_{V^{s}}(1,1,r) \le \sum_{r=1}^{2s} r^{d-1} + \varphi(Cs^{d}, Cs^{d}) \sum_{r=1}^{\infty} (r+2s)^{d-1}\,\widehat{\alpha}(r) < \infty.$$<br />
The above shows that indeed, for each fixed $s$, $V_{i,n}^{s}$ satisfies the assumptions<br />
of the $L_1$-norm LLN of Jenish and Prucha (2009, Theorem 3). Therefore, for<br />
each $s$, we have<br />
$$\left\| |D_n|^{-1} \sum_{i \in D_n} \left[ E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)) - EY_{i,n} \right] \right\|_{1} \to 0 \quad \text{as } n \to \infty. \tag{B.17}$$<br />
Furthermore observe that from (??) and the Minkowski inequality<br />
$$\left\| |D_n|^{-1} \sum_{i \in D_n} \left( Y_{i,n} - E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)) \right) \right\|_{1} \le \psi(s). \tag{B.18}$$<br />
Given (B.17) and (B.18), and observing that $\lim_{s \to \infty} \psi(s) = 0$ it now follows<br />
that<br />
$$\begin{aligned}
\lim_{n \to \infty} \left\| |D_n|^{-1} \sum_{i \in D_n} (Y_{i,n} - EY_{i,n}) \right\|_{1} &= \lim_{s \to \infty} \lim_{n \to \infty} \left\| |D_n|^{-1} \sum_{i \in D_n} (Y_{i,n} - EY_{i,n}) \right\|_{1} \\
&\le \lim_{s \to \infty} \limsup_{n \to \infty} \left\| |D_n|^{-1} \sum_{i \in D_n} \left( Y_{i,n} - E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)) \right) \right\|_{1} \\
&\quad + \lim_{s \to \infty} \lim_{n \to \infty} \left\| |D_n|^{-1} \sum_{i \in D_n} \left( E(Y_{i,n} \mid \mathcal{F}_{i,n}(s)) - EY_{i,n} \right) \right\|_{1} = 0.
\end{aligned}$$<br />
This completes the proof of the LLN. Q.E.D.<br />
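The two-term bound at the end of the proof (approximation error plus the LLN for the conditional means) can be made concrete in a toy setting. The sketch below assumes, purely for illustration, a one-dimensional mean-zero linear field with i.i.d. unit-variance innovations and hypothetical geometric weights $w_k = \rho^{|k|}$, $|k| \le K$. It bounds the $L_1$ norm of the sample average by $\psi(s) + n^{-1}[\mathrm{Var}(\sum_{i} \xi_i^{s})]^{1/2}$, using the Minkowski and Lyapunov inequalities, and shows that the bound becomes small when $s$ and then $n$ are taken large.<br />

```python
import math

def autocov(w, h):
    """gamma(h) = sum_k w_k * w_{k+h} for a weight list w; zero beyond the support."""
    h = abs(h)
    if h >= len(w):
        return 0.0
    return sum(w[k] * w[k + h] for k in range(len(w) - h))

def var_of_sum(w, n):
    """Exact Var of the sum of n consecutive sites of the linear field with weights w."""
    return sum((n - abs(h)) * autocov(w, h) for h in range(-(n - 1), n))

def l1_bound(rho, K, s, n):
    """psi(s) + sqrt(Var(sum of xi^s)) / n, an upper bound on the L1 norm of the average."""
    psi_s = math.sqrt(sum(2.0 * rho ** (2 * k) for k in range(s + 1, K + 1)))
    truncated = [rho ** abs(k) for k in range(-s, s + 1)]   # weights of xi^s
    return psi_s + math.sqrt(var_of_sum(truncated, n)) / n

coarse = l1_bound(0.5, 20, s=2, n=100)        # small s, small n
fine = l1_bound(0.5, 20, s=10, n=10 ** 4)     # larger s, then larger n
print(coarse, fine)
```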
C Appendix: Proofs for Section 5<br />
Proof of Theorem 3: We show that<br />
$$\sup_{\theta \in \Theta} \left| Q_n(\theta) - \bar{Q}_n(\theta) \right| \xrightarrow{p} 0 \tag{C.1}$$<br />
as $n \to \infty$. As discussed in the text, given that the $\theta_n^{0}$ are identifiably unique it<br />
then follows immediately from, e.g., Pötscher and Prucha (1997), Lemma 3.1,<br />
that $\nu(\widehat{\theta}_n, \theta_n^{0}) \xrightarrow{p} 0$ as $n \to \infty$ as claimed.<br />
We start by proving that<br />
$$|D_n|^{-1} \sum_{i \in D_n} \left[ q_{i,n}(Z_{i,n}, \theta) - E q_{i,n}(Z_{i,n}, \theta) \right] \xrightarrow{p} 0 \tag{C.2}$$<br />
for each $\theta \in \Theta$, by applying the LLN given as Theorem 2 in the text to<br />
$q_{i,n}(Z_{i,n}, \theta)$. First note that by Assumption 6(a), we have $\sup_{n,\, i \in D_n} E|q_{i,n}(Z_{i,n}, \theta)|^{p} < \infty$<br />
for each $\theta \in \Theta$ and $p = 2$, which verifies Assumption 4(a) for $q_{i,n}(Z_{i,n}, \theta)$ with<br />
$c_{i,n} = 1$. By Assumption 6(b), the $q_{i,n}(Z_{i,n}, \theta)$ are uniformly $L_1$-NED on $\varepsilon$, and<br />
hence w.l.o.g. we can take $d_{i,n} = 1$. Furthermore, by Assumption 6(b) the input<br />
process $\varepsilon$ is $\alpha$-mixing, and the $\alpha$-mixing coefficients satisfy Assumption 4(b).<br />
Consequently (C.2) follows directly from Theorem 2 applied to $q_{i,n}(Z_{i,n}, \theta)$.<br />
Next observe that by Proposition 1 of Jenish and Prucha (2009), Assumption<br />
6(c) implies that $q_{i,n}$ is $L_0$ stochastically equicontinuous on $\Theta$, i.e., for every<br />
$\epsilon > 0$<br />
$$\limsup_{n \to \infty} \frac{1}{|D_n|} \sum_{i \in D_n} P\left( \sup_{\nu(\theta, \theta') \le \delta} \left| q_{i,n}(Z_{i,n}, \theta) - q_{i,n}(Z_{i,n}, \theta') \right| > \epsilon \right) \to 0 \quad \text{as } \delta \to 0.$$<br />
Furthermore, in light of Assumption 6(a) the $q_{i,n}(Z_{i,n}, \theta)$ clearly satisfy the<br />
domination condition postulated by the ULLN in Jenish and Prucha (2009),<br />
stated as Theorem 2 in that paper. Given that we have already verified the<br />
pointwise LLN in (C.2) it now follows directly from that theorem that<br />
$$\sup_{\theta \in \Theta} |R_n(\theta) - ER_n(\theta)| \xrightarrow{p} 0 \tag{C.3}$$<br />
with $R_n(\theta) = |D_n|^{-1} \sum_{i \in D_n} q_{i,n}(Z_{i,n}, \theta)$, and that the $ER_n(\theta)$ are uniformly<br />
equicontinuous on $\Theta$ in the sense that<br />
$$\limsup_{n \to \infty} \sup_{\theta' \in \Theta} \sup_{\nu(\theta, \theta') \le \delta} |ER_n(\theta) - ER_n(\theta')| \to 0 \quad \text{as } \delta \to 0.$$<br />
To prove (C.1) observe that<br />
$$\begin{aligned}
\sup_{\theta \in \Theta} \left| Q_n(\theta) - \bar{Q}_n(\theta) \right| &\le \sup_{\theta \in \Theta} \left| R_n(\theta)' P R_n(\theta) - ER_n(\theta)' P\, ER_n(\theta) \right| + \sup_{\theta \in \Theta} \left| R_n(\theta)' (P_n - P) R_n(\theta) \right| \\
&\le \sup_{\theta \in \Theta} \left| R_n(\theta)' P R_n(\theta) - ER_n(\theta)' P\, ER_n(\theta) \right| + \sup_{\theta \in \Theta} |R_n(\theta)|^{2}\, |P_n - P| .
\end{aligned} \tag{C.4}$$<br />
Furthermore observe that by Assumption 6(a) we have $E\left[\sup_{\theta \in \Theta} |q_{i,n}(Z_{i,n}, \theta)|\right] \le K$<br />
and $E\left[\sup_{\theta \in \Theta} |q_{i,n}(Z_{i,n}, \theta)|\right]^{2} \le K$ for some finite constant $K$. Thus<br />
$$\sup_{\theta \in \Theta} E|R_n(\theta)| \le E \sup_{\theta \in \Theta} |R_n(\theta)| \le |D_n|^{-1} \sum_{i \in D_n} E \sup_{\theta \in \Theta} |q_{i,n}(Z_{i,n}, \theta)| \le K \tag{C.5}$$<br />
and<br />
$$\begin{aligned}
E \sup_{\theta \in \Theta} |R_n(\theta)|^{2} &\le |D_n|^{-2} \sum_{i,j \in D_n} E\left[ \sup_{\theta \in \Theta} |q_{i,n}(Z_{i,n}, \theta)| \, \sup_{\theta \in \Theta} |q_{j,n}(Z_{j,n}, \theta)| \right] \\
&\le |D_n|^{-2} \sum_{i,j \in D_n} \left[ E \sup_{\theta \in \Theta} |q_{i,n}(Z_{i,n}, \theta)|^{2} \right]^{1/2} \left[ E \sup_{\theta \in \Theta} |q_{j,n}(Z_{j,n}, \theta)|^{2} \right]^{1/2} \le K.
\end{aligned} \tag{C.6}$$<br />
Now consider the first term on the r.h.s. of the last inequality of (C.4). From<br />
(C.5) we see that $E|R_n(\theta)|$ takes on its values in a compact set. Given (C.3) it<br />
now follows immediately from part (a.I) of Lemma 3.3 of Pötscher and Prucha<br />
(1997) that<br />
$$\sup_{\theta \in \Theta} \left| R_n(\theta)' P R_n(\theta) - ER_n(\theta)' P\, ER_n(\theta) \right| \xrightarrow{p} 0. \tag{C.7}$$<br />
Next we show that also the second term on the r.h.s. of the last inequality<br />
of (C.4) converges in probability to zero. To see that this is indeed the case<br />
observe that $\sup_{\theta \in \Theta} |R_n(\theta)|^{2} = O_p(1)$ in light of (C.6) and $|P_n - P| \xrightarrow{p} 0$ by<br />
assumption. This completes the proof of (C.1).<br />
Having established that the $ER_n(\theta)$ are uniformly equicontinuous on $\Theta$, the<br />
uniform equicontinuity of $\bar{Q}_n(\theta)$ on $\Theta$ follows immediately from Lemma 3.3(b)<br />
of Pötscher and Prucha (1997). Q.E.D.<br />
Proof of Theorem 4: Clearly by Theorem 3 we have $\widehat{\theta}_n - \theta_n^{0} = o_p(1)$.<br />
Step 1. The estimators $\widehat{\theta}_n$ corresponding to the objective function (23)<br />
satisfy the following first order conditions:<br />
$$\nabla_{\theta} R_n(\widehat{\theta}_n)' P_n \left[ |D_n|^{1/2} R_n(\widehat{\theta}_n) \right] = o_p(1). \tag{C.8}$$<br />
The $o_p(1)$ term on the r.h.s. reflects that the first order conditions may not<br />
hold if $\widehat{\theta}_n$ falls onto the boundary of $\Theta$, and that the probability of that event<br />
goes to zero as $n \to \infty$, since the $\theta_n^{0}$ are uniformly in the interior of $\Theta$ by<br />
Assumption 7(a). If $\widehat{\theta}_n$ is in the interior of $\Theta$, then the l.h.s. of (C.8) is zero.<br />
Taking the mean value expansion of $R_n(\widehat{\theta}_n)$ about $\theta_n^{0}$ yields<br />
$$R_n(\widehat{\theta}_n) = R_n(\theta_n^{0}) + \nabla_{\theta} R_n(\widetilde{\theta}_n)(\widehat{\theta}_n - \theta_n^{0}), \tag{C.9}$$<br />
where $\widetilde{\theta}_n \in \Theta$ is between $\widehat{\theta}_n$ and $\theta_n^{0}$ (component-by-component). Let<br />
$$\widehat{A}_n = \nabla_{\theta} R_n(\widehat{\theta}_n)' P_n \nabla_{\theta} R_n(\widetilde{\theta}_n) \quad \text{and} \quad \widehat{B}_n = \nabla_{\theta} R_n(\widehat{\theta}_n)' P_n \left[ |D_n|^{-1} \Sigma_n \right]^{1/2};$$<br />
then combining (C.8) and (C.9) gives:<br />
$$\begin{aligned}
|D_n|^{1/2}(\widehat{\theta}_n - \theta_n^{0}) &= \left[ I - \widehat{A}_n^{+} \widehat{A}_n \right] |D_n|^{1/2}(\widehat{\theta}_n - \theta_n^{0}) - \widehat{A}_n^{+} \nabla_{\theta} R_n(\widehat{\theta}_n)' P_n \left[ |D_n|^{1/2} R_n(\theta_n^{0}) \right] + \widehat{A}_n^{+} o_p(1) \\
&= \left[ I - \widehat{A}_n^{+} \widehat{A}_n \right] |D_n|^{1/2}(\widehat{\theta}_n - \theta_n^{0}) - \widehat{A}_n^{+} \widehat{B}_n \left[ \Sigma_n^{-1/2} |D_n| R_n(\theta_n^{0}) \right] + \widehat{A}_n^{+} o_p(1),
\end{aligned} \tag{C.10}$$<br />
where $\widehat{A}_n^{+}$ denotes the generalized inverse of $\widehat{A}_n$.<br />
Step 2. By Assumption 7(c) the $q_{i,n}(Z_{i,n}, \theta_n^{0})$ are uniformly $L_2$-NED and<br />
uniformly $L_{2+\delta}$-integrable with $c_{i,n} = 1$. Given Assumptions 7(d),(g) it is now<br />
readily seen that the process $\{q_{i,n}(Z_{i,n}, \theta_n^{0}),\, i \in D_n\}$ satisfies all assumptions<br />
of the CLT for vector-valued NED processes, given as Corollary 1 in the text,<br />
with $c_{i,n} = 1$. (Note that Assumption 3(d) is satisfied automatically since the<br />
$q_{i,n}(Z_{i,n}, \theta_n^{0})$ are uniformly $L_2$-NED.) Hence,<br />
$$\Sigma_n^{-1/2} |D_n| R_n(\theta_n^{0}) = \Sigma_n^{-1/2} \sum_{i \in D_n} q_{i,n}(Z_{i,n}, \theta_n^{0}) \Longrightarrow N(0, I_{p_q}), \tag{C.11}$$<br />
with $\Sigma_n = \mathrm{Var}\left[ \sum_{i \in D_n} q_{i,n}(Z_{i,n}, \theta_n^{0}) \right]$ and $\sup_n \lambda_{\max}\left[ |D_n|^{-1} \Sigma_n \right] < \infty$.<br />
Step 3. By Assumptions 7(c),(d),(e) the functions $\nabla_{\theta} q_{i,n}(Z_{i,n}, \theta)$ satisfy for<br />
each $\theta \in \Theta$ the LLN given as Theorem 2 in the text with $c_{i,n} = 1$, observing<br />
that Assumption 4(b) is implied by 2. By argumentation analogous to that used in<br />
the proof of consistency we have<br />
$$|D_n|^{-1} \sum_{i \in D_n} \left( \nabla_{\theta} q_{i,n}(Z_{i,n}, \theta) - E \nabla_{\theta} q_{i,n}(Z_{i,n}, \theta) \right) \xrightarrow{p} 0.$$<br />
By Proposition 1 of Jenish and Prucha (2009), Assumption 7(f) implies that the<br />
$\nabla_{\theta} q_{i,n}(Z_{i,n}, \theta)$ are uniformly $L_0$-equicontinuous on $\Theta$. Given $L_0$-equicontinuity<br />
and Assumption 7(e), we have by the ULLN of Jenish and Prucha (2009), Theorem<br />
2,<br />
$$\sup_{\theta \in \Theta} \left| \nabla_{\theta} R_n(\theta) - E \nabla_{\theta} R_n(\theta) \right| \xrightarrow{p} 0, \tag{C.12}$$<br />
and furthermore, the $E \nabla_{\theta} R_n(\theta)$ are uniformly equicontinuous on $\Theta$ in the sense<br />
that<br />
$$\limsup_{n \to \infty} \sup_{\theta' \in \Theta} \sup_{|\theta - \theta'| < \delta} \left| E \nabla_{\theta} R_n(\theta) - E \nabla_{\theta} R_n(\theta') \right| \to 0 \tag{C.13}$$<br />
as $\delta \to 0$. In light of (C.12) and (C.13), and given that $\widehat{\theta}_n - \theta_n^{0} = o_p(1)$ and<br />
hence $\widetilde{\theta}_n - \theta_n^{0} = o_p(1)$, it follows further that<br />
$$\nabla_{\theta} R_n(\widehat{\theta}_n) - E \nabla_{\theta} R_n(\theta_n^{0}) \xrightarrow{p} 0 \quad \text{and} \quad \nabla_{\theta} R_n(\widetilde{\theta}_n) - E \nabla_{\theta} R_n(\theta_n^{0}) \xrightarrow{p} 0.$$<br />
Hence,<br />
$$\widehat{A}_n - A_n \xrightarrow{p} 0 \quad \text{and} \quad \widehat{B}_n - B_n \xrightarrow{p} 0, \tag{C.14}$$<br />
where $\widehat{A}_n$ and $\widehat{B}_n$ are as defined above, and<br />
$$A_n = \left[ E \nabla_{\theta} R_n(\theta_n^{0}) \right]' P \left[ E \nabla_{\theta} R_n(\theta_n^{0}) \right] \quad \text{and} \quad B_n = \left[ E \nabla_{\theta} R_n(\theta_n^{0}) \right]' P \left[ |D_n|^{-1} \Sigma_n \right]^{1/2}.$$<br />
Step 4. Given Assumptions 7(e),(f), and since $P$ is positive definite, we<br />
have $|A_n| = O(1)$ and $|A_n^{-1}| = O(1)$, respectively. Hence by, e.g., Lemma<br />
F1 in Pötscher and Prucha (1997) we have $\widehat{A}_n = O_p(1)$, $\widehat{A}_n^{+} = O_p(1)$, $\widehat{A}_n$ is<br />
nonsingular with probability tending to one, and $\widehat{A}_n^{+} - A_n^{-1} \xrightarrow{p} 0$. In light of the<br />
above it follows from (C.10) that<br />
$$\begin{aligned}
|D_n|^{1/2}(\widehat{\theta}_n - \theta_n^{0}) &= -\widehat{A}_n^{+} \widehat{B}_n \left[ \Sigma_n^{-1/2} |D_n| R_n(\theta_n^{0}) \right] + o_p(1) \\
&= -A_n^{-1} B_n \left[ \Sigma_n^{-1/2} |D_n| R_n(\theta_n^{0}) \right] + o_p(1).
\end{aligned}$$<br />
Recalling that $\sup_n \lambda_{\max}\left[ |D_n|^{-1} \Sigma_n \right] < \infty$, Assumption 7(e) implies that<br />
$|B_n| = O_p(1)$. In light of Assumptions 7(g),(h) $B_n B_n'$ is invertible and furthermore<br />
$(B_n B_n')^{-1} = O(1)$. Thus $\left| \left( A_n^{-1} B_n B_n' A_n^{-1\prime} \right)^{-1} \right| \le |A_n|^{2} \left| (B_n B_n')^{-1} \right| = O(1)$ and therefore<br />
$$\left( A_n^{-1} B_n B_n' A_n^{-1\prime} \right)^{-1/2} |D_n|^{1/2}(\widehat{\theta}_n - \theta_n^{0}) = -\left( A_n^{-1} B_n B_n' A_n^{-1\prime} \right)^{-1/2} \left[ A_n^{-1} B_n \right] \left[ \Sigma_n^{-1/2} |D_n| R_n(\theta_n^{0}) \right] + o_p(1).$$<br />
The claim that $\left( A_n^{-1} B_n B_n' A_n^{-1\prime} \right)^{-1/2} |D_n|^{1/2}(\widehat{\theta}_n - \theta_n^{0}) \Longrightarrow N(0, I_k)$ now follows,<br />
e.g., from Corollary F4(b) in Pötscher and Prucha (1997). Q.E.D.<br />
References<br />
[1] Andrews, D.W.K. (1984): "Non-Strong Mixing Autoregressive Processes,"<br />
Journal of Applied Probability, 21, 930-934.<br />
[2] Andrews, D.W.K. (1987): "Consistency in Nonlinear Econometric Models:<br />
a Generic Uniform Law of Large Numbers," Econometrica, 55, 1465-1471.<br />
[3] Andrews, D.W.K. (1988): "Laws of Large Numbers for Dependent Nonidentically<br />
Distributed Variables," Econometric Theory, 4, 458-467.<br />
[4] Andrews, D.W.K. (2005): "Cross-section Regression with Common<br />
Shocks," Econometrica, 73, 1551-1585.<br />
[5] Ango Nze, P. and P. Doukhan (2004): "Weak Dependence: Models and<br />
Applications to Econometrics," Econometric Theory, 20, 995-1045.<br />
[6] Bai, J. (2003): "Inferential Theory for Factor Models of Large Dimensions,"<br />
Econometrica, 71, 135-171.<br />
[7] Bai, J. and S. Ng (2004): "A PANIC Attack on Unit Roots and Cointegration,"<br />
Econometrica, 72, 1127-1177.<br />
[8] Bera, A.K., and P. Simlai: "Testing Spatial Autoregressive Model and a<br />
Formulation of Spatial ARCH (SARCH) Model with Applications." Paper<br />
presented at the Econometric Society World Congress, London, August<br />
2005.<br />
[9] Bierens, H.J. (1981): Robust Methods and Asymptotic Theory. Berlin:<br />
Springer Verlag.<br />
[10] Billingsley, P. (1968): Convergence of Probability Measures. New York:<br />
John Wiley and Sons.<br />
[11] Bolthausen, E. (1982): "On the Central Limit Theorem for Stationary<br />
Mixing Random Fields," The Annals of Probability, 10, 1047-1050.<br />
[12] Bradley, R. (1993): "Some Examples of Mixing Random Fields," Rocky<br />
Mountain Journal of Mathematics, 23, 2, 495-519.<br />
[13] Brockwell, P., and R. Davis (1991): Time Series: Theory and Methods,<br />
Springer Verlag.<br />
[14] Bulinskii, A.V. (1989): Limit Theorems under Weak Dependence Conditions,<br />
Moscow University Press.<br />
[15] Bulinskii, A.V., and P. Doukhan (1990): "Vitesse de convergence dans le<br />
theorem de limite centrale pour des champs melangeants des hypotheses de<br />
moment faibles," C.R. Academie Sci. Paris, Serie I, 801-105.<br />
[16] Chen, X., and T. Conley (2001): "A New Semiparametric Spatial Model<br />
for Panel Time Series," Journal of Econometrics, 105, 59-83.<br />
[17] Cliff, A., and J. Ord (1981): Spatial Processes, Models and Applications.<br />
London: Pion.<br />
[18] Conley, T. (1999): "GMM Estimation with Cross Sectional Dependence,"<br />
Journal of Econometrics, 92, 1-45.<br />
[19] Davidson, J. (1992): "A Central Limit Theorem for Globally Nonstationary<br />
Near-Epoch Dependent Functions of Mixing Processes," Econometric<br />
Theory, 8, 313-329.<br />
[20] Davidson, J. (1993): "An L1 Convergence Theorem for Heterogenous<br />
Mixingale Arrays with Trending Moments," Statistics and Probability Letters,<br />
8, 313-329.<br />
[21] Davidson, J. (1994): Stochastic Limit Theory. Oxford University Press.<br />
[22] De Jong, R.M. (1997): "Central Limit Theorems for Dependent Heterogeneous<br />
Random Variables," Econometric Theory, 13, 353-367.<br />
[23] Dobrushin, R. (1968): "The Description of a Random Field by its Conditional<br />
Distribution and its Regularity Condition," Theory of Probability<br />
and its Applications, 13, 197-227.<br />
[24] Doukhan, P. (1994): Mixing. Properties and Examples. Springer-Verlag.<br />
[25] Doukhan, P., and X. Guyon (1991): "Mixing for Linear Random Fields,"<br />
C.R. Academie Sciences Paris, Serie 1, 313, 465-470.<br />
[26] Doukhan, P., and S. Louhichi (1999): "A New Weak Dependence Condition<br />
and Applications to Moment Inequalities," Stochastic Processes and Their<br />
Applications, 84, 313-342.<br />
[27] Gallant, A.R. (1987): Nonlinear Statistical Models. New York: Wiley.<br />
[28] Gallant, A.R., and H. White (1988): A Unified Theory of Estimation and<br />
Inference for Nonlinear Dynamic Models. New York: Basil Blackwell.<br />
[29] Jenish, N., <strong>and</strong> I.R. Prucha (2009): "Central Limit Theorems <strong>and</strong> Uniform<br />
Laws of Large Numbers for Arrays of R<strong>and</strong>om Fields," Journal of<br />
Econometrics, 150, 86-98.<br />
[30] Ibragimov, I.A. (1962): "Some Limit Theorems for Stationary <strong>Processes</strong>,"<br />
Theory of Probability <strong>and</strong> Applications, 7, 349-382.<br />
[31] Ibragimov, I.A., <strong>and</strong> Y. V. Linnik (1971): Independent <strong>and</strong> Stationary<br />
Sequences of R<strong>and</strong>om Variables. Wolters-Noordho¤, Groningen.<br />
[32] Kelejian, H.H., <strong>and</strong> I.R. Prucha (2004): "Estimation of Simultaneous<br />
Systems of <strong>Spatial</strong>ly Interrelated Cross Sectional Equations," Journal of<br />
Econometrics, 118, 27-50.<br />
[33] Kelejian, H.H., <strong>and</strong> I.R. Prucha (2007a): "HAC Estimation in a <strong>Spatial</strong><br />
Framework," Journal of Econometrics,140, 131-154.<br />
[34] Kelejian, H.H., <strong>and</strong> I.R. Prucha (2007b): "Speci…cation <strong>and</strong> Estimation<br />
of <strong>Spatial</strong> Autoregressive Models with Autoregressive <strong>and</strong> Heteroskedastic<br />
Disturbances," forthcoming in Journal of Econometrics.<br />
[35] Lee, L.-F. (2004): "Asymptotic Distributions of Quasi-Maximum Likelihood<br />
Estimators for Spatial Autoregressive Models," Econometrica, 72,<br />
1899-1925.<br />
[36] Lee, L.-F. (2007a): "GMM and 2SLS for Mixed Regressive, Spatial Autoregressive<br />
Models," Journal of Econometrics, 137, 489-514.<br />
[37] Lee, L.-F. (2007b): "Method of Elimination and Substitution in the GMM<br />
Estimation of Mixed Regressive Spatial Autoregressive Models," forthcoming<br />
in Journal of Econometrics.<br />
[38] McLeish, D. L. (1975a): "A Maximal Inequality and Dependent Strong<br />
Laws," Annals of Probability, 3, 826-36.<br />
[39] McLeish, D. L. (1975b): "Invariance Principles for Dependent Variables,"<br />
Z. Wahrsch. verw. Gebiete, 32, 165-78.<br />
[40] Nahapetian, B. (1987): "An Approach to Limit Theorems for Dependent<br />
Random Variables," Theory of Probability and Its Applications, 32, 589-594.<br />
[41] Neaderhouser, C. (1978): "Limit Theorems for Multiply Indexed Mixing<br />
Random Variables," Annals of Probability, 6.<br />
[42] Pinkse, J., L. Shen and M.E. Slade (2007): "A Central Limit Theorem<br />
for Endogenous Locations and Complex Spatial Interactions," Journal of<br />
Econometrics, 140, 215-225.<br />
[43] Phillips, P.C. and D. Sul (2003): "Dynamic Panel Estimation and Homogeneity<br />
Testing Under Cross Section Dependence," The Econometrics<br />
Journal, 6, 217-259.<br />
[44] Pötscher, B. M., and I. R. Prucha (1989): "A Uniform Law of Large Numbers<br />
for Dependent and Heterogeneous Data Processes," Econometrica,<br />
57, 675-683.<br />
[45] Pötscher, B.M., and I. R. Prucha (1994): "Generic Uniform Convergence<br />
and Equicontinuity Concepts for Random Functions," Journal of Econometrics,<br />
60, 23-63.<br />
[46] Pötscher, B.M. and I. R. Prucha (1997): Dynamic Nonlinear Econometric<br />
Models. Springer-Verlag, New York.<br />
[47] Robinson, P.M. (2009): "Large-Sample Inference on Spatial Dependence,"<br />
The Econometrics Journal, Tenth Anniversary Special Issue, 12, S68-S82.<br />
[48] Robinson, P.M. (2007): "Efficient Estimation of the Semiparametric Spatial<br />
Autoregressive Model," forthcoming in Journal of Econometrics.<br />
[49] Takahata, H. (1983): "On the Rates in the Central Limit Theorem for<br />
Weakly Dependent Random Fields," Z. Wahrsch. verw. Gebiete, 64, 445-456.<br />
[50] Tran, L. (1990): "Kernel Density Estimation on Random Fields," Journal<br />
of Multivariate Analysis, 34, 37-53.<br />
[51] Whittle, P. (1954): "On Stationary Processes in the Plane," Biometrika,<br />
41, 434-449.<br />
[52] Wooldridge, J. (1986): "Asymptotic Properties of Econometric Estimators,"<br />
University of California San Diego, Department of Economics, Ph.D.<br />
Dissertation.<br />
[53] Yu, J., R. de Jong, and L.-F. Lee (2008): "Quasi-maximum Likelihood<br />
Estimators for Spatial Dynamic Panel Data with Fixed Effects when Both<br />
N and T are Large," Journal of Econometrics, 146, 118-134.<br />