A note on the use of V and U statistics in nonparametric models of ...

AISM (2006) 58: 389–406 

DOI 10.1007/s10463-006-0034-z 

Carlos Martins-Filho · Feng Yao 

A <strong>note</strong> on the use of V and U statistics 

in nonparametric models of regression 

Received: 8 September 2004 / Revised: 12 April 2005 / Published online: 1 June 2006 

© The Institute of Statistical Mathematics, Tokyo 2006 

Abstract We establish the √ n asymptotic equivalence of V and U statistics when 

the statistic’s kernel depends on n. Combined with a lemma of B. Lee this result 

provides conditions under which U statistics projections and V statistics are √ n 

asymptotically equivalent. The use of this equivalence in nonparametric regression 

models is illustrated with several examples; the estimation of conditional variances, 

skewness, kurtosis and the construction of a nonparametric R-squared measure. 

Keywords U statistics · V statistics · local linear estimation 

1 Introduction 

The use of U statistics (see Hoeffding 1948), as a means of obtaining asymptotic 

properties of kernel based estimators in semiparametric and nonparametric 

regression models is now common practice in econometrics. For example, Powell, 

Stock and Stoker (1989) use U statistics to obtain the asymptotic properties of their 

semiparametric index model estimator. Ahn and Powell (1993) use U statistics to 

investigate the properties of a two stage semiparametric estimation of censored 

selection models where the first stage estimator involves a nonparametric regression 

estimator for the selection variable. Fan and Li (1996) use U statistics to study 

C. Martins-Filho (B) 

Department of Economics, Oregon State University, 

Ballard Hall 303, Corvallis, OR 97331-3612 USA 

E-mail: carlos.martins@orst.edu 

Tel.:+1-541-737-1476 

F. Yao 

Department of Economics, University of North Dakota, 

P.O. Box 8369, Grand Forks, ND 58203 USA 

E-mail: feng.yao@mail.business.und.edu 

Tel.: +1-701-7773360

390 C. Martins-Filho and F. Yao 

a set of specification tests for nonparametric regression models and Zheng (1998) 

uses U-statistics to provide a nonparametric kernel based test for parametric quantile 

regression models. See also Kemp (2000) and D’Amico (2003) for more recent 

uses. 

The appeal for the use of U statistics derives from the fact that often estimators 

or statistics of interest – T n – are expressed as linear combinations of nonparametric 

regression estimators. Consider for example a nonparametric regression model 

E(Y|X = x) = m(x) with observations {(y i ,x i )} n i=1 , T n = ∑ n 

i=1 c i ˆm(x i ) where 

c i ∈Rare nonstochastic and ˆm(x) is an arbitrary nonparametric estimator for 

m(x). Since ˆm(x) can usually be written as ˆm(x) = ∑ n 

j=1 w jn(x)y j , we can write 

T n = 

n∑ 

i=1 

( ) 

n 

c i w in (x i )y i + u 

2 n = T 1n + T 2n , 

( ) −1 

n ∑ 

where u n = 

2 1≤i

V and U statistics in nonparametric models 391 

where ∑ (n,k) de<strong>note</strong>s a sum over all subsets 1 ≤ i 1 0 and consequently to reach the desired conclusion it suffices to show 

that nE((u n − v n ) 2 ) = o(1).Bythec r -inequality we can write 

( 

nE((u n − v n ) 2 ) ≤ kn 1 − P k 

n ) 2 k−1 

E(u 2 

n k n ) + kn ∑ 

κ=1 

( 

n 

κ 

) 2 

n −2k E((u (κ) 

n )2 ). 

(4)


We will deal with the two terms on the righthand side of inequality (4) separately. 

We first <strong>note</strong> that by the representation theorem for U statistics (Serfling, 1980; p. 

180) u (κ) 

n = 1 n! 

∑p W n 

(κ)(Z 

i 1 

,... ,Z in ) where 

W (κ) 

n (Z i 1 

,... ,Z in ) = 1 γ κ 

(φ κ,n (Z i1 ,... ,Z iκ ) + φ κ,n (Z iκ+1 ,... ,Z i2κ ) +··· 

+φ κ,n (Z iγκ κ−κ+1 

,...,Z iγκ κ 

)), 

γ κ = [n/κ] is the greatest integer ≤ n/κ and ∑ p 

de<strong>note</strong>s the sum over all permutations 

(i 1 ,...,i n ) of {1, 2,...,n} and similarly u n = 1 ∑ 

n! p W n(Z i1 ,... ,Z in ) 

where W n (Z i1 ,... ,Z in ) = 1 γ (ψ n(Z i1 ,... ,Z ik ) + ψ n (Z ik+1 ,... ,Z i2k ) + ··· + 

ψ n (Z iγk−k+1 ,...,Z iγk )) and γ = [n/k] is the greatest integer ≤ n/k. Now, for the 

first term on the righthand side of inequality (4), since n k −Pk 

n = O(nk−1 ) we have 

( ) 

n 1 − P k 

n 2 ( ) 

n = O(n −1 ) and n 1 − P n 2 (( 

k 

k 

n E(u 

2 k n )=O(n −1 1 

∑ 

)E 

n! p W n(Z i1 ,..., 

Z in ) ) ) 2 

. By Minkowski’s inequality 

⎛( 

E ⎝ 

1 

n! 

) ⎞ 2 

∑ 

W n (Z i1 ,... ,Z in ) 

p 

⎠ ≤ 1 

(n!) 2 ( ∑ 

p 

= E(|W n (Z i1 ,... ,Z in )| 2 ), 

( 

E(|Wn (Z i1 ,... ,Z in )| 2 ) ) ) 2 

1/2 

where the last equality follows since {Z i } i=1,2,... is an i.i.d. sequence. By definition 

of W n and another use of Minkowski’s inequality we have 

⎛ 

⎞ 

γ∑ 

E(|W n (Z i1 ,... ,Z in )| 2 ) ≤ γ −2 ( 

⎝ E(ψ 

2 

n (Z ijk−k+1 ,... ,Z ijk )) ) 1/2 

⎠ 

j=1 

= E(ψ n (Z i1 ,... ,Z ik )) = o(n), 

( ) 

where the last equality follows by assumption.We therefore conclude that n 1− P k 

n 2 

n k 

E(u 2 n ) = o(1). 

( 

n 

For the second term on inequality (4), <strong>note</strong> first that n 

κ) 

−k = O(n κ−k ) and 

(( ) 2 

n 

since κ ∈{1,... ,k− 1}, max κ n κ−k = n −1 and therefore n 

κ) 

−k =O(n −2 ). 

Once again, by Minkowski’s inequality we have E((u (κ) 

n )2 ) ≤ E(|W n (κ)(Z 

i 1 

, 

... ,Z in )| 2 ). Now, given the definition of W n 

(κ) and the fact that {φ κ,n (Z ijκ−κ+1 

,... ,Z ijκ )} j=1,2,... is an i.i.d. sequence of random variables we have by another 

application of Minkowski’s inequality that E(|W n 

(κ)(Z 

i 1 

,... ,Z in )| 2 ) ≤ E((φ κ,n (Z i1 

,... ,Z iκ )) 2 ). Let l κ de<strong>note</strong> the number of terms in ∑ {ν 1 ,... ,ν κ ,k} , then by the c r 

inequality 

2


E((φ κ,n (Z i1 ,... ,Z iκ )) 2 ) 

⎛⎛ 

⎞2⎞ 

= E ⎝⎝ 

∑ k! 

∏ κ 

{ν 1 ,... ,ν κ ,k} j=1 ν j ! ψ ⎟ ⎟ 

n ×(Z i1 ,... ,Z i1 ,... ,Z iκ ,... ,Z iκ ) ⎠ ⎠ 

} {{ } } {{ } 

ν 1 − times 

ν κ − times 

⎛ 

⎞ 

∑ (k!) 2 

≤ l κ 

( ∏ κ 

{ν 1 ,... ,ν κ ,k} j=1 ν j !) E ⎜ 

⎝ψ 2 2 n (Z ⎟ 

i 1 

,... ,Z i1 ,... ,Z iκ ,... ,Z iκ ) ⎠ . 

} {{ } } {{ } 



Now, since by assumption E ( ψn 2(Z i 1 

,... ,Z ik ) ) = o(n) for all 1 ≤ i 1 ,...,i k 

≤ n, k ≤ n,, then it is true that for all κ ∈{1,... ,k − 1} and all ν j ≥ 1 such 

that k = ∑ κ 

j=1 ν j we have E ( ψn 2(Z i 1 

,... ,Z i1 ,... ,Z iκ ,... ,Z iκ ) ) = o(n). Then 

E(|W (κ) 

n 

∑k−1 

( 

n 

n 

κ 

κ=1 

(Z i 1 

,... ,Z in )| 2 ) = o(n) and consequently E((u (κ))2 

) = o(n).As a result, 

) 2 

n −2k E((u (κ) 

n 

∑k−1 

)2 )≤nO(n −2 )o(n) 

κ=1 

l κ 

∑ 

n 

{ν 1 ,... ,ν κ ,k} 

(k!) 2 

( ∏κ 

j=1 ν j !) 2 

=o(1), 

which completes the proof. 

⊓⊔ 

Theorem 1 can be proved using the following two conditions which are implied 

by E ( ψn 2(Z i 1 

,... ,Z ik ) ) = o(n) for all 1 ≤ i 1 ,...,i k ≤ n, k ≤ n: (a) 

E ( ψn 2(Z i 1 

,... ,Z ik ) ) = o(n) for all 1 ≤ i 1 

(b) for all κ ∈ {1,... ,k − 1} and all ν j ≥ 1 such that k = ∑ κ 

j=1 ν j that 

define Z i1 ,... ,Z i1 ,... ,Z iκ ,... ,Z iκ E ( ψ 

} {{ } } {{ } 

n 2(Z i 1 

,... ,Z i1 ,... ,Z iκ ,... ,Z iκ ) ) 



= o(n 2(k−κ)−1 ). The next corollary establishes the √ n asymptotic equivalence 

between V statistics and U statistics projections. 

Corollary 1 Let {Z i } n i=1 be a sequence of i.i.d. random variables and u n and 

v n be U and V statistics with kernel function ψ n (Z 1 ,... ,Z k ). In addition, let 

û n = k n 

∑ n 

i=1 (ψ 1n(Z i ) − θ n ) + θ n , where ψ 1n (Z i ) = E (ψ n (Z 1 ,... ,Z k )|Z i ) and 

θ n = E (ψ n (Z 1 ,... ,Z k )). IfE ( ψ 2 n (Z 1,... ,Z k ) ) = o(n) then √ n(v n −û n ) = 

o p (1). 

Proof From Theorem 1 we have √ n(v n − u n ) = o p (1), and from Lemma 2.1 in 

Lee (1988) we have that √ n(u n −û n ) = o p (1). Hence, √ n(v n −û n ) = o p (1). ⊓⊔ 

3 Some applications in regression models 

We provide applications of Theorem 1 and its corollary around the following regression 

model, 

y t = m(x t ) + ε t , (5)


where, {(y t ,x t )} n t=1 is a sequence of i.i.d. random variables, y t,x t ∈Rwith joint 

density q(y,x), E(ε t |x t ) = 0 and V(ε t |x t ) = σ 2 (x t ). This is similar to the regression 

model considered by Fan and Yao (1998), with the exception that here the 

observations are i.i.d. rather than strictly stationary. Our applications all involve 

establishing the asymptotic distribution of statistics that are constructed as linear 

combinations of a first stage nonparametric estimator of m(x). As the first stage 

estimator for m(x) we consider the local linear estimator of Fan (1992), i.e., for 

x ∈R, we obtain ˆm(x) =ˆα where 

n∑ 

( ˆα, ˆβ) = argmin α,β (y t − α − β(x t − x)) 2 K 

t=1 

( ) 

xt − x 

, 

h n 

where K(·) is a density function and 0


(a) Given E(y 2 t f 2 (x t ,y t )) < ∞ then 

I n = 1 n 

n∑ 

∫ 

ε t f(x t , y)g(y|x t )dy + 1 2 h2 n σ K 2 E(m(2) (x t )f (x t ,y t )) 

t=1 

where g(y|x) de<strong>note</strong>s the conditional density of y given x. 

(b) Given E(y 4 t f 2 (x t ,y t )) < ∞ then 

+o p (n − 1 2 ) + op (h 2 n ), (6) 

J n = σ 2 K h2 n 

n 

n∑ 

ε t m (2) (x t )E(f (x t ,y t )|x 1 ,... ,x n ) 

t=1 

+ 1 4 h4 n σ 4 K E((m(2) (x t )) 2 f(x t ,y t )) 

+o p (n − 1 2 ) + op (h 3 n ), (7) 

(c) Given E(y 6 t f 2 (x t ,y t )) < ∞ then 

L n = o p (h 3 n ) (8) 

(d) Given E(y 8 t f 2 (x t ,y t )) < ∞ then 

M n = o p (h 4 n ). (9) 

Proof Here we prove parts (a)–(c). The proof of (d) is similar to that of (c) and is 

omitted. Let ∑ (p) h i 1 ...i p 

de<strong>note</strong> the sum of h i1 ...i p 

over the permutations of i 1 ...i p , 

i.e., ∑ (p) h i 1 ...i p 

is the sum of p! terms given by h i1 i 2 ...i p 

+ h i2 i 1 ...i p 

+···+h ip i p−1 ...i 1 

. 

(a) Note that, 

ˆm(x t ) − m(x t ) = 

1 

nh n g(x t ) 

n∑ 

K 

k=1 

1 

+ 

2nh n g(x t ) 

( ) 

xk − x t 

ε k 

h n 

n∑ 

K 

k=1 

( ) 

xk − x t 

m (2) (x kt )(x k − x t ) 2 + w(x t ), 

h n 

where w(x t ) = ˆm(x t )−m(x t )− 1 ∑ n 

nh n g(x t ) k=1 K(x k−x t 

h n 

)yk ∗ and y∗ k = ε k+ 1 2 m(2) (x kt ) 

(x k −x t ) 2 with x kt = λx t +(1−λ)x k for some λ ∈ (0, 1). Hence, I n = I 1n +I 2n +I 3n , 

where 

I 1n = 1 

n 2 h n 

n∑ 

n∑ 

K 

t=1 k=1 

( ) 

xk − x t 

h n 

1 

ε k f(x t ,y t ) 

g(x t )


I 2n = h n 

2n 2 

I 3n = 1 n 

n∑ 

t=1 k=1 

n∑ 

( )( ) 

xk − x t xk − x 2 

t 

K 

m (2) 1 

(x kt )f (x t ,y t ) 

h n g(x t ) 

h n 

n∑ 

w(x t )f (x t ,y t ). 

t=1 

We treat each term separately. Observe that I 1n can be written as 

I 1n = 1 n∑ n∑ 

( ( ) 

1 xk − x t 

1 

K ε 

2n 2 

k f(x t ,y t ) 

h 

t=1 k=1 n h n g(x t ) 

+ 1 ( ) 

) 

xt − x k 

1 

K ε t f(x k ,y k ) 

h n h n g(x k ) 

⎛ ⎞ 

= 1 n∑ n∑ 

⎝ ∑ h 

2n 2 

tk 

⎠ = 1 n∑ n∑ 

ψ 

2n 2 

n (Z t ,Z k ) = 1 2 v n, 

t=1 k=1 (2) 

t=1 k=1 

where v n is a V statistic with Z t = (x t ,y t ) and ψ n (Z t ,Z k ) a symmetric kernel. 

Hence, by Corollary 1, if E ( ψn 2(Z t,Z k ) ) = o(n) for all t,k then √ n(v n −û n ) = 

o p (1). First, <strong>note</strong> that for t ≠ k 1 n E ( ψn 2(Z t,Z k ) ) = 2 n E(h2 tk ) + 2 n E(h tkh kt ) and 

since E(εk 2|x 1,... ,x n ) = σ 2 (x k ), we have from the proposition of Prakasa-Rao 

that 

1 

n E(h2 tk ) = 1 E 

nh 2 n 

( ( ) 

K 2 xk −x t 

h n 

) 

σ 2 (x k )E(f 2 1 

(x t ,y t )|x 1 ,... ,x n ) → 0, 

g 2 (x t ) 

provided that nh n → ∞. Similarly it can be shown that E( 1 n h tkh kt ) = o(1) 

and therefore 1 n E(ψ2 n (Z t,Z k )) = o(1). Fort = k it suffices to verify that 4K2 (0) 

( 

) 

nh 2 n 

E σ 2 (x t ) f 2 (x t ,y t ) 

g 2 (x t 

= o(1), but this follows directly given our assumptions. Now, 

) 

since E(ε t |x 1 ,... ,x n ) = 0, θ n = E (ψ n (Z t ,Z k )) = 0 and û n = 2 ∑ n 

n t=1 ψ 1n(Z t ). 

Therefore, 

ψ 1n (Z t ) = 1 ∫ ∫ ( ) 

xk − x t 

ε t K f(x k ,y k )g(y k |x k )dx k dy k 

h n h 

∫ 

n 

= ε t f(x t , y)g(y|x t )dy + o(1), 

where g(y|x) = q(y,x) 

g(x) . Consequently, 

I 1n = 1 n∑ 

∫ 

ε t f(x t , y)g(y|x t )dy + o p (n − 1 2 ) (10) 

n 

t=1 

given that 1 n 

∑ n 

t=1 ε t = O p (n − 1 2 ). Now, <strong>note</strong> that 

I 2n = 1 

4n 2 

n∑ 

n∑ ∑ 

t=1 k=1 (2) 

h tk = 1 

4n 2 

n∑ 

t=1 k=1 

n∑ 

ψ n (Z t ,Z k ) = 1 4 v n,


where h tk = h n K( x k−x t 

h n 

)( x k−x t 

h n 

) 2 m (2) (x kt )f (x t ,y t ) 1 

g(x t ) . By Corollary 1 if E(ψ2 n (Z t, 

Z k )) = o(n) for all t,k, then √ n(v n −û n ) = o p (1). Hence we focus on û n . Note 

that, û n = 2 ∑ n 

n t=1 ψ 1n(Z t ) − θ n , where ψ 1n (Z t ) = E(h tk |Z t ) + E(h kt |Z t ) and 

E(û n ) = θ n = 2E(h tk ). Hence, by the proposition of Prakasa-Rao, 

) (ûn 

E = 2 ∫ ∫ ∫ ( )( ) 

xk − x t xk − x 2 

t 

K 

m (2) (x 

h 2 kt ) 

n h n h n h n 

×f(x t ,y t ) g(x k) 

g(x t ) q(x t,y t )dx k dx t dy t 

−→ 2σ 2 K E(m(2) (x t )f (x t ,y t )). 

In addition, <strong>note</strong> that 

( ) 1 

V û 

h 2 n = 4 nV (E(h 

n n 2 h 4 tk |Z t ) + E(h kt |Z t )) 

n 

= 4 ( 

E(E(htk |Z 

nh 4 t )) 2 +E(E(h kt |Z t )) 2 

n 

+2E(E(h tk |Z t )E(h kt |Z k ))−θn) 2 . 

Under our assumptions, it is straightforward to show that 

1 

E(E(h tk |Z t )) 2 

nh 4 n 

= 1 E 

nh 2 n 

( 

as n →∞. Similarly, 

( 

as n →∞. Hence, V 

f(x t ,y t ) 2 1 

g 2 (x t ) E2 ( 

K 

1 

nh 4 n 

1 

h 2 n 

( 

xk − x t 

h n 

)( 

xk − x t 

h n 

) 2 

m (2) (x kt )|Z t 

)) 

→0 

E(E(h kt |Z t )) 2 → 0 and 1 

nh 4 n 

E(E(h tk |Z t )E(h kt |Z t )) → 0 

û n 

) 

→ 0. By Markov’s inequality we conclude that 

û n = 2h 2 n σ K 2 E(m(2) (x t )f (x t ,y t )) + o p (h 2 n ) and by Corollary 1, 

I 2n = 1 2 h2 n σ K 2 E(m(2) (x t )f (x t ,y t )) + o p (h 2 n ). (11) 

Finally, verification that E(ψn 2(Z t,Z k )) = 2E(h 2 tk ) + 2E(h tkh kt ) = o(n) for all 

t ≠ k follows directly from our assumptions and use of the proposition of Prakasa-Rao, 

and for the case where t = k we have that E(ψn 2(Z t,Z k )) = 0. From 

Martins-Filho and Yao’s (2003) Lemma 2 and Theorem 1 

|w(x t )|= 

∣ ˆm(x t) − m(x t ) − 

1 

nh n g(x t ) 

n∑ 

K 

k=1 

( ) 

xk − x t 

h n 

y ∗ k 

∣ = O p(R n,2 (x t )) 

and sup xt ∈G |R n,2(x t )|=o p (h 2 n ), hence sup x t ∈G |w(x t)| =o p (h 2 n ). As a result we 

have that |I 3n |≤o p (h 2 n ) 1 ∑ n 

n t=1 |f(x t,y t )|=o p (h 2 n ), since E(f 2 (x t ,y t )) < ∞. 

Therefore, I 3n = 1 ∑ n 

n t=1 w(x t)f (x t ,y t ) = o p (h 2 n ). Combining this result with


(10) and (11) proves part (a). 

For part (b) <strong>note</strong> that 

J n = 1 n∑ n∑ n∑ 

( ) 

1 

f(x 

n 3 h 2 t ,y t ) 

n 

g 2 (x 

t=1 k=1 l=1 

t ) K xk − x t 

K 

h n 

n∑ n∑ n∑ 

( 

+ h2 n 

1 

4n 3 

g 2 (x t ) f(x xk − x t 

t,y t )K 

t=1 k=1 l=1 

( 

xl − x t 

×K 

h n 

h n 

)( 

xl − x t 

h n 

) 2 

m (2) (x kt )m (2) (x lt ) 

+ 1 n∑ 

w 2 (x t )f (x t ,y t ) 

n 

t=1 

+ 1 n∑ n∑ n∑ 

( ) 

1 

n 3 g 2 (x 

t=1 k=1 l=1 t ) f(x xk − x t 

t,y t )K 

h n 

( )( ) 

xl − x t xl − x 2 

t 

×K 

m (2) (x lt )ε k 

+ 2 

n 2 h n 

n∑ 

h n 

t=1 k=1 

h n 

n∑ 1 

w(x t ) 

g(x t ) f(x t,y t )K 

+ h n∑ n∑ 

n 

1 

w(x 

n 2 

t ) 

g(x 

t=1 k=1 

t ) f(x t,y t )K 

= J 1n + J 2n + J 3n + J 4n + J 5n + J 6n . 

We examine each term separately. 

J 1n = 1 1 

n∑ n∑ n∑ ∑ 

h 

6 n 3 

tkl = 1 1 

6 n 3 

t=1 k=1 l=1 (3) 

n∑ 

( ) 

xk − x t 

ε k 

h n 

( 

xk − x t 

n∑ 

h n 

t=1 k=1 l=1 

1 

( ) 

xl − x t 

ε k ε l 

h n 

)( 

xk − x t 

h n 

) 2 

)( 

xk − x t 

h n 

) 2 

m (2) (x kt ) 

n∑ 

ψ n (Z t ,Z k ,Z l ) = 1 6 v n, 

where h tkl = 1 K( x k−x t 

h 2 n h n 

)K( x l−x t 

h n 

)ε k ε l f(x t ,y t ) 

g 2 (x t ) . By Corollary 1, if E(ψ2 n (Z t, 

Z k ,Z l )) = o(n) for all t,k,l then √ n(v n −û n ) = o p (1). Since ψ 1n (Z t ) = 0 

and θ n = 0, we have that û n = 0 and therefore √ nI 1n = o p (1). To verify that 

1 

n E(ψ2 n (Z t,Z k ,Z l )) = o(1) for t ≠ k ≠ l we <strong>note</strong> that 1 n E(ψ2 n (Z t,Z k ,Z l )) = 

4 

n (3E(h2 tkl ) + 2E(h tklh ktl ) + 2E(h tkl h ltk ) + 2E(h ktl h ltk )). Under A1–A4, repeated 

application of the proposition of Prakasa-Rao to each of these expectations shows 

that each approaches zero as n →∞. Also, if t = k but t ≠ l we have that 

E(ψn 2(Z t,Z t ,Z l )) = o(n) and when t = k = l, E(ψn 2(Z t,Z t ,Z t )) = o(n 2 ) 

directly from our assumptions provided nh 3 n → 0. 

J 2n = 1 

24n 3 

n∑ 

n∑ 

n∑ ∑ 

t=1 k=1 l=1 (3) 

)( x k−x t 

h n 

h tkl = 1 

24n 3 

n∑ 

) 2 K( x l−x t 

h n 

)( x l−x t 

n∑ 

n∑ 

t=1 k=1 l=1 

ψ n (Z t ,Z k ,Z l ) = 1 

24 v n, 

where h tkl = h 2 n K(x k−x t 

h n 

By Corollary 1 √ n(v n −û n ) = o p (1) provided that E(ψ n (Z t ,Z k ,Z l )) = o(n) 

h n 

) 2 m (2) (x kt )m (2) (x lt ) 

g 2 (x t ) f(x t,y t ). 

1


for all t,k,l. As above, verification of this condition is straightforward given our 

assumptions. Note that û n = 3 ∑ n 

n t=1 ψ 1n(Z t ) − 2θ n with E(û n ) = θ n . Hence, 

( 

1 

E(û 

h 4 n ) = 6 ( )( ) 

xk −x t xk − x 2 ( )( ) 

t xl −x t xl −x 2 

t 

E K 

K 

m (2) (x 

n h 2 kt ) 

n h n h n h n h n 

) 

× m (2) 1 

(x lt ) 

g 2 (x t ) f(x t,y t ) 

and by the proposition of Prakasa-Rao we have, 1 E(û 

h 4 n )→6σK 4 E((m(2) (x t )) 2 f(x t , 

n 

y t )) and V( 1 û 

h 4 n ) → 0. By Markov’s inequality we conclude that J 2n = 1 

n 4 h4 n σ K 

4 

E((m (2) (x t )) 2 f(x t ,y t )) + o p (h 4 n ) + o p(n −1/2 ). J 3n = 1 ∑ n 

n t=1 w2 (x t )f (x t ,y t ) = 

o p (h 4 n ) follows directly from the analysis of the I 3n in part (a). Now, 

J 4n = 1 

6n 3 

n∑ 

n∑ 

where h tkl = K( x k−x t 

h n 

n∑ ∑ 

t=1 k=1 l=1 (3) 

h tkl = 1 

6n 3 

n∑ 

n∑ 

t=1 k=1 l=1 

n∑ 

ψ n (Z t ,Z k ,Z l ) = 1 6 v n, 

)K( x l−x t 

h n 

)( x l−x t 

h n 

) 2 m (2) 1 

(x lt )ε k g 2 (x t ) f(x t,y t ). Once again given 

our assumptions it can be verified that E(ψn 2(Z t,Z k ,Z l )) = o(n) for all t,k,l, 

therefore we have √ n(v n −û n ) = o p (1), where û n = 3 ∑ n 

n t=1 ψ 1n(Z t ) − 2θ n . 

Given that in this case θ n = 0, we have 

û n = 6 n 

( 

n∑ 

ε t E K 

t=1 

( ) 

xt −x k 

K 

h n 

( )( ) ) 

xl −x k xl −x 2 

k 

m (2) 1 

(x lk ) 

h n g 2 (x k ) f(x k,y k )|Z t , 

h n 

Under A1–A4, the proposition of Prakasa Rao gives 

( ( ) ( )( ) ) 

1 xt − x k xl − x k xl − x 2 

k 

E K K 

m (2) 1 

(x 

h 2 lk ) 

n h n h n h n g 2 (x k ) f(x k,y k )|Z t → 

σK 2 m(2) (x t )E(f (x t ,y)|x 1 ,... ,x n ). 

∑ 

Hence, û n = 6h2 n n 

n t=1 ε tm (2) (x t )σK 2 E(f(x t,y)|x 1 ,... ,x n )+o p (n − 1 2 ), given that 

6 

∑ n 

n t=1 ε t = O p (n − 1 2 ). Consequently, J 4n = σ K 2 ∑ h2 n n 

n t=1 ε tm (2) (x t )E(f (x t ,y) 

|x 1 ,... ,x n ) + o p (n − 1 2 ). 

|J 5n | = ∣ 2 ∑ ( 

n 

n t=1 w(x 1 

∑ ) n 

t) 

nh n g(x t ) k=1 K(x k−x t 

h n 

)ε k f(x t ,y t ) ∣ and from 

Martins-Filho and Yao (2003) Theorem 1 we have sup xt ∈G |w(x t)| 

= o p (h 2 n ) and sup 1 

∑ 

∣ 

n 

x t ∈G ∣ 

nh n g(x t ) k=1 K(x k−x t ∣∣ 

h n 

)ε k = op (h n ). Hence, |J 5n | 

≤ 2 ∑ n 

n t=1 |f(x t,y t )|o p (h 3 n ) = O p(1)o p (h 3 n ) = o p(h 3 n ) since 2 ∑ n 

n t=1 |f(x t,y t )| 

= O p (1), which gives J 5n = o p (h 3 n ). Finally, from Martins-Filho and Yao (2003) 

Theorem 1 we have 

h n 

n∑ 

( )( ) xk − x t xk − x 2 t 

sup 

K 

m (2) (x kt ) 

∣ng(x t ) 

h n ∣ = O p(h 2 n ) (12) 

x t ∈G 

k=1 

h n


and |J 6n |≤ 1 ∑ n 

n t=1 |f(x t,y t )|o p (h 2 n )O p(h 2 n ) = O p(1)o p (h 4 n ) = o p(h 4 n ). Combining 

the orders of all six terms gives the desired result. 

∑ n 

k=1 K ( 

xk −x t 

1 

2nh n g(x t ) 

∑ n 

k=1 K ( 

xk −x t 

h n 

) 

m (2) (x kt ) 

) 

(c) Let a tn = 1 

nh n g(x t ) 

h n 

ε k , b tn = 

(x k − x t ) 2 , c tn = w(x t ). Then ( ˆm(x t ) − m(x t )) 3 = atn 3 + b3 tn + c3 tn + 3a tnbtn 2 + 

3a tn ctn 2 +3a2 tn b tn+3atn 2 c tn+3btn 2 c tn+3b tn ctn 2 +6a tnb tn c tn and n −1 ∑ n 

t=1 ( ˆm(x t)− 

m(x t )) 3 f(x t ,y t ) = ∑ 10 

i=1 L in, where L 1n = n −1 ∑ n 

t=1 a3 tn f(x t,y t ),... ,L 10n = 

n −1 ∑ n 

t=1 6a tnb tn c tn f(x t ,y t ). Since 

∣ sup 1 

n∑ 

( ) ∣ 

xk − x ∣∣∣∣ 

t 

x t 

K ε k = o p (h n ), 

nh n g(x t ) h n 

k=1 

then |L 1n |≤o p (h 3 n ) × 1 ∑ n 

n t=1 |f(x t,y t )|=o p (h 3 n ). By (12), |L 2n| ≤O p (h 

∑ 6 n )n−1 

n 

t=1 |f(x t,y t )|=O p (h 6 n ) since E(f 2 (x t ,y t )) < ∞. |L 3n |≤o p (h 6 n ) follows 

directly from sup xt ∈G|w(x t )|=o p (h 2 n ). 

|L 4n |≤ 3 n∑ 

1 

n∑ 

( ) ∣ 

xk − x ∣∣∣∣ 

t 

K ε k 

n ∣nh n g(x t ) h n 

t=1 

( 

× 

∣ 

1 

2nh n g(x t ) 

k=1 

n∑ 

K 

k=1 

×|f(x t ,y t )|≤o p (h n )O p (h 4 n ) 3 n 

( ) 

xk − x t 

(x k − x t ) 2 m (2) (x kt ) 

h n 

) 2 

∣ ∣∣∣∣∣ 

n∑ 

|f(x t ,y t )|=o p (h 5 n ) 

from (12) and the analysis of J 5n in part (b). Similarly, we obtain |L 5n |, |L 10n |≤ 

o p (h 5 n ), |L 6n|, |L 7n |≤o p (h 4 n ) and |L 8n|, |L 9n |≤o p (h 6 n ). Hence, we have L n = 

o p (h 3 n ). 

⊓⊔ 

We now use the results in the Lemma to establish the asymptotic distribution of 

several statistics of interest. 

t=1 

3.1 Estimating conditional variance 

Consider first a special case of the model (5), where ε t |x t ∼ N(0,σ 2 ). Then a 

natural estimator of σ 2 is 

ˆσ 2 = 1 n∑ 

(y t −ˆm(x t )) 2 = 1 n∑ ( 

ε 

2 

n 

n t +2(m(x t )−ˆm(x t ))ε t +(m(x t )−ˆm(x t )) 2) . 

t=1 

t=1 

The difficulty in dealing with such expressions lies in the average terms involving 

(m(x t ) −ˆm(x t )) and (m(x t ) −ˆm(x t )) 2 , since the first term is an average of an 

i.i.d. sequence. By using Corollary 1 and Lemma 1 we have a convenient way to 

establish their asymptotic properties. This is shown in the next theorem. 

Theorem 2 Assume that in model (5), ε t |x t ∼ N(0,σ 2 ) and E(yt 4 )


Proof Note that ˆσ 2 = 1 ∑ n 

( 

n t=1 ε 

2 

t + 2(m(x t ) −ˆm(x t ))ε t + (m(x t ) −ˆm(x t )) 2) . 

Now, letting f(x t ,y t )=−2ε t in part (a) of Lemma 1 we have 1 ∑ n 

n t=1 ε ∫ 

t f(xt ,y t ) 

g(y t |x t )dy t = 0, since E(ε t |x 1 ,... ,x n ) = 0 and 1 2 h2 n σ K 2 E ( m (2) (x t )f (x t ,y t ) ) = 0. 

Hence, 1 ∑ n 

n t=1 2(m(x t) −ˆm(x t ))ε t = o p (n − 1 2 ) + o p (h 2 n ). By part (b) of Lemma 1 

with f(x t ,y t ) = 1wehave 

1 

4 h4 n σ K 4 E((m(2) (x t )) 2 f(x t ,y t )) = O p (h 4 n ) 

and 

σK 2 n∑ h2 n 

ε t m (2) (x t )E(f (x t ,y)|x 1 ,... ,x n ) 

n 

t=1 

= σ K 2 n∑ h2 n 

ε t m (2) (x t ) = O p (n − 1 2 h 

2 

n 

n ) = o p (n − 1 2 ) 

t=1 

by the central limit theorem for i.i.d. sequences. Hence, 1 ∑ n 

n t=1 (m(x t)− ˆm(x t )) 2 = 

o p (n − 1 2 ) + o p (h 3 n ) provided E(y4 t )


(b) If E(y 8 t )


Since E(yt 6)


Hence, 

b 3n = n −1 

n∑ 

t=1 

ε 4 t − µ 4 − 4µ 3 

n 

n∑ 

ε t − 2σK 2 h2 n µ 3E ( m (2) (x t ) ) 

+o p (n −1/2 ) + o p (h 2 n ). (23) 

µ 

Combining (20) and (23), we write b 3n − b 4 

4n (µ 2 

= n −1 ∑ n 

) 2 t=1 W t + A 2n where 

W t = εt 4 + µ 4 − 4ε t µ 3 − 2 µ 4 

µ 2 

εt 2 and A 2n =−2h 2 n σ K 2 µ 3E(m (2) (x t )) + o p (n −1/2 ) + 

o p (h 2 n ). Note that E(W t) = 0 and by the c r -inequality, V(W t ) = E(Wt 2)


√ n( ˆη 2 −η 2 −b 2n ) → d N(0,V(ξ t )) where ξ t = 1 ( 

ε 

2 

V(y t ) t −(1−η 2 )(y t −E(y t )) 2) 

and b 2n = o p (h 2 n ). 

( 

) 

Proof ˆη 2 − η 2 1 

E(y 

=− 1 

∑ n β 

n t=1 (y t −ȳ) 2 1 − β t −m(x t )) 2 

2 V(y t 

where β 

) 

1 = 1 ∑ n 

n t=1 (y t − 

ˆm(x t )) 2 − E(y t − m(x t )) 2 , β 2 = 1 ∑ n 

n t=1 (y t − ȳ) 2 − V(y t ). First, <strong>note</strong> that by the 

law of large numbers 1 ∑ n 

n t=1 y2 p 

t → E(yt 2 ) and ȳ2 → p E(y t ) 2 , hence 

( 

) −1 

1 

n∑ 

(y t − ȳ) 2 p 

→ V(yt ) −1 . (26) 

n 

t=1 

1 

For β 2 , ȳ 2 − (E(y t )) 2 = 1 ∑ n ∑ n 

2 n 2 t=1 k=1 (y ty k + y k y t ) − E(m(x t )) 2 = 1 2 v n − 

(E(m(x t ))) 2 . By Corollary 1, since 1 n E(ψ2 n (y t,y k )) = 4 n E((y2 t ))2 = o(1) for all 

t,k, √ n(v n −û n ) = o p (1) where û n = 2 ∑ n 

n t=1 (ψ 1n(Z t ))−θ n = 4 ∑ n 

n t=1 y tE(m(x t ))− 

2(E(m(x t ))) 2 . Hence, ȳ 2 − (E(y t )) 2 = 2 ∑ n 

n t=1 y tE(m(x t )) − 2(E(m(x t ))) 2 + 

o p (n − 1 2 ), and 

E(y t − m(x t )) 2 

β 2 = 1 n∑ 

{ E(σ 2 (x t )) 

(yt 2 − E(yt 2 V(y t ) n V(y 

t=1 

t ) 

)) 

} 

−2 E(σ2 (x t ))E(m(x t )) 

(y t − E(m(x t ))) + o p (n − 1 2 ). 

V(y t ) 

For β 1 , 1 ∑ n 

n t=1 (y t−ˆm(x t )) 2 = 1 ∑ n 

n t=1( 

ε 

2 

t + 2(m(x t )−ˆm(x t ))ε t +(m(x t )−ˆm(x t )) 2) . 

Similarly to the proof of Theorem 2, we use part (a) of Lemma 1 with f(x t ,y t ) 

=−2ε t to obtain 1 ∑ n 

n t=1 2(m(x t)− ˆm(x t ))ε t = o p (n − 1 2 )+o p (h 2 n ). By part (b) with 

f(x t ,y t ) = 1 we obtain 1 ∑ n 

n t=1 (m(x t)− ˆm(x t )) 2 = o p (n − 1 2 )+o p (h 3 n ). Therefore, 

1 

∑ n 

n t=1 (y t −ˆm(x t )) 2 = 1 ∑ n 

n t=1 ε2 t + o p (n − 1 2 ) + o p (h 2 n ). Hence, 

β 1 − β 2 

E(y t − m(x t )) 2 

V(y t ) 

= 1 n 

n∑ 

t=1 

(ε 2 t − E(σ 2 (x t )) − E(σ2 (x t )) 

V(y t ) 

+ 2E(σ2 (x t ))E(m(x t )) 

(y t − E(m(x t ))) 

V(y t ) 

+o p (n − 1 2 ) + op (h 2 n ) 

= 1 n∑ 

ζ t + o p (n − 1 2 ) + op (h 2 n 

n 

). 

t=1 

(y 2 t − E(y 2 t )) 

Since E(ζ t ) = 0,V(ζ t ) = V(εt 2 − (1 − η 2 )(y t − E(y t )) 2 )


4 Conclusion 

We have established the √ n asymptotic equivalence between V and U statistics 

when their kernels depend on n, the sample size. We combine our result with a 

result of Lee (1988) to obtain the √ n asymptotic equivalence between V statistics 

and U statistics projections. We provide a number of examples and illustrations 

on how our results can be used in nonparametric kernel estimation. The list of 

examples is obviously not exhaustive as our results can be used in much broader 

contexts. 

Acknowledgements 

We would like to thank Professor Katuomi Hirano and two anonymous referees 

for comments and suggestions. 

References 

Ahn, H., Powell, J. L. (1993). Semiparametric estimation of censored selection models with a 

nonparametric selection mechanism. Journal of Econometrics, 58, 3–29. 

Bönner, N., Kirschner, H.-P. (1977). Note on conditions for weak convergence of Von Mises’ 

differentiable statistical functions. The Annals of Statistics, 5, 405–407. 

D’Amico, S. (2003). Density estimation and combination under model ambiguity via a pilot nonparametric 

estimate: an application to stock returns.Working paper, Department of Economics, 

Columbia University (http://www.columbia.edu/∼sd445/). 

Doksum, K., Samarov, A. (1995). Nonparametric estimation of global functionals and a measure 

of the explanatory power of covariates in regression. The Annals of Statistics, 23(5), 

1443–1473. 

Fan, J. (1992). Design adaptive nonparametric regression. Journal of the American Statistical 

Association, 87, 998–1004. 

Fan, J.,Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. 

Biometrika, 85, 645–660. 

Fan, Y., Li, Q. (1996). Consistent model specification test: omitted variables and semiparametric 

functional forms. Econometrica, 64(4), 865–890. 

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of 

Mathematical Statistics, 19, 293–325. 

Hoeffding, W. (1961). The strong law of large numbers for U statistics, Institute of Statistics, 

Mimeo Series 302, University of North Carolina, Chapel Hill, NC. 

Kemp, G. (2000). Semiparametric estimation of a logit model. Working paper, Department of 

Economics, University of Essex (http://www.essex.ac.uk /∼ kempgcr/ papers/logitmx.pdf). 

Lee, A. J. (1990). U-Statistics. New York: Marcel Dekker. 

Lee, B. (1988). Nonparametric tests using a kernel estimation method, Doctoral dissertation, 

Department of Economics, University of Wisconsin, Madison, WI. 

Martins-Filho, C., Yao, F. (2003). A nonparametric model of frontiers, working paper, Department 

of Economics, Oregon State University (http://oregonstate.edu/∼martinsc/martins-filho-yao(03).pdf). 

Powell, J., Stock, J., Stoker, T. (1989). Semiparametric estimation of index coefficients. Econometrica, 

57, 1403–1430. 

Prakasa-Rao, B. L. S. (1983). Nonparametric functional estimation. Orlando, FL: Academic 

Serfling, R. J. (1980). Approximation theorems of mathematical statistics. New York: Wiley. 

Yao, Q., Tong, H. (2000). Nonparametric estimation of ratios of noise to signal in stochastic 

regression. Statistica Sinica, 10, 751–770. 

Zheng, J. X. (1998). A consistent nonparametric test of parametric regression models under 

conditional quantile restrictions. Econometric Theory, 14, 123–138.

A note on the use of V and U statistics in nonparametric models of ...

Create successful ePaper yourself

Delete template?

Save as template?