
ST5207 Nonparametric Regression, Lecture 7

Lijian Yang

Department of Statistics & Probability
Michigan State University
East Lansing, MI 48824

and

Department of Statistics & Applied Probability
National University of Singapore
Singapore 117546

10th March 2005


Multivariate nonparametric estimation

• Let $\{Y_i, X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ be an i.i.d. sample from the model
$$Y = m(X) + \sigma(X)\,\varepsilon, \qquad X = (X_1, \ldots, X_d)$$
where the noise satisfies $E(\varepsilon \mid X) = 0$, $\mathrm{var}(\varepsilon \mid X) = 1$.

• How to estimate the multivariate function $m$?

• We will discuss the Nadaraya-Watson and local linear methods.

• Conventions on the multivariate kernel and bandwidth vector:
$$K_h(u) = \prod_{\alpha=1}^{d} \frac{1}{h_\alpha}\, K\!\left(\frac{u_\alpha}{h_\alpha}\right), \qquad u = (u_1, \ldots, u_d), \quad h = (h_1, \ldots, h_d)$$

• Wand & Jones (1995), Kernel Smoothing, Chapman and Hall, London, and Wand & Ruppert (1994) (see reference list in syllabus) use a bandwidth matrix instead of a bandwidth vector.
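To make the product-kernel convention concrete, here is a minimal NumPy sketch (an illustration, not the lecture's XploRe code). The Gaussian kernel choice and the function names `gaussian_kernel` and `product_kernel` are assumptions; any univariate kernel $K$ could be plugged in.

```python
import numpy as np

def gaussian_kernel(t):
    """Univariate Gaussian kernel K(t) (an assumed choice of K)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

def product_kernel(u, h, K=gaussian_kernel):
    """Multivariate product kernel K_h(u) = prod_alpha K(u_alpha / h_alpha) / h_alpha.

    u : array of shape (d,) or (n, d); h : bandwidth vector of shape (d,).
    Returns one weight per row of u.
    """
    u = np.atleast_2d(u)
    return np.prod(K(u / h) / h, axis=1)
```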


Multivariate nonparametric estimation

• NW estimator:
$$\hat{m}(x) = \arg\min_{c} \sum_{i=1}^{n} (Y_i - c)^2\, w_i(x), \qquad w_i(x) = K_h(X_i - x)$$

• The explicit formula is
$$\hat{m}(x) = \frac{\sum_{i=1}^{n} Y_i\, K_h(X_i - x)}{\sum_{i=1}^{n} K_h(X_i - x)}$$

• The limiting distribution is
$$\sqrt{n h_1 \cdots h_d}\,\left\{\hat{m}(x) - m(x) - \sum_{\alpha=1}^{d} h_\alpha^2\, b_\alpha(x)\right\} \xrightarrow{D} N\{0, v(x)\}$$
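As a concrete illustration of the explicit formula (a sketch, not the XploRe quantlet "regxestp"), a minimal NumPy implementation of the multivariate NW estimator, reusing the `gaussian_kernel` and `product_kernel` helpers assumed above:

```python
import numpy as np

def nw_estimator(x, X, Y, h, K=gaussian_kernel):
    """Nadaraya-Watson estimate at x:
    m_hat(x) = sum_i Y_i K_h(X_i - x) / sum_i K_h(X_i - x).

    x : evaluation point, shape (d,); X : design matrix, shape (n, d);
    Y : responses, shape (n,);       h : bandwidth vector, shape (d,).
    """
    w = product_kernel(X - x, h, K)      # kernel weights K_h(X_i - x)
    return np.sum(w * Y) / np.sum(w)
```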


Multivariate nonparametric estimation

• Bias and variance functions are
$$b_\alpha(x) = \mu_2(K)\left\{\frac{1}{2}\,\frac{\partial^2 m}{\partial x_\alpha^2}(x) + \frac{\partial m}{\partial x_\alpha}(x)\,\frac{\partial f}{\partial x_\alpha}(x)\, f^{-1}(x)\right\}$$
$$v(x) = \sigma^2(x)\, f^{-1}(x)\left\{\int K^2(u)\,du\right\}^{d}$$

• The local linear estimator (to be discussed next) has a limiting distribution of the same form, but with
$$b_\alpha(x) = \mu_2(K)\,\frac{1}{2}\,\frac{\partial^2 m}{\partial x_\alpha^2}(x)$$

• The local linear weighted least squares problem is
$$\{\hat{m}(x), \widehat{\nabla m}(x)\} = \arg\min_{a,\,b} \sum_{i=1}^{n} \left\{Y_i - a - (X_i - x)^T b\right\}^2 w_i(x)$$


Multivariate local linear estimation

• Matrices: $W = W(x) = \mathrm{diag}\{n^{-1} K_h(X_i - x)\}$ and
$$X = X(x) = \begin{pmatrix} 1, & (X_1 - x)^T \\ 1, & (X_2 - x)^T \\ & \cdots \\ 1, & (X_n - x)^T \end{pmatrix}, \quad
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \cdots \\ Y_n \end{pmatrix}, \quad
m = \begin{pmatrix} m(X_1) \\ m(X_2) \\ \cdots \\ m(X_n) \end{pmatrix}, \quad
e = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \cdots \\ \varepsilon_n \end{pmatrix}$$

• In matrix form, the local linear estimator is
$$\{\hat{m}(x), \widehat{\nabla m}(x)\}^T = \left(X^T W X\right)^{-1} X^T W Y$$
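A minimal NumPy sketch of this closed form (an illustration under the assumptions above, not the XploRe quantlet "lregxestp"); it returns both the fit $\hat{m}(x)$ and the gradient estimate, i.e. the $e_0^T$ and $e_\alpha^T$ components discussed next. The helper `product_kernel` is the one assumed earlier; the function name is hypothetical.

```python
import numpy as np

def local_linear(x, X, Y, h, K=gaussian_kernel):
    """Local linear fit at x: solve the kernel-weighted least squares problem
    and split (X^T W X)^{-1} X^T W Y into the intercept m_hat(x) and the
    estimated gradient.
    """
    n, d = X.shape
    Xc = np.hstack([np.ones((n, 1)), X - x])    # design matrix [1, (X_i - x)^T]
    w = product_kernel(X - x, h, K) / n         # diagonal of W = diag{n^{-1} K_h(X_i - x)}
    XtW = Xc.T * w                              # X^T W (broadcast the weights over columns)
    beta = np.linalg.solve(XtW @ Xc, XtW @ Y)   # (X^T W X)^{-1} X^T W Y
    return beta[0], beta[1:]                    # m_hat(x), estimated gradient at x
```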


Multivariate local linear estimation

• Separately, the estimators are ($\alpha = 1, \ldots, d$)
$$\hat{m}(x) = e_0^T \left(X^T W X\right)^{-1} X^T W Y, \qquad e_0^T = (1, 0, \ldots, 0)$$
$$\widehat{\frac{\partial m}{\partial x_\alpha}}(x) = e_\alpha^T \left(X^T W X\right)^{-1} X^T W Y, \qquad e_\alpha^T = (0, \ldots, 0, 1, 0, \ldots, 0)$$

• The corresponding true functions satisfy ($\alpha = 1, \ldots, d$)
$$m(x) = m(x)\, e_0^T \left(X^T W X\right)^{-1} X^T W X e_0$$
$$\frac{\partial m}{\partial x_\alpha}(x) = \frac{\partial m}{\partial x_\alpha}(x)\, e_\alpha^T \left(X^T W X\right)^{-1} X^T W X e_\alpha$$

• In addition, observe that
$$e_0^T \left(X^T W X\right)^{-1} X^T W \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, X e_\alpha = 0$$
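These identities only use $(X^T W X)^{-1} X^T W X = I$, so they are easy to sanity-check numerically. A small sketch under the same assumptions as the earlier code (random data, arbitrary bandwidths; illustration only):

```python
import numpy as np

# Numerical sanity check of the e_0 / e_alpha identities (illustration only).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.uniform(-1, 1, size=(n, d))
x = np.zeros(d)
h = np.full(d, 0.5)

Xc = np.hstack([np.ones((n, 1)), X - x])
w = product_kernel(X - x, h) / n
A = np.linalg.solve((Xc.T * w) @ Xc, Xc.T * w)   # (X^T W X)^{-1} X^T W

# Since (X^T W X)^{-1} X^T W X = I, we get e_0^T ... X e_0 = 1 and
# e_0^T ... X e_alpha = 0 for alpha >= 1, which drives the identities above.
print(np.round(A @ Xc, 8))    # approximately the (d+1) x (d+1) identity matrix
```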


Multivariate local linear estimation

• The error decomposition for $\hat{m}(x)$ is
$$\hat{m}(x) - m(x) = e_0^T \left(X^T W X\right)^{-1} X^T W e
+ e_0^T \left(X^T W X\right)^{-1} X^T W m - m(x)\, e_0^T \left(X^T W X\right)^{-1} X^T W X e_0
- e_0^T \left(X^T W X\right)^{-1} X^T W \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, X e_\alpha$$

• which becomes
$$\hat{m}(x) - m(x) = e_0^T \left(X^T W X\right)^{-1} X^T W e
+ e_0^T \left(X^T W X\right)^{-1} X^T W \left\{m - m(x)\, X e_0 - \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, X e_\alpha\right\}$$


Multivariate local linear estimation

• The limiting distribution for $\hat{m}(x)$ is
$$\sqrt{n h_1 \cdots h_d}\,\left\{\hat{m}(x) - m(x) - \sum_{\alpha=1}^{d} h_\alpha^2\, b_\alpha(x)\right\} \xrightarrow{D} N\{0, v(x)\}$$
$$b_\alpha(x) = \frac{d_K}{2}\,\frac{\partial^2 m}{\partial x_\alpha^2}(x), \qquad v(x) = \sigma^2(x)\, f^{-1}(x)\, c_K^{d}$$

• The asymptotic MISE, $\mathrm{AMISE}\{\hat{m}(x); h\}$, is
$$\frac{c_K^{d} \int \sigma^2(x)\,dx}{n h_1 \cdots h_d} + \frac{d_K^2}{4} \sum_{\alpha,\beta=1}^{d} h_\alpha^2 h_\beta^2 \int \frac{\partial^2 m}{\partial x_\alpha^2}(x)\, \frac{\partial^2 m}{\partial x_\beta^2}(x)\, f(x)\,dx$$
$$h_{\mathrm{opt}} = v(m, \sigma, K)\, n^{-1/(d+4)}, \qquad \mathrm{AMISE}\{\hat{m}(x); h_{\mathrm{opt}}\} = C(m, \sigma, K)\, n^{-4/(d+4)}$$

• The "curse of dimensionality": the convergence rate $n^{-2/(d+4)}$ becomes slower as the dimension $d$ grows (intuitively, why?)
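A quick way to see the curse of dimensionality numerically is to tabulate how the optimal AMISE rate $n^{-4/(d+4)}$ degrades with $d$; a tiny sketch (illustration only, with an arbitrary sample size):

```python
# How the optimal AMISE rate n^{-4/(d+4)} degrades as the dimension d grows.
n = 10_000
for d in range(1, 11):
    rate = n ** (-4 / (d + 4))
    print(f"d = {d:2d}: exponent = {-4 / (d + 4):.3f}, AMISE ~ {rate:.4f}")
```

For $d = 1$ the exponent is $-0.8$, while for $d = 10$ it has shrunk to about $-0.29$, so vastly more data is needed to reach the same accuracy in high dimensions.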


Computing and dimension reduction

• In XploRe, there are two related quantlets: "lregxestp" for the local linear and "regxestp" for the NW estimator.

• We show the output on an example of $n = 200$ observations generated with $m(x) = \cos(x_1) + \cos(x_2)$ for $X$ distributed uniformly on $[-\pi, \pi]^2$.

• One natural way to "reduce" dimension is the additive model, which means that
$$m(x) = c + \sum_{\alpha=1}^{d} m_\alpha(x_\alpha)$$
with the identification conditions $E\, m_\alpha(X_\alpha) \equiv 0$, $\alpha = 1, \ldots, d$.

• In XploRe, there are two related quantlets: "backfit" for backfitting and "intest" for integration estimators of the additive model (a minimal sketch of the backfitting idea follows below).
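Since XploRe is not shown here, the following is a minimal NumPy sketch of the backfitting idea for the additive model, run on data generated as in the example above ($m(x) = \cos(x_1) + \cos(x_2)$, $n = 200$, $X$ uniform on $[-\pi, \pi]^2$). The univariate smoother, the noise level 0.3, the number of iterations, and all function names are assumptions, not the "backfit" quantlet; it reuses the `gaussian_kernel` helper from the first sketch.

```python
import numpy as np

def nw_1d(x_grid, x_obs, y_obs, h):
    """Univariate NW smoother evaluated at each point of x_grid."""
    w = gaussian_kernel((x_grid[:, None] - x_obs[None, :]) / h) / h
    return (w @ y_obs) / w.sum(axis=1)

def backfit(X, Y, h, n_iter=20):
    """Backfitting for the additive model Y = c + sum_a m_a(X_a) + noise.

    Returns the constant c and the fitted component values m_a(X_ia)."""
    n, d = X.shape
    c = Y.mean()
    comps = np.zeros((n, d))                       # current fits m_a(X_ia)
    for _ in range(n_iter):
        for a in range(d):
            # smooth the partial residuals (everything except component a) against X_a
            resid = Y - c - comps.sum(axis=1) + comps[:, a]
            comps[:, a] = nw_1d(X[:, a], X[:, a], resid, h[a])
            comps[:, a] -= comps[:, a].mean()      # identification: E m_a(X_a) = 0
    return c, comps

# Example data as in the slides: m(x) = cos(x1) + cos(x2), X uniform on [-pi, pi]^2
# (the noise level 0.3 is an assumption; the slides do not specify sigma).
rng = np.random.default_rng(1)
n = 200
X = rng.uniform(-np.pi, np.pi, size=(n, 2))
Y = np.cos(X[:, 0]) + np.cos(X[:, 1]) + 0.3 * rng.standard_normal(n)

c_hat, comps = backfit(X, Y, h=np.array([0.5, 0.5]))
print(c_hat, comps[:3])
```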
