Lecture7 Slide - The Department of Statistics and Applied Probability ...
ST5207 Nonparametric Regression, Lecture 7
Lijian Yang
Department of Statistics & Probability
Michigan State University
East Lansing, MI 48824
and
Department of Statistics & Applied Probability
National University of Singapore
Singapore 117546
ST5207 Nonparametric Regression, 10th March 2005

Multivariate nonparametric estimation
• Let $\{Y_i, X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \dots, X_{id}\}_{i=1}^n$ be an i.i.d. sample from the model
$$Y = m(X) + \sigma(X)\,\varepsilon, \quad X = (X_1, \dots, X_d),$$
where the noise satisfies $E(\varepsilon \mid X) = 0$, $\mathrm{var}(\varepsilon \mid X) = 1$.
• How to estimate the multivariate function $m$?
• We will discuss Nadaraya-Watson and local linear methods.
• Conventions on multivariate kernel and bandwidth vector:
$$K_h(u) = \prod_{\alpha=1}^{d} \frac{1}{h_\alpha} K\!\left(\frac{u_\alpha}{h_\alpha}\right), \quad u = (u_1, \dots, u_d), \quad h = (h_1, \dots, h_d)$$
• Wand & Jones (1995), Kernel Smoothing, Chapman and Hall, London, and Wand & Ruppert (1994) (see reference list in syllabus) use a bandwidth matrix instead of a vector.
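
The product-kernel convention can be written out in a few lines of NumPy. This is only a sketch: the Gaussian density is an assumed choice for the univariate kernel K (the slides do not fix one), and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(t):
    """Univariate standard normal density, an assumed choice for K."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def product_kernel(u, h, kernel=gaussian_kernel):
    """K_h(u) = prod over alpha of (1 / h_alpha) * K(u_alpha / h_alpha).

    u: array of shape (..., d); h: array of shape (d,) with positive entries.
    Returns an array of shape (...), one kernel value per row of u.
    """
    u = np.asarray(u, dtype=float)
    h = np.asarray(h, dtype=float)
    # Scale each coordinate by its own bandwidth, then multiply across coordinates.
    return np.prod(kernel(u / h) / h, axis=-1)
```

Each coordinate gets its own bandwidth, which is what distinguishes the vector convention here from the full bandwidth matrix of the cited references.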

Multivariate nonparametric estimation
• NW estimator
$$\hat m(x) = \arg\min_{c} \sum_{i=1}^{n} (Y_i - c)^2\, w_i(x), \quad w_i(x) = K_h(X_i - x)$$
• The explicit formula is
$$\hat m(x) = \frac{\sum_{i=1}^{n} Y_i K_h(X_i - x)}{\sum_{i=1}^{n} K_h(X_i - x)}$$
• Limiting distribution is
$$\sqrt{n h_1 \cdots h_d}\,\left\{ \hat m(x) - m(x) - \sum_{\alpha=1}^{d} h_\alpha^2\, b_\alpha(x) \right\} \xrightarrow{D} N\{0, v(x)\}$$
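
The explicit NW formula is a kernel-weighted average of the responses, which the following minimal sketch computes at a single point. It assumes a Gaussian product kernel; the function name and argument layout are illustrative, not taken from the lecture.

```python
import numpy as np

def nw_estimate(x, X, Y, h):
    """Nadaraya-Watson estimate of m at one point x.

    X: (n, d) covariates, Y: (n,) responses, x: (d,) evaluation point,
    h: (d,) bandwidth vector. Uses a Gaussian product kernel (an assumption).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    t = (X - x) / h
    # w_i(x) = K_h(X_i - x), the product over coordinates
    w = np.prod(np.exp(-0.5 * t**2) / (np.sqrt(2.0 * np.pi) * h), axis=1)
    # Weighted average of Y_i; the denominator can underflow to zero
    # if x is very far from all observations relative to h.
    return np.sum(w * Y) / np.sum(w)
```

Because the weights are normalized by their sum, any constant factor in the kernel cancels, and constant data are reproduced exactly.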

Multivariate nonparametric estimation
• Bias and variance functions are
$$b_\alpha(x) = \mu_2(K) \left\{ \frac{1}{2} \frac{\partial^2 m}{\partial x_\alpha^2}(x) + \frac{\partial m}{\partial x_\alpha}(x)\, \frac{\partial f}{\partial x_\alpha}(x)\, f^{-1}(x) \right\}$$
$$v(x) = \sigma^2(x)\, f^{-1}(x) \left\{ \int K^2(u)\, du \right\}^{d}$$
• The local linear estimator (to be discussed next) has limiting distribution of the same form, but with
$$b_\alpha(x) = \mu_2(K)\, \frac{1}{2}\, \frac{\partial^2 m}{\partial x_\alpha^2}(x)$$
• The local linear weighted least squares problem is
$$\left\{ \hat m(x), \widehat{\nabla m}(x) \right\} = \arg\min_{a,\,b} \sum_{i=1}^{n} \left\{ Y_i - a - (X_i - x)^T b \right\}^2 w_i(x)$$

Multivariate local linear estimation
• Matrices: $\mathbf{W} = \mathbf{W}(x) = \mathrm{diag}\left\{ n^{-1} K_h(X_i - x) \right\}$
$$\mathbf{X} = \mathbf{X}(x) = \begin{pmatrix} 1, & (X_1 - x)^T \\ 1, & (X_2 - x)^T \\ \cdots & \\ 1, & (X_n - x)^T \end{pmatrix}, \quad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \cdots \\ Y_n \end{pmatrix}$$
$$\mathbf{m} = \begin{pmatrix} m(X_1) \\ m(X_2) \\ \cdots \\ m(X_n) \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \cdots \\ \varepsilon_n \end{pmatrix}$$
• The solution of the weighted least squares problem is
$$\left\{ \hat m(x), \widehat{\nabla m}(x) \right\}^T = \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{Y}$$
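
The closed-form solution can be sketched directly from the matrices defined on this slide. The code below is an illustration under stated assumptions: a Gaussian product kernel is used for the weights, the function name is hypothetical, and the weighted normal equations are solved with `np.linalg.solve` rather than by forming the inverse.

```python
import numpy as np

def local_linear(x, X, Y, h):
    """Local linear estimate of m(x) and its gradient at one point x.

    X: (n, d), Y: (n,), x: (d,), h: (d,). Gaussian product kernel assumed.
    Returns (m_hat, grad_hat) with grad_hat of shape (d,).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n, d = X.shape
    diff = X - x
    t = diff / h
    # Diagonal of W: n^{-1} K_h(X_i - x)
    w = np.prod(np.exp(-0.5 * t**2) / (np.sqrt(2.0 * np.pi) * h), axis=1) / n
    # Design matrix with rows (1, (X_i - x)^T)
    D = np.hstack([np.ones((n, 1)), diff])
    A = D.T @ (w[:, None] * D)   # X^T W X, shape (d+1, d+1)
    b = D.T @ (w * Y)            # X^T W Y, shape (d+1,)
    beta = np.linalg.solve(A, b)
    return beta[0], beta[1:]
```

On exactly linear data the fit reproduces the intercept and slopes regardless of the bandwidth, which is a convenient sanity check. If too few observations carry non-negligible weight, the matrix `A` becomes ill-conditioned and the solve may fail.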
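
Multivariate local linear estimation
• Separately, the estimators are ($\alpha = 1, \dots, d$)
$$\hat m(x) = e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{Y}, \quad e_0^T = (1, 0, \dots, 0)$$
$$\widehat{\frac{\partial m}{\partial x_\alpha}}(x) = e_\alpha^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{Y}, \quad e_\alpha^T = (0, 0, \dots, 0, 1, 0, \dots, 0)$$
• Corresponding functions are ($\alpha = 1, \dots, d$)
$$m(x) = m(x)\, e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{X} e_0$$
$$\frac{\partial m}{\partial x_\alpha}(x) = \frac{\partial m}{\partial x_\alpha}(x)\, e_\alpha^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{X} e_\alpha$$
• In addition, observe that
$$e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, \mathbf{X} e_\alpha = 0$$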
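
All three identities on this slide rest on the same cancellation. As a short worked step, assuming $\mathbf{X}^T \mathbf{W} \mathbf{X}$ is invertible and writing $I_{d+1}$ for the identity matrix of order $d+1$:

```latex
\begin{align*}
\left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{X} &= I_{d+1}, \\
e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{X} e_0 &= e_0^T e_0 = 1, \\
e_\alpha^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{X} e_\alpha &= e_\alpha^T e_\alpha = 1, \\
e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{X} e_\alpha &= e_0^T e_\alpha = 0,
\quad \alpha = 1, \dots, d.
\end{align*}
```

Multiplying the last line by $\partial m / \partial x_\alpha (x)$ and summing over $\alpha$ gives the zero term, which can therefore be subtracted from $\hat m(x) - m(x)$ at no cost.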
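
Multivariate local linear estimation
• The error decomposition for $\hat m(x)$ is
$$\begin{aligned}
\hat m(x) - m(x) ={}& e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{e} \\
&+ e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{m} - m(x)\, e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{X} e_0 \\
&- e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, \mathbf{X} e_\alpha
\end{aligned}$$
• Which becomes
$$\begin{aligned}
\hat m(x) - m(x) ={}& e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{e} \\
&+ e_0^T \left( \mathbf{X}^T \mathbf{W} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{W} \left\{ \mathbf{m} - m(x)\, \mathbf{X} e_0 - \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, \mathbf{X} e_\alpha \right\}
\end{aligned}$$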
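
To see why the braced vector is small, it may help to write out its $i$-th entry. The sketch below assumes $m$ is twice continuously differentiable near $x$ and uses the fact that the $i$-th entries of $\mathbf{X} e_0$ and $\mathbf{X} e_\alpha$ are $1$ and $X_{i\alpha} - x_\alpha$:

```latex
\begin{align*}
\left[ \mathbf{m} - m(x)\, \mathbf{X} e_0 - \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, \mathbf{X} e_\alpha \right]_i
&= m(X_i) - m(x) - \sum_{\alpha=1}^{d} \frac{\partial m}{\partial x_\alpha}(x)\, (X_{i\alpha} - x_\alpha) \\
&= \frac{1}{2} \sum_{\alpha,\beta=1}^{d} \frac{\partial^2 m}{\partial x_\alpha \partial x_\beta}(x)\,
   (X_{i\alpha} - x_\alpha)(X_{i\beta} - x_\beta) + o\!\left( \| X_i - x \|^2 \right).
\end{align*}
```

The constant and linear Taylor terms are removed exactly, so only second-order terms remain. With a symmetric product kernel the cross terms ($\alpha \neq \beta$) average out to leading order, which is consistent with a bias that involves only the pure second derivatives and no derivative of the design density $f$.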

Multivariate local linear estimation
• The limiting distribution for $\hat m(x)$ is
$$\sqrt{n h_1 \cdots h_d}\,\left\{ \hat m(x) - m(x) - \sum_{\alpha=1}^{d} h_\alpha^2\, b_\alpha(x) \right\} \xrightarrow{D} N\{0, v(x)\}$$
$$b_\alpha(x) = \frac{d_K}{2}\, \frac{\partial^2 m}{\partial x_\alpha^2}(x), \quad v(x) = \sigma^2(x)\, f^{-1}(x)\, c_K^{d}$$
where, comparing with the bias and variance functions given earlier, $d_K = \mu_2(K)$ and $c_K = \int K^2(u)\, du$.
• The asymptotic MISE, $\mathrm{AMISE}\{\hat m(x); h\}$, is
$$\frac{c_K^{d} \int \sigma^2(x)\, dx}{n h_1 \cdots h_d} + \frac{d_K^2}{4} \sum_{\alpha,\beta=1}^{d} h_\alpha^2 h_\beta^2 \int \frac{\partial^2 m}{\partial x_\alpha^2}(x)\, \frac{\partial^2 m}{\partial x_\beta^2}(x)\, f(x)\, dx$$
$$h_{\mathrm{opt}} = v(m, \sigma, K)\, n^{-1/(d+4)}, \quad \mathrm{AMISE}\{\hat m(x); h_{\mathrm{opt}}\} = C(m, \sigma, K)\, n^{-4/(d+4)}$$
• The "curse of dimensionality": slower convergence rate $n^{-2/(d+4)}$ with high dimension $d$ (Intuitively, why?)
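
The exponents in $h_{\mathrm{opt}}$ and in the optimal AMISE can be checked with a short calculation. As a simplifying assumption not made on the slide, take a common bandwidth $h_1 = \cdots = h_d = h$, and let $A > 0$ and $B > 0$ denote the variance and squared-bias constants:

```latex
\begin{align*}
\mathrm{AMISE}(h) &= \frac{A}{n h^{d}} + B h^{4}, \\
\frac{d}{dh}\, \mathrm{AMISE}(h) &= -\frac{d A}{n h^{d+1}} + 4 B h^{3} = 0
\;\Longrightarrow\;
h_{\mathrm{opt}} = \left( \frac{d A}{4 B} \right)^{1/(d+4)} n^{-1/(d+4)}, \\
\mathrm{AMISE}(h_{\mathrm{opt}}) &\propto n^{-1}\, n^{d/(d+4)} + n^{-4/(d+4)} \;\propto\; n^{-4/(d+4)}.
\end{align*}
```

Taking the square root gives the $n^{-2/(d+4)}$ rate for the estimation error itself. For $d = 1$ this is $n^{-2/5}$, while for $d = 6$ it is already down to $n^{-1/5}$.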

Computing and dimension reduction
• In XploRe, there are two related quantlets, "lregxestp" for local linear and "regxestp" for NW estimators.
• We show the output on an example of $n = 200$ observations generated with $m(x) = \cos(x_1) + \cos(x_2)$ for $X$ distributed uniformly on $[-\pi, \pi]^2$.
• One natural way to "reduce" dimension is the additive model. This means that
$$m(x) = c + \sum_{\alpha=1}^{d} m_\alpha(x_\alpha)$$
with the identification conditions $E\, m_\alpha(X_\alpha) \equiv 0$, $\alpha = 1, \dots, d$.
• In XploRe, there are two related quantlets, "backfit" for backfitting and "intest" for integration estimators of the additive model.
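
For readers without XploRe, the example and the additive fit can be imitated in Python. The sketch below is not the "backfit" quantlet; it is a generic backfitting loop with a univariate NW smoother, and the noise level, bandwidths, seed, and iteration count are all assumed values chosen for illustration.

```python
import numpy as np

def nw_smooth(x_eval, x, y, h):
    """Univariate Nadaraya-Watson smoother with a Gaussian kernel."""
    t = (x_eval[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * t**2)
    return (w @ y) / w.sum(axis=1)

def backfit(X, Y, h, n_iter=20):
    """Fit m(x) = c + sum_alpha m_alpha(x_alpha) by cycling over components."""
    n, d = X.shape
    c = Y.mean()
    comps = np.zeros((n, d))  # comps[i, a] approximates m_a(X_ia)
    for _ in range(n_iter):
        for a in range(d):
            # Partial residual: remove the constant and all other components.
            partial = Y - c - comps.sum(axis=1) + comps[:, a]
            fit = nw_smooth(X[:, a], X[:, a], partial, h[a])
            # Center the fit: empirical version of E m_a(X_a) = 0.
            comps[:, a] = fit - fit.mean()
    return c, comps

# Simulated example in the spirit of the slide: n = 200, m(x) = cos(x1) + cos(x2).
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-np.pi, np.pi, size=(n, 2))
sigma = 0.5  # assumed noise level; the slide does not state one
Y = np.cos(X[:, 0]) + np.cos(X[:, 1]) + sigma * rng.standard_normal(n)
c, comps = backfit(X, Y, h=np.array([0.4, 0.4]))
```

Each pass only ever smooths in one dimension, which is how the additive model sidesteps the multivariate rate: every component is estimated with a univariate smoother.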