
Discussion of the presentation:
Comparing and Selecting Spatial Predictors Using Local Criteria
by Bradley, Cressie, and Shi

Monica Musio
University of Cagliari

Workshop
March 21-23, 2013, Padova



Observation process given by

$$Z(s) = Y(s) + \varepsilon(s), \quad s \in D,$$

where $D = \{u_1, \dots, u_N\} \subset \mathbb{R}^d$ is a spatial lattice, $Y(\cdot)$ is a hidden process, and $\varepsilon(\cdot)$ is a measurement-error process independent of $Y(\cdot)$.

◮ Start from an arbitrary set of predictors: traditional stationary kriging (TSK), smoothing splines (SSP), negative-exponential distance weighting (EDW), fixed rank kriging (FRK), a modified predictive processes approach (MPP), a stochastic partial differential equation approach (SPD), and lattice kriging (LTK).

◮ Choose between the various predictors separately at each site, possibly using different predictors in different places. At each site, some measure of predictive performance is optimised.



◮ Local squared validation error:

$$\mathrm{LSVE}^{(k)}(u_i, Z^{\mathrm{trn}}, H^{\mathrm{val}}) = \frac{1}{|H^{\mathrm{val}}(u_i)|} \sum_{s \in H^{\mathrm{val}}(u_i)} \big( Z(s) - Y^{(k)}(s, Z^{\mathrm{trn}}) \big)^2$$

$$k^*(u_i, Z^{\mathrm{trn}}, H^{\mathrm{val}}) = \operatorname{argmin} \{ \mathrm{LSVE}^{(k)}(u_i, Z^{\mathrm{trn}}, H^{\mathrm{val}}),\ k = 1, \dots, K \}$$

Locally selected predictor:

$$Y^{\mathrm{LSP}} = \big( Y^{(k^*)},\ i = 1, \dots, N \big)'$$

where, for the $i$-th entry, $k^*$ is evaluated at site $u_i$.

◮ Empirical deviance information criterion (Bradley, Cressie and Shi (2013))
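The local selection rule on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `lsve` and `select_locally`, the array layout (one row of predictions per predictor), and the representation of each validation neighbourhood $H^{\mathrm{val}}(u_i)$ as a list of site indices are all hypothetical choices made here.

```python
import numpy as np


def lsve(z_val, preds_val):
    """Local squared validation error for each of K predictors.

    z_val     : shape (m,), validation observations Z(s), s in H_val(u_i)
    preds_val : shape (K, m), predictions Y^(k)(s, Z_trn) at the same sites
    Returns an array of shape (K,) with the mean squared error per predictor.
    """
    z_val = np.asarray(z_val, dtype=float)
    preds_val = np.asarray(preds_val, dtype=float)
    return np.mean((z_val[None, :] - preds_val) ** 2, axis=1)


def select_locally(z, preds, neighborhoods):
    """Pick, at every site u_i, the predictor with the smallest LSVE.

    z             : shape (N,), held-out observations indexed by site (assumed layout)
    preds         : shape (K, N), predictions of the K predictors at all N sites
    neighborhoods : list of N index lists; neighborhoods[i] plays the role of H_val(u_i)
    Returns (k_star, y_lsp): the selected index per site and the
    locally selected predictor assembled from those choices.
    """
    z = np.asarray(z, dtype=float)
    preds = np.asarray(preds, dtype=float)
    n_sites = preds.shape[1]
    k_star = np.empty(n_sites, dtype=int)
    for i, idx in enumerate(neighborhoods):
        k_star[i] = int(np.argmin(lsve(z[idx], preds[:, idx])))
    # Entry i of the result comes from predictor k_star[i] evaluated at site i.
    y_lsp = preds[k_star, np.arange(n_sites)]
    return k_star, y_lsp
```

Ties are broken in favour of the lowest predictor index, which is simply the behaviour of `np.argmin`; the slides do not say how ties should be handled.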



Very pragmatic approach, aiming to improve predictive performance

◮ These spatial predictors are often derived from different assumptions on the stochastic process, or may be deterministic

◮ They can be regarded as approximations of the underlying random field

◮ All these methods sacrifice some important information in the data in order to gain computational efficiency



◮ Selecting a single predictor from some class of predictors does not take prediction uncertainty into account

◮ Natural extension: instead of selecting a single predictor at each site, assign a weight to each predictor at each site and consider a new hybrid local predictor given by the weighted average of all local predictors:

$$\tilde{Y} = \sum_{j=1}^{K} \omega_j Y^{(j)}$$

e.g. using the Buckland et al. (1997) measure of predictive performance, $I_j$:

$$\omega_j = \frac{\exp(-I_j/2)}{\sum_{k=1}^{K} \exp(-I_k/2)}.$$
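A minimal sketch of the weighted hybrid predictor, assuming the criterion values $I_j$ are already available either globally (one value per predictor) or per site. The names `criterion_weights` and `hybrid_predictor` and the array shapes are assumptions made for illustration. Subtracting the smallest criterion value before exponentiating leaves the weights unchanged but avoids numerical underflow.

```python
import numpy as np


def criterion_weights(criteria):
    """Weights w_j = exp(-I_j / 2) / sum_k exp(-I_k / 2).

    criteria : shape (K,) for global weights, or (K, N) for per-site weights.
    The weights are normalised over the predictor axis (axis 0).
    """
    criteria = np.asarray(criteria, dtype=float)
    # Shifting by the minimum cancels in the ratio and keeps exp() well scaled.
    shifted = criteria - criteria.min(axis=0, keepdims=True)
    w = np.exp(-0.5 * shifted)
    return w / w.sum(axis=0, keepdims=True)


def hybrid_predictor(preds, criteria):
    """Weighted average of K predictors.

    preds    : shape (K, N), predictions of each predictor at N sites
    criteria : shape (K,) or (K, N), criterion values I_j (smaller is better)
    Returns an array of shape (N,).
    """
    preds = np.asarray(preds, dtype=float)
    w = criterion_weights(criteria)
    if w.ndim == 1:
        w = w[:, None]  # broadcast global weights across sites
    return np.sum(w * preds, axis=0)
```

With two predictors whose criteria differ by $2\ln 3$, the better one receives weight 0.75 and the other 0.25, so selection of a single predictor is recovered only in the limit of a very large criterion gap.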



Another idea:

◮ In a Bayesian framework: use strictly proper scoring rules to evaluate probabilistic predictions in the form of a predictive distribution.

◮ Scoring rules provide a summary measure for evaluating a probabilistic prediction given the predictive distribution and the observed outcome.

◮ Can use different scoring rules for different purposes.

◮ Several different scoring rules can be used to compare the predictive ability of the candidate predictors locally.



Example:

◮ $Z(\cdot)$ is a discrete process, and $\pi^{(k)}(s) = \{\pi_1(s), \dots, \pi_J(s)\}$ is the posterior predictive distribution of predictor $k$ at $s$, $k = 1, \dots, K$

◮ We can compare the predictors locally in terms of mean score

$$\bar{S} = \sum_{s \in H^{\mathrm{val}}(u_i)} \frac{S(z(s), \pi(s))}{|H^{\mathrm{val}}(u_i)|}$$

◮ Quadratic: $S(z,\pi) = \sum_{j=1}^{J} \pi_j^2 - 2\pi_z$

◮ Spherical: $S(z,\pi) = -\dfrac{\pi_z}{\big(\sum_{j=1}^{J} \pi_j^2\big)^{1/2}}$

◮ Logarithmic: $S(z,\pi) = -\ln \pi_z$

(Dawid and Musio (2013); Finley, Banerjee and McRoberts (2009); Gneiting and Raftery (2007))
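The three discrete scoring rules and the local mean score can be written down directly. The sketch below assumes that the outcome `z` is coded as a 0-based category index into the probability vector `pi`; that coding and the function names are illustrative choices, not something the slides specify. All three rules are written as losses, so a smaller mean score indicates a better predictor.

```python
import numpy as np


def quadratic_score(z, pi):
    """Quadratic (Brier-type) score: sum_j pi_j^2 - 2 * pi_z."""
    pi = np.asarray(pi, dtype=float)
    return float(np.sum(pi ** 2) - 2.0 * pi[z])


def spherical_score(z, pi):
    """Spherical score: -pi_z / sqrt(sum_j pi_j^2)."""
    pi = np.asarray(pi, dtype=float)
    return float(-pi[z] / np.sqrt(np.sum(pi ** 2)))


def log_score(z, pi):
    """Logarithmic score: -ln(pi_z)."""
    pi = np.asarray(pi, dtype=float)
    return float(-np.log(pi[z]))


def mean_score(score, z_val, pi_val):
    """Average a scoring rule over the validation sites of one neighbourhood.

    score  : one of the scoring functions above
    z_val  : observed categories z(s) for s in H_val(u_i)
    pi_val : predictive distributions pi(s) at the same sites, one row per site
    """
    return float(np.mean([score(z, pi) for z, pi in zip(z_val, pi_val)]))
```

To compare predictors at a site $u_i$, one would evaluate `mean_score` once per predictor with that predictor's distributions and keep the smallest value.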



Continuous case: consider a local scoring rule

◮ Log score:

$$S(z,\pi) = -\ln \pi(z)$$

◮ Hyvärinen score (Parry et al. (2012)): with $\nabla = (\partial/\partial z^j)$ and $\Delta = \sum_{j=1}^{k} \partial^2/(\partial z^j)^2$,

$$S(z,\pi) = \Delta \ln \pi(z) + \frac{1}{2} |\nabla \ln \pi(z)|^2 = \frac{2\,\Delta \sqrt{\pi(z)}}{\sqrt{\pi(z)}}$$

You can compare the predictors locally in terms of mean score

$$\bar{S} = \sum_{s \in H^{\mathrm{val}}(u_i)} \frac{S(z(s), \pi(s))}{|H^{\mathrm{val}}(u_i)|}$$
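As a concrete case, both local scores have closed forms when the predictive density is univariate Gaussian with mean $\mu$ and variance $v$. The slides do not fix a predictive family, so the Gaussian choice and the function names below are assumptions made purely for illustration. Here $\nabla \ln \pi(z) = -(z-\mu)/v$ and $\Delta \ln \pi(z) = -1/v$, so the Hyvärinen score is $(z-\mu)^2/(2v^2) - 1/v$.

```python
import numpy as np


def log_score_gaussian(z, mean, var):
    """Log score -ln pi(z) for a univariate N(mean, var) predictive density."""
    z, mean, var = (np.asarray(a, dtype=float) for a in (z, mean, var))
    return 0.5 * np.log(2.0 * np.pi * var) + (z - mean) ** 2 / (2.0 * var)


def hyvarinen_score_gaussian(z, mean, var):
    """Hyvarinen score: Laplacian of ln pi plus half the squared gradient of ln pi."""
    z, mean, var = (np.asarray(a, dtype=float) for a in (z, mean, var))
    grad = -(z - mean) / var      # d/dz ln pi(z)
    laplacian = -1.0 / var        # d^2/dz^2 ln pi(z)
    return laplacian + 0.5 * grad ** 2


def local_mean_score(score, z_val, mean_val, var_val):
    """Mean of a score over the validation sites of one neighbourhood."""
    return float(np.mean(score(z_val, mean_val, var_val)))
```

The Hyvärinen score depends on the density only through derivatives of $\ln \pi$, so it can be evaluated without the normalising constant, which is one practical reason to consider it alongside the log score.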



The proposed method used only estimated means, not variances:

$$\mathrm{LSVE}^{(k)}(u_i, Z^{\mathrm{trn}}, H^{\mathrm{val}}) \propto \sum_{s \in H^{\mathrm{val}}(u_i)} \big( Z(s) - Y^{(k)}(s, Z^{\mathrm{trn}}) \big)^2$$

If we have an estimate $v(s)$ of the variance of $Z$ at each validation site $s$, we could use it to weight the terms:

$$\mathrm{LSVE}^{(k)}(u_i, Z^{\mathrm{trn}}, H^{\mathrm{val}}) \propto \sum_{s \in H^{\mathrm{val}}(u_i)} \frac{\big( Z(s) - Y^{(k)}(s, Z^{\mathrm{trn}}) \big)^2}{v(s)}$$

but this is not a proper scoring rule!

Instead, use

$$\sum_{s \in H^{\mathrm{val}}(u_i)} \left\{ \frac{\big( Z(s) - Y^{(k)}(s, Z^{\mathrm{trn}}) \big)^2}{v(s)} + \ln v(s) \right\}$$

(Dawid and Sebastiani, 1999), which is!
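The difference between the three criteria on this slide can be checked numerically. In the sketch below (function names and toy numbers are hypothetical), the variance-weighted error can be made arbitrarily small just by reporting larger variances, which is why it fails to be proper, while the added $\ln v(s)$ term in the Dawid and Sebastiani criterion penalises inflated variances.

```python
import numpy as np


def squared_error(z, pred):
    """Unweighted sum of squared validation errors (proportional to LSVE)."""
    z, pred = np.asarray(z, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sum((z - pred) ** 2))


def variance_weighted_error(z, pred, var):
    """Squared errors divided by v(s); not a proper scoring rule."""
    z, pred, var = (np.asarray(a, dtype=float) for a in (z, pred, var))
    return float(np.sum((z - pred) ** 2 / var))


def dawid_sebastiani(z, pred, var):
    """Squared errors divided by v(s), plus ln v(s); a proper scoring rule."""
    z, pred, var = (np.asarray(a, dtype=float) for a in (z, pred, var))
    return float(np.sum((z - pred) ** 2 / var + np.log(var)))


# Toy validation neighbourhood: two sites, squared residuals 1 and 4.
z = np.array([1.0, 3.0])
pred = np.array([0.0, 1.0])
var = np.array([1.0, 4.0])
inflated = 100.0 * var

# Inflating the variances always lowers the weighted error ...
weighted_drops = variance_weighted_error(z, pred, inflated) < variance_weighted_error(z, pred, var)
# ... but raises the Dawid-Sebastiani criterion in this example.
ds_rises = dawid_sebastiani(z, pred, inflated) > dawid_sebastiani(z, pred, var)
```

In this toy example the stated variances equal the squared residuals, which is where each term of the Dawid and Sebastiani criterion is minimised for a fixed residual, so any rescaling of `var` increases it.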



Bibliography

◮ Bradley, J. R., Cressie, N. and Shi, T. (2013). Local spatial-predictor selection. In Proceedings of the 2012 Joint Statistical Meetings, forthcoming. Alexandria, VA: American Statistical Association.

◮ Buckland, S. T., Burnham, K. P. and Augustin, N. H. (1997). Model Selection: An Integral Part of Inference. Biometrics, 53, 603-618.

◮ Dawid, A. P. and Musio, M. (2013). Estimation of spatial processes using local scoring rules. AStA, 97(2), 173-179.

◮ Dawid, A. P. and Sebastiani, P. (1999). Coherent dispersion criteria for optimal experimental design. Ann. Statist., 27, 65-81.

◮ Finley, A. O., Banerjee, S. and McRoberts, R. (2009). Hierarchical spatial models for predicting tree species assemblages across large domains. Annals of Applied Statistics, 3, 1052-1079.

◮ Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378.

◮ Parry, M. F., Dawid, A. P. and Lauritzen, S. L. (2012). Proper local scoring rules. Annals of Statistics, 40, 561-592.
