29.06.2013 Views

Analysing spatial point patterns in R - Department of Statistical and ...

Analysing spatial point patterns in R - Department of Statistical and ...

Analysing spatial point patterns in R - Department of Statistical and ...

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Analys<strong>in</strong>g</strong> <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> <strong>in</strong> R<br />

Adrian Baddeley<br />

CSIRO <strong>and</strong> University <strong>of</strong> Western Australia<br />

Adrian.Baddeley@csiro.au<br />

adrian@maths.uwa.edu.au<br />

Workshop Notes<br />

Version 3<br />

October 2008<br />

Copyright c○CSIRO 2008<br />

Abstract<br />

This is a detailed set <strong>of</strong> notes for a workshop on <strong>Analys<strong>in</strong>g</strong> <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> that has<br />

been held several times <strong>in</strong> Australia <strong>and</strong> New Zeal<strong>and</strong> <strong>in</strong> 2006–2008.<br />

It covers statistical methods that are currently feasible <strong>in</strong> practice <strong>and</strong> available <strong>in</strong> public<br />

doma<strong>in</strong> s<strong>of</strong>tware. Some <strong>of</strong> these techniques are well established <strong>in</strong> the applications literature,<br />

while some are very recent developments.<br />

The workshop uses the statistical package R <strong>and</strong> is based on spatstat, an add-on library<br />

for R for the analysis <strong>of</strong> <strong>spatial</strong> data.<br />

Topics covered <strong>in</strong>clude: statistical formulation <strong>and</strong> methodological issues; data <strong>in</strong>put<br />

<strong>and</strong> h<strong>and</strong>l<strong>in</strong>g; R concepts such as classes <strong>and</strong> methods; nonparametric <strong>in</strong>tensity estimates;<br />

goodness-<strong>of</strong>-fit test<strong>in</strong>g for Complete Spatial R<strong>and</strong>omness; maximum likelihood <strong>in</strong>ference for<br />

Poisson processes; model validation for Poisson processes; distance methods <strong>and</strong> summary<br />

functions such as Ripley’s K function; non-Poisson <strong>po<strong>in</strong>t</strong> process models; simulation techniques;<br />

fitt<strong>in</strong>g models us<strong>in</strong>g summary statistics; Gibbs <strong>po<strong>in</strong>t</strong> process models; fitt<strong>in</strong>g Gibbs<br />

models; simulat<strong>in</strong>g Gibbs models; validat<strong>in</strong>g Gibbs models; multitype <strong>and</strong> marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>;<br />

exploratory analysis <strong>of</strong> marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>; multitype Poisson process models <strong>and</strong><br />

maximum likelihood <strong>in</strong>ference; multitype Gibbs process models <strong>and</strong> maximum pseudolikelihood;<br />

<strong>and</strong> l<strong>in</strong>e segment data.<br />

This version <strong>of</strong> the notes requires R version 2.7.0 or later, <strong>and</strong> spatstat version 1.14-5<br />

or later.<br />

Acknowledgements<br />

The author gratefully acknowledges countless comments <strong>and</strong> suggestions from workshop participants,<br />

<strong>and</strong> the support <strong>of</strong> CSIRO Mathematical <strong>and</strong> Information Sciences, The New<br />

Zeal<strong>and</strong> <strong>Statistical</strong> Association, The University <strong>of</strong> Waikato, The <strong>Statistical</strong> Society<br />

<strong>of</strong> Australia <strong>and</strong> The University <strong>of</strong> Western Australia.


2<br />

Copyright c○CSIRO Australia 2008<br />

All rights are reserved. Permission to reproduce <strong>in</strong>dividual copies <strong>of</strong> this document for<br />

personal use is granted. Redistribution <strong>in</strong> any other form is prohibited.<br />

The <strong>in</strong>formation conta<strong>in</strong>ed <strong>in</strong> this document is based on a number <strong>of</strong> technical, circumstantial<br />

or otherwise specified assumptions <strong>and</strong> parameters. The user must make its own analysis <strong>and</strong><br />

assessment <strong>of</strong> the suitability <strong>of</strong> the <strong>in</strong>formation or material conta<strong>in</strong>ed <strong>in</strong> or generated from this<br />

document. To the extent permitted by law, CSIRO excludes all liability to any party for any<br />

expenses, losses, damages <strong>and</strong> costs aris<strong>in</strong>g directly or <strong>in</strong>directly from us<strong>in</strong>g this document.<br />

Copyright c○CSIRO 2008


CONTENTS 3<br />

Contents<br />

PART I. OVERVIEW 5<br />

1 Introduction 6<br />

2 <strong>Statistical</strong> formulation 13<br />

3 The R system 17<br />

4 Introduction to spatstat 19<br />

PART II. DATA TYPES 26<br />

5 Objects, classes <strong>and</strong> methods <strong>in</strong> R 27<br />

6 Po<strong>in</strong>t <strong>patterns</strong> <strong>in</strong> spatstat 33<br />

7 W<strong>in</strong>dows <strong>in</strong> spatstat 39<br />

8 Manipulat<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> 46<br />

9 Pixel images <strong>in</strong> spatstat 54<br />

10 Tessellations 60<br />

PART III. INTENSITY AND RANDOMNESS 65<br />

11 Methods 1: Investigat<strong>in</strong>g <strong>in</strong>tensity 66<br />

12 Methods 2: Tests <strong>of</strong> Complete Spatial R<strong>and</strong>omness 72<br />

13 Methods 3: Maximum likelihood for Poisson processes 79<br />

14 Methods 4: check<strong>in</strong>g a fitted Poisson model 90<br />

PART IV. INTERACTION 97<br />

15 Simple models <strong>of</strong> non-Poisson <strong>patterns</strong> 98<br />

16 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> 102<br />

17 Methods 6: simulation envelopes <strong>and</strong> goodness-<strong>of</strong>-fit tests 119<br />

18 Methods 7: model-fitt<strong>in</strong>g us<strong>in</strong>g summary statistics 125<br />

19 Methods 8: adjust<strong>in</strong>g for <strong>in</strong>homogeneity 128<br />

20 Gibbs models 132<br />

21 Methods 9: fitt<strong>in</strong>g Gibbs models 139<br />

22 Methods 10: validation <strong>of</strong> fitted Gibbs models 148<br />

Copyright c○CSIRO 2008


4 CONTENTS<br />

PART IV. MARKED POINT PATTERNS 155<br />

23 Marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> 156<br />

24 H<strong>and</strong>l<strong>in</strong>g marked <strong>po<strong>in</strong>t</strong> pattern data 160<br />

25 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> 165<br />

26 Methods 12: multitype Poisson models 179<br />

27 Methods 13: Gibbs models for multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> 185<br />

28 L<strong>in</strong>e segment data 190<br />

29 Further <strong>in</strong>formation on spatstat 192<br />

Bibliography 193<br />

Index 195<br />

Copyright c○CSIRO 2008


CONTENTS 5<br />

PART I. OVERVIEW<br />

The first part <strong>of</strong> the workshop is a quick overview <strong>of</strong> <strong>spatial</strong> statistics for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, <strong>and</strong> a<br />

very quick <strong>in</strong>troduction to the s<strong>of</strong>tware.<br />

Copyright c○CSIRO 2008


6 Introduction<br />

1 Introduction<br />

1.1 Types <strong>of</strong> data<br />

1.1.1 Po<strong>in</strong>ts<br />

A <strong>po<strong>in</strong>t</strong> pattern dataset gives the locations <strong>of</strong> objects/events occurr<strong>in</strong>g <strong>in</strong> a study region.<br />

The <strong>po<strong>in</strong>t</strong>s could represent trees, animal nests, earthquake epicentres, petty crimes, domiciles<br />

<strong>of</strong> new cases <strong>of</strong> <strong>in</strong>fluenza, galaxies, etc.<br />

The <strong>po<strong>in</strong>t</strong>s might be situated <strong>in</strong> a region <strong>of</strong> the two-dimensional (2D) plane, or on the Earth’s<br />

surface, or a 3D volume, etc. They could be <strong>po<strong>in</strong>t</strong>s <strong>in</strong> space-time (e.g. earthquake epicentre<br />

location <strong>and</strong> time). The s<strong>of</strong>tware presented here is only applicable to 2D <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> (but<br />

we’re work<strong>in</strong>g on it).<br />

1.1.2 Marks<br />

The <strong>po<strong>in</strong>t</strong>s may have extra <strong>in</strong>formation called marks attached to them. The mark represents an<br />

“attribute” <strong>of</strong> the <strong>po<strong>in</strong>t</strong>. The mark variable could be categorical, e.g. species or disease status:<br />

<strong>of</strong>f<br />

on<br />

The mark variable could be cont<strong>in</strong>uous, e.g. tree diameter:<br />

Copyright c○CSIRO 2008


1.1 Types <strong>of</strong> data 7<br />

The mark could be multivariate, or even more complicated.<br />

1.1.3 Covariates<br />

Our dataset may also <strong>in</strong>clude covariates — any data that we treat as explanatory, rather than<br />

as part <strong>of</strong> the ‘response’.<br />

Covariate data may be a <strong>spatial</strong> function Z(u) def<strong>in</strong>ed at all <strong>spatial</strong> locations u, e.g. altitude,<br />

soil pH, displayed as a pixel image or a contour plot:<br />

120 130 140 150 160<br />

Covariate data may be another <strong>spatial</strong> pattern such as another <strong>po<strong>in</strong>t</strong> pattern, or a l<strong>in</strong>e<br />

segment pattern, e.g. a map <strong>of</strong> geological faults:<br />

140<br />

130<br />

125<br />

145<br />

145<br />

135<br />

140<br />

150<br />

130<br />

135<br />

150<br />

155<br />

140<br />

Copyright c○CSIRO 2008<br />

135<br />

130<br />

125<br />

130<br />

130


8 Introduction<br />

1.2 Typical scientific questions<br />

1.2.1 Intensity<br />

‘Intensity’ is the average density <strong>of</strong> <strong>po<strong>in</strong>t</strong>s (expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s per unit area). Intensity<br />

may be constant (‘uniform’) or may vary from location to location (‘non-uniform’ or ‘<strong>in</strong>homogeneous’).<br />

1.2.2 Interaction<br />

uniform<br />

<strong>in</strong>homogeneous<br />

‘Inter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction’ is stochastic dependence between the <strong>po<strong>in</strong>t</strong>s <strong>in</strong> a <strong>po<strong>in</strong>t</strong> pattern. Usually<br />

we expect dependence to be strongest between <strong>po<strong>in</strong>t</strong>s that are close to one another.<br />

<strong>in</strong>dependent regular clustered<br />

Example 1 (Japanese p<strong>in</strong>es) Locations <strong>of</strong> 65 sapl<strong>in</strong>gs <strong>of</strong> Japanese p<strong>in</strong>e <strong>in</strong> a 5.7 × 5.7 metre<br />

square sampl<strong>in</strong>g region <strong>in</strong> a natural st<strong>and</strong>.<br />

Ma<strong>in</strong> question: is the spac<strong>in</strong>g between sapl<strong>in</strong>gs greater than would be expected for a r<strong>and</strong>om<br />

pattern? (reflect<strong>in</strong>g competition for resources)<br />

Copyright c○CSIRO 2008


1.2 Typical scientific questions 9<br />

1.2.3 Covariate effects<br />

Japanese P<strong>in</strong>es<br />

For a <strong>po<strong>in</strong>t</strong> pattern dataset with covariate data, we typically<br />

• <strong>in</strong>vestigate whether the <strong>in</strong>tensity depends on the covariates<br />

• allow for covariate effects on <strong>in</strong>tensity before study<strong>in</strong>g <strong>in</strong>teraction between <strong>po<strong>in</strong>t</strong>s<br />

Example 2 (Tropical ra<strong>in</strong>forest data) Locations <strong>of</strong> 3605 trees <strong>in</strong> a tropical ra<strong>in</strong>forest, with<br />

supplementary grid map <strong>of</strong> elevation (altitude).<br />

Ma<strong>in</strong> questions: (1) does tree density depend on slope? (2) after account<strong>in</strong>g for variation <strong>in</strong><br />

tree density due to slope, is there evidence <strong>of</strong> cluster<strong>in</strong>g <strong>of</strong> trees?<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ ++ +<br />

+ +<br />

++<br />

+ + + +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+ ++ +<br />

+ +<br />

+<br />

+ + + +<br />

++<br />

+<br />

+<br />

+ + +<br />

+ ++ + +<br />

+ + +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ + +<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ + + +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++ +<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+ + ++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+ +<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

++<br />

++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +++<br />

+ + +<br />

+<br />

+ +<br />

+++<br />

+<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

++ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++<br />

++ +<br />

++<br />

+<br />

+ + ++ +<br />

+<br />

+<br />

++<br />

+ ++ +<br />

+ + ++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

++ +++<br />

++++<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

++<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

++++<br />

+<br />

+<br />

+ +<br />

+++<br />

+<br />

+ ++<br />

++<br />

+<br />

+++<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+ + ++ ++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+ +<br />

++<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ + +<br />

+++ + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

++<br />

++<br />

+ ++ ++++<br />

++++ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+ +++ + +<br />

+ +<br />

+ +<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++ +<br />

+<br />

++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++ ++ +<br />

+<br />

++ +<br />

++ +<br />

+ +<br />

+<br />

+<br />

+++<br />

+++ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+ +<br />

++<br />

+ + + +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

++ +<br />

+++<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++ +<br />

++<br />

+<br />

++<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+ +<br />

+ +++<br />

+<br />

+<br />

+++<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+ +++<br />

+<br />

+ ++ + + ++<br />

+<br />

+<br />

+++<br />

++<br />

+ +<br />

+<br />

+ ++ +<br />

+ ++<br />

++ + +<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ + ++<br />

++++<br />

+<br />

+ + ++<br />

+<br />

++++ +<br />

+<br />

+ + +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

++ +<br />

++<br />

+<br />

+ + ++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

++<br />

++<br />

++<br />

+ +<br />

+<br />

++ + ++<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+<br />

++<br />

+++<br />

+ + + + +<br />

+<br />

++ + ++++<br />

+++<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

++<br />

++ +<br />

+<br />

+++<br />

+<br />

++ +<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+ ++<br />

+ + +<br />

+ ++<br />

+<br />

+++ +<br />

+++ ++ + +<br />

+ + +++ ++ +<br />

+<br />

+ +<br />

+<br />

++ +<br />

+<br />

+<br />

+ + + ++<br />

+<br />

++<br />

+<br />

++<br />

+ +++<br />

+ ++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

++<br />

+++<br />

+ ++<br />

++<br />

+<br />

+++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +++<br />

++<br />

+ +<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

++ +<br />

+<br />

++ +<br />

+<br />

+<br />

+ +++<br />

++<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

++<br />

+ +++<br />

++<br />

+<br />

++++<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

++<br />

+ +<br />

+<br />

++<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++ +<br />

+ ++<br />

+ +++<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+++<br />

+<br />

+ ++<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+ +<br />

+++<br />

+<br />

++<br />

++ ++<br />

++<br />

+<br />

+ + + + ++<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

++ +<br />

+<br />

++ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++<br />

++<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+ ++++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+ + ++++<br />

+++<br />

+ +<br />

+++ ++<br />

+ +<br />

+<br />

++<br />

+<br />

++++<br />

+<br />

+<br />

++ +<br />

+<br />

++<br />

++<br />

+ +<br />

+<br />

+ ++<br />

+<br />

+++<br />

++<br />

+++<br />

+<br />

+<br />

+ + + ++ +<br />

+ + ++<br />

+ + ++<br />

++<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+++<br />

+<br />

++++<br />

+ +<br />

+<br />

+ + ++<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

++<br />

+ ++ +<br />

++<br />

+<br />

+++ + +<br />

+<br />

+++<br />

+<br />

++++ + + + ++<br />

+ ++<br />

+ + ++ + +<br />

+<br />

++<br />

++ +<br />

+<br />

+<br />

+ ++<br />

+<br />

+ + + + +<br />

+ + + + ++<br />

+ +<br />

++<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+ +<br />

+<br />

++<br />

++ +<br />

+ + +<br />

++ +<br />

+ ++ +<br />

+ +<br />

+ +<br />

+ +++<br />

++ + +<br />

+ + + ++<br />

+<br />

++<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

++<br />

++ +<br />

+<br />

+ ++ +++<br />

++<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+ + ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++ +++<br />

++<br />

+ ++<br />

+ + +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+ + +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ ++<br />

+<br />

+ + ++ ++<br />

+<br />

+<br />

+ +<br />

++ + + +<br />

+ +<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+++<br />

+<br />

+ ++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++ +<br />

+++ + +++<br />

+ + +<br />

++ +<br />

+ +<br />

+ +<br />

++++<br />

+<br />

+ ++ ++<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+++<br />

+ + +<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++ +<br />

++<br />

+ +<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

++ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ ++ +++<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++++<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ ++ +<br />

+<br />

+ +<br />

+<br />

+ ++<br />

+<br />

++ +<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

++ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + + +<br />

++ +<br />

++<br />

++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+<br />

+ + + +<br />

+ + ++<br />

+<br />

++ ++<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ + + ++<br />

+<br />

+ ++<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ + ++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ + + + + +<br />

+<br />

++ +<br />

++<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

Example 3 (Queensl<strong>and</strong> copper data) A <strong>in</strong>tensive m<strong>in</strong>eralogical survey yields a map <strong>of</strong><br />

copper deposits (essentially <strong>po<strong>in</strong>t</strong>like at this scale) <strong>and</strong> geological faults (straight l<strong>in</strong>es). The<br />

faults can easily be observed from satellites, but the copper deposits are hard to f<strong>in</strong>d. The ma<strong>in</strong><br />

question is whether the faults are ‘predictive’ for copper deposits (e.g. copper less/more likely to<br />

be found near faults).<br />

120 130 140 150 160<br />

Copyright c○CSIRO 2008


10 Introduction<br />

Example 4 (Chorley-Ribble data) An apparent cluster <strong>of</strong> cases <strong>of</strong> cancer <strong>of</strong> the larynx occurred<br />

near a disused <strong>in</strong>dustrial <strong>in</strong>c<strong>in</strong>erator. The area health authority mapped the domicile<br />

locations <strong>of</strong> all cases (58) <strong>of</strong> cancer <strong>of</strong> the larynx <strong>and</strong>, for control purposes, a r<strong>and</strong>om sample <strong>of</strong><br />

cases (978) <strong>of</strong> lung cancer.<br />

Ma<strong>in</strong> question: after allow<strong>in</strong>g for <strong>spatial</strong> variation <strong>in</strong> density <strong>of</strong> the susceptible population<br />

(for which the lung cancer cases are a surrogate), is there evidence <strong>of</strong> raised <strong>in</strong>cidence <strong>of</strong> laryngeal<br />

cancer near the <strong>in</strong>c<strong>in</strong>erator?<br />

larynx<br />

lung<br />

<strong>in</strong>c<strong>in</strong>erator<br />

Chorley−Ribble Data<br />

1.2.4 Segregation <strong>of</strong> <strong>po<strong>in</strong>t</strong>s with different marks<br />

In a marked <strong>po<strong>in</strong>t</strong> pattern, we need to <strong>in</strong>vestigate whether <strong>po<strong>in</strong>t</strong>s with different mark values are<br />

‘segregated’ (found <strong>in</strong> different parts <strong>of</strong> the study region).<br />

Example 5 (Lans<strong>in</strong>g Woods) In a 20-acre study region <strong>in</strong> Lans<strong>in</strong>g Woods, Michigan, the<br />

locations <strong>of</strong> 2251 trees <strong>and</strong> the botanical classification <strong>of</strong> each tree were recorded.<br />

Ma<strong>in</strong> question: is the study region divided <strong>in</strong>to doma<strong>in</strong>s where a s<strong>in</strong>gle tree species dom<strong>in</strong>ates,<br />

or are the different species r<strong>and</strong>omly <strong>in</strong>terspersed?<br />

Copyright c○CSIRO 2008


1.2 Typical scientific questions 11<br />

blackoak<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

++<br />

+ +<br />

+<br />

++ +<br />

+ ++ +<br />

+<br />

++<br />

+<br />

+ + + + +<br />

+<br />

+ +<br />

+<br />

++ +<br />

++ + +<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ + ++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+ ++++++<br />

+++ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+ ++ + +<br />

+ ++<br />

+<br />

+<br />

+ + +<br />

+<br />

hickory<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+ + +<br />

++ +<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+ +<br />

+ + +<br />

+ +<br />

++ +<br />

++++<br />

+ +++ +<br />

+ ++<br />

+ +<br />

+<br />

+ + +<br />

+ ++<br />

+++ + ++<br />

++ +<br />

+ +<br />

+ ++<br />

++ ++ +<br />

++<br />

+ ++ +<br />

+ +<br />

++<br />

++<br />

+ +<br />

+ ++<br />

+ +++ + +++<br />

+<br />

+ + ++<br />

++<br />

++ ++ + ++<br />

+ +<br />

+ + +++<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ + +<br />

+ +<br />

+<br />

+ +<br />

+ ++<br />

+ +++ +<br />

+<br />

++<br />

+ +<br />

++<br />

+ ++<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+ + + +++<br />

+ +<br />

+<br />

++ +<br />

+ + +<br />

+ ++ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+ + + ++<br />

++<br />

+ ++ +<br />

++<br />

+<br />

+ + + +<br />

+<br />

+ +++ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

++ +<br />

+<br />

+ +<br />

+<br />

+ + +<br />

++<br />

+ +<br />

+<br />

+ + +<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++ +<br />

++ ++<br />

+ +<br />

++ + +<br />

+ ++<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+ + ++++ + +<br />

++<br />

+<br />

++ +<br />

++<br />

+<br />

+<br />

+++<br />

++<br />

+<br />

+<br />

+ ++<br />

+ ++<br />

+<br />

++ +<br />

+ +<br />

+ + + ++<br />

++ +<br />

+<br />

+ +<br />

+ +<br />

+++<br />

+<br />

+<br />

++ + ++ +<br />

+<br />

+<br />

+<br />

+ + + +<br />

+<br />

+ +<br />

+<br />

++ +<br />

+ +<br />

++<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+ +<br />

+ +++<br />

+ +<br />

+<br />

++<br />

++<br />

+<br />

++ +<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+ + +<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++ + +++ +++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+<br />

++ + +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

++ + ++ +<br />

+<br />

+ ++ +<br />

+<br />

++ +<br />

+<br />

+ +<br />

+ + + +<br />

+ ++ +<br />

++ +++ + +<br />

++ ++<br />

++ + + + + + ++ +<br />

+<br />

+<br />

+<br />

++ + + +<br />

+ +<br />

maple<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+ +<br />

+ +<br />

+ ++ + +<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + +<br />

++ + ++ ++<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ + + +<br />

+<br />

+ +<br />

+ +<br />

+<br />

++ + +<br />

+ ++ + +<br />

+<br />

+<br />

+ +<br />

+ + ++++ +<br />

+ +<br />

++<br />

+ +<br />

+ + +<br />

+<br />

+<br />

+ +<br />

++ + + +<br />

++<br />

+ ++ +<br />

+<br />

++<br />

+<br />

++ + + ++ ++ + + + ++<br />

+ +<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+ ++<br />

+ +<br />

+ ++<br />

+ ++ +<br />

+ + +<br />

+ + ++<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+ + +<br />

+ ++<br />

++<br />

+ +<br />

++<br />

+<br />

+ +<br />

+ + +<br />

++ +<br />

++<br />

++ +++ + + + + + + + ++++<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ + +<br />

+<br />

++<br />

+<br />

+ +<br />

++<br />

+ ++<br />

+<br />

++ + ++<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

++++<br />

+<br />

++<br />

++<br />

+ ++ + + ++<br />

+<br />

+++ + ++ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

++++<br />

+ ++<br />

+<br />

+ ++ +<br />

+++<br />

+ ++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ + + + +<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+++ +<br />

++ +++ +<br />

+++<br />

+ +<br />

+ +<br />

+ + +<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+ +++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

Example 6 (Longleaf P<strong>in</strong>es) In a forest <strong>of</strong> Longleaf P<strong>in</strong>e trees <strong>in</strong> Georgia, USA, the locations<br />

<strong>of</strong> 584 trees were recorded along with their diameter at breast height (dbh), a convenient surrogate<br />

measure <strong>of</strong> size <strong>and</strong> age.<br />

Ma<strong>in</strong> question: expla<strong>in</strong> any <strong>spatial</strong> variation <strong>in</strong> the density <strong>and</strong> age <strong>of</strong> trees.<br />

Longleaf P<strong>in</strong>es<br />

1.2.5 Dependence between <strong>po<strong>in</strong>t</strong>s <strong>of</strong> different types<br />

In a <strong>po<strong>in</strong>t</strong> pattern dataset with categorical marks, (aka multitype <strong>po<strong>in</strong>t</strong> pattern), dependence<br />

between the different types may be formulated either as<br />

• <strong>in</strong>teraction between the sub-pattern <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> type i <strong>and</strong> the sub-pattern <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong><br />

type j; or<br />

• dependence between the mark values <strong>of</strong> <strong>po<strong>in</strong>t</strong>s at two specified locations.<br />

Example 7 (Amacr<strong>in</strong>e cells) The ret<strong>in</strong>a is a flat sheet conta<strong>in</strong><strong>in</strong>g several layers <strong>of</strong> cells.<br />

Amacr<strong>in</strong>e cells occupy two adjacent layers, the ‘on’ <strong>and</strong> ‘<strong>of</strong>f’ layers. In a microscope field <strong>of</strong><br />

view, the locations <strong>of</strong> all amacr<strong>in</strong>e cells were mapped, <strong>and</strong> classified <strong>in</strong>to ‘on’ <strong>and</strong> ‘<strong>of</strong>f’.<br />

Ma<strong>in</strong> question: is there evidence that the ‘on’ <strong>and</strong> ‘<strong>of</strong>f’ layers grew <strong>in</strong>dependently <strong>of</strong> one<br />

another?<br />

Copyright c○CSIRO 2008


12 Introduction<br />

<strong>of</strong>f<br />

on<br />

amacr<strong>in</strong>e<br />

Example 8 (Ants’ nests) The nests <strong>of</strong> two species <strong>of</strong> ants <strong>in</strong> a plot <strong>in</strong> Greece were mapped.<br />

Auxiliary <strong>in</strong>formation records a field/scrub boundary, <strong>and</strong> the position <strong>of</strong> a walk<strong>in</strong>g track.<br />

Ma<strong>in</strong> question: does species A <strong>in</strong>tentionally place its nests close to species B?<br />

1.3 Overview <strong>of</strong> statistical methods<br />

A<br />

B<br />

ants<br />

scrub<br />

<strong>Statistical</strong> methods for <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> have a quirky history, <strong>and</strong> have not yet coalesced<br />

<strong>in</strong>to a mature statistical methodology. They <strong>in</strong>clude<br />

• summary statistics: the applied literature is dom<strong>in</strong>ated by ad hoc methods based on<br />

evaluat<strong>in</strong>g a summary statistic (e.g. average distance from a <strong>po<strong>in</strong>t</strong> to its nearest neighbour)<br />

with very little statistical theory to support them.<br />

• comparison to Poisson process: <strong>in</strong> the applied literature, hypothesis tests are <strong>in</strong>voked<br />

chiefly to decide whether the <strong>po<strong>in</strong>t</strong> pattern is ‘completely r<strong>and</strong>om’ (a uniform Poisson <strong>po<strong>in</strong>t</strong><br />

process) whether or not this is scientifically relevant. Lots <strong>of</strong> misunderst<strong>and</strong><strong>in</strong>gs prevail.<br />

• modell<strong>in</strong>g: only <strong>in</strong> the last decade has it f<strong>in</strong>ally become possible to formulate <strong>and</strong> fit<br />

realistic models to <strong>po<strong>in</strong>t</strong> pattern data. There’s still a lot <strong>of</strong> work to be done e.g. <strong>in</strong><br />

algorithms, model choice, goodness-<strong>of</strong>-fit.<br />

We’ll cover both classical <strong>and</strong> modern methods. Useful textbooks <strong>in</strong>clude [17, 19, 23, 31, 46,<br />

37]. An important recent survey is [38].<br />

Copyright c○CSIRO 2008<br />

field


2 <strong>Statistical</strong> formulation<br />

2.1 Po<strong>in</strong>t processes<br />

In this workshop, the observed <strong>po<strong>in</strong>t</strong> pattern x will be treated as a realisation <strong>of</strong> a r<strong>and</strong>om<br />

<strong>po<strong>in</strong>t</strong> process X <strong>in</strong> two-dimensional space. A <strong>po<strong>in</strong>t</strong> process is simply a r<strong>and</strong>om set <strong>of</strong> <strong>po<strong>in</strong>t</strong>s;<br />

the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s is r<strong>and</strong>om, as well as the locations <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s. Our goal is usually to<br />

estimate parameters <strong>of</strong> the distribution <strong>of</strong> X.<br />

2.2 Should I treat the data as a <strong>po<strong>in</strong>t</strong> process?<br />

Treat<strong>in</strong>g the <strong>po<strong>in</strong>t</strong> pattern as a <strong>po<strong>in</strong>t</strong> process effectively assumes that the pattern is r<strong>and</strong>om<br />

(the locations <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s, <strong>and</strong> the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s, are r<strong>and</strong>om) <strong>and</strong> that the pattern is<br />

the observation or ‘response’ <strong>of</strong> <strong>in</strong>terest. A realisation <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process is an unordered set <strong>of</strong><br />

<strong>po<strong>in</strong>t</strong>s, so the <strong>po<strong>in</strong>t</strong>s do not have a serial order (unless there are marks attached).<br />

Example 9 A silicon wafer is <strong>in</strong>spected for defects <strong>in</strong> the crystal surface, <strong>and</strong> the locations <strong>of</strong><br />

all defects are recorded.<br />

This can be analysed as a <strong>po<strong>in</strong>t</strong> process <strong>in</strong> two dimensions, assum<strong>in</strong>g the defects are <strong>po<strong>in</strong>t</strong>like.<br />

We’re <strong>in</strong>terested <strong>in</strong> the <strong>in</strong>tensity <strong>of</strong> defects, spac<strong>in</strong>g between defects, etc.<br />

Example 10 Earthquake aftershocks <strong>in</strong> Japan are detected <strong>and</strong> their latitude, longitude <strong>and</strong><br />

time <strong>of</strong> occurrence are recorded.<br />

This can be analysed as a <strong>po<strong>in</strong>t</strong> process <strong>in</strong> space-time (where space is the two-dimensional<br />

plane or the Earth’s surface). If the occurrence times are ignored, it becomes a <strong>spatial</strong> <strong>po<strong>in</strong>t</strong><br />

process.<br />

Example 11 The locations <strong>of</strong> petty crimes that occurred <strong>in</strong> the past week are plotted on a street<br />

map <strong>of</strong> Chicago.<br />

This can be analysed as a <strong>po<strong>in</strong>t</strong> process. We’re <strong>in</strong>terested <strong>in</strong> the <strong>in</strong>tensity (propensity for<br />

crimes to occur), any <strong>spatial</strong> variation <strong>in</strong> <strong>in</strong>tensity, clusters <strong>of</strong> crimes, etc. One issue here is<br />

whether the recorded crime locations can be anywhere <strong>in</strong> two dimensional space, or whether<br />

they are actually restricted to locations on the streets (mak<strong>in</strong>g them a <strong>po<strong>in</strong>t</strong> process on a 1dimensional<br />

network).<br />

Example 12 A tiger shark is captured, tagged with a satellite transmitter, <strong>and</strong> released. Over<br />

the next month its location is reported daily. These <strong>po<strong>in</strong>t</strong>s are plotted on a map.<br />

It is probably not appropriate to analyse these data as a <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> process. At the very<br />

least, the time <strong>of</strong> each observation should be <strong>in</strong>cluded. They could be treated as a space-time<br />

<strong>po<strong>in</strong>t</strong> process, except that it’s a strange process, as it consists <strong>of</strong> exactly one <strong>po<strong>in</strong>t</strong> at each <strong>in</strong>stant<br />

<strong>of</strong> time. These data should really be treated as a sparse sample <strong>of</strong> a cont<strong>in</strong>uous trajectory, <strong>and</strong><br />

analysed us<strong>in</strong>g other methods [which, alas, are fairly underdeveloped.] See the R package trip.<br />

Example 13 A herd <strong>of</strong> deer is photographed from the air at noon each day for 10 days. Each<br />

photograph is processed to produce a <strong>po<strong>in</strong>t</strong> pattern <strong>of</strong> <strong>in</strong>dividual deer locations on a map.<br />

13<br />

Copyright c○CSIRO 2008


14 <strong>Statistical</strong> formulation<br />

Each day produces a <strong>po<strong>in</strong>t</strong> pattern that could be analysed as a realisation <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process.<br />

However, the observations on successive days are dependent (e.g. constant herd size, systematic<br />

forag<strong>in</strong>g behaviour). Assum<strong>in</strong>g <strong>in</strong>dividual deer cannot be identified from day to day, this is<br />

effectively a ‘repeated measures’ dataset where each response is a <strong>po<strong>in</strong>t</strong> pattern. Methods for<br />

this problem are <strong>in</strong> their <strong>in</strong>fancy.<br />

Example 14 In a designed controlled experiment, silicon wafers are produced under various<br />

conditions. Each wafer is <strong>in</strong>spected for defects <strong>in</strong> the crystal surface, <strong>and</strong> the locations <strong>of</strong> all<br />

defects are recorded as a <strong>po<strong>in</strong>t</strong> pattern.<br />

This is a designed experiment <strong>in</strong> which the response is a <strong>po<strong>in</strong>t</strong> pattern. Methods for this<br />

problem are <strong>in</strong> their <strong>in</strong>fancy. There are some methods for replicated <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

[9, 13, 24, 25, 29] that apply when each experimental group conta<strong>in</strong>s several <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>.<br />

Example 15 The <strong>po<strong>in</strong>t</strong>s are not the orig<strong>in</strong>al data, but were obta<strong>in</strong>ed after process<strong>in</strong>g the data.<br />

For example,<br />

• the orig<strong>in</strong>al dataset is a pattern <strong>of</strong> small blobs, <strong>and</strong> the <strong>po<strong>in</strong>t</strong>s are the blob centres;<br />

• the orig<strong>in</strong>al dataset is a collection <strong>of</strong> l<strong>in</strong>e segments, <strong>and</strong> the <strong>po<strong>in</strong>t</strong>s are the end<strong>po<strong>in</strong>t</strong>s,<br />

cross<strong>in</strong>g <strong>po<strong>in</strong>t</strong>s, mid<strong>po<strong>in</strong>t</strong>s etc;<br />

• the orig<strong>in</strong>al dataset is a space-fill<strong>in</strong>g tessellation <strong>of</strong> biological cells, <strong>and</strong> the <strong>po<strong>in</strong>t</strong>s are the<br />

centres <strong>of</strong> the cells.<br />

This is a grey area. Po<strong>in</strong>t process methodology can be applied, <strong>and</strong> may be more powerful<br />

or more flexible than exist<strong>in</strong>g methodology for the unprocessed data. However the orig<strong>in</strong> <strong>of</strong> the<br />

<strong>po<strong>in</strong>t</strong> pattern may lead to artefacts (for example the centres <strong>of</strong> biological cells never lie very close<br />

together, because cells have nonzero size) which must be taken <strong>in</strong>to account <strong>in</strong> the analysis.<br />

2.3 Assumptions about the data<br />

The “st<strong>and</strong>ard model” assumes that the <strong>po<strong>in</strong>t</strong> process X extends throughout 2-D space, but<br />

is observed only <strong>in</strong>side a region W, the “sampl<strong>in</strong>g w<strong>in</strong>dow”. Our data consist <strong>of</strong> an unordered<br />

set<br />

x = {x1,... ,xn}, xi ∈ W, n ≥ 0<br />

<strong>of</strong> <strong>po<strong>in</strong>t</strong>s xi <strong>in</strong> W. The w<strong>in</strong>dow W is fixed <strong>and</strong> known. Usually our goal is <strong>in</strong>ference about<br />

parameters <strong>of</strong> X.<br />

Copyright c○CSIRO 2008


2.4 Marks <strong>and</strong> covariates 15<br />

Data are <strong>of</strong>ten supplied without <strong>in</strong>formation about the sampl<strong>in</strong>g w<strong>in</strong>dow W. It is important<br />

to know the w<strong>in</strong>dow W, s<strong>in</strong>ce we need to know where <strong>po<strong>in</strong>t</strong>s were not observed. Even<br />

someth<strong>in</strong>g as simple as estimat<strong>in</strong>g the density <strong>of</strong> <strong>po<strong>in</strong>t</strong>s depends on the w<strong>in</strong>dow. It would be<br />

wrong, or at least different, to analyze a <strong>po<strong>in</strong>t</strong> pattern dataset by “guess<strong>in</strong>g” the appropriate<br />

w<strong>in</strong>dow. An analogy may be drawn with the difference between sequential experiments <strong>and</strong><br />

experiments <strong>in</strong> which the sample size is fixed a priori.<br />

For the same reason, it is not sufficient to observe the values <strong>of</strong> covariates at the data <strong>po<strong>in</strong>t</strong>s<br />

only. In order to <strong>in</strong>vestigate the dependence <strong>of</strong> the <strong>po<strong>in</strong>t</strong> process on the covariate, we need to<br />

have at least some observations <strong>of</strong> the covariate at other (“non-data”) locations.<br />

It’s implicitly assumed that all <strong>po<strong>in</strong>t</strong>s <strong>of</strong> X with<strong>in</strong> W have been mapped without omission.<br />

Most models we use will assume that r<strong>and</strong>om <strong>po<strong>in</strong>t</strong>s could have been observed at any location<br />

<strong>in</strong> the w<strong>in</strong>dow W, without further constra<strong>in</strong>t. (Examples where this does not apply: GPS<br />

locations <strong>of</strong> cars will usually lie along roads; certa<strong>in</strong> cells lie only <strong>in</strong>side certa<strong>in</strong> tissues).<br />

When th<strong>in</strong>k<strong>in</strong>g about methodological issues it’s <strong>of</strong>ten useful to th<strong>in</strong>k about the discretised<br />

version <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process. Suppose the w<strong>in</strong>dow W is chopped <strong>in</strong>to <strong>in</strong>f<strong>in</strong>itely many ‘pixels’.<br />

Each pixel is assigned the value I = 1 if it conta<strong>in</strong>s a <strong>po<strong>in</strong>t</strong> <strong>of</strong> X, <strong>and</strong> I = 0 otherwise. This<br />

array <strong>of</strong> 0’s <strong>and</strong> 1’s constitutes the data that must be modelled. [e.g. obviously we can’t model<br />

the dependence <strong>of</strong> these <strong>in</strong>dicators on a covariate if we only observe the covariate value at the<br />

locations where I = 1.]<br />

2.4 Marks <strong>and</strong> covariates<br />

The ma<strong>in</strong> differences between marks <strong>and</strong> covariates are that<br />

• marks are associated with data <strong>po<strong>in</strong>t</strong>s;<br />

1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0<br />

0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1<br />

0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0<br />

0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1<br />

0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0<br />

1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0<br />

0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0<br />

0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1<br />

0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0<br />

0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0<br />

0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0<br />

1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0<br />

0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1<br />

0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0<br />

0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0<br />

0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0<br />

0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1<br />

0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0<br />

1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1<br />

0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0<br />

• marks are part <strong>of</strong> the ‘response’ (the <strong>po<strong>in</strong>t</strong> pattern) while covariates are ‘explanatory’.<br />

2.4.1 Marks<br />

A mark variable may be <strong>in</strong>terpreted as an additional coord<strong>in</strong>ate for the <strong>po<strong>in</strong>t</strong>: for example<br />

a <strong>po<strong>in</strong>t</strong> process <strong>of</strong> earthquake epicentre locations (longitude, latitude), with marks giv<strong>in</strong>g the<br />

occurrence time <strong>of</strong> each earthquake, can alternatively be viewed as a <strong>po<strong>in</strong>t</strong> process <strong>in</strong> space-time<br />

with coord<strong>in</strong>ates (longitude, latitude, time).<br />

A marked <strong>po<strong>in</strong>t</strong> process <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> space S with marks belong<strong>in</strong>g to a set M is mathematically<br />

def<strong>in</strong>ed as a <strong>po<strong>in</strong>t</strong> process <strong>in</strong> the cartesian product S × M. The space M <strong>of</strong> possible marks<br />

may be ‘anyth<strong>in</strong>g’. In current applications, typically the mark is either a categorical variable<br />

(so that the <strong>po<strong>in</strong>t</strong>s are grouped <strong>in</strong>to ‘types’) or a real number. Multivariate marks consist<strong>in</strong>g <strong>of</strong><br />

several such variables are also common.<br />

Copyright c○CSIRO 2008


16 <strong>Statistical</strong> formulation<br />

A marked <strong>po<strong>in</strong>t</strong> pattern is an unordered set<br />

y = {(x1,m1),... ,(xn,mn)}, xi ∈ W, mi ∈ M<br />

where xi are the locations <strong>and</strong> mi are the correspond<strong>in</strong>g marks.<br />

Marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> are discussed <strong>in</strong> detail <strong>in</strong> section 23.<br />

2.4.2 Covariates<br />

Any k<strong>in</strong>d <strong>of</strong> data may be recruited as an explanatory variable (covariate).<br />

A ‘<strong>spatial</strong> function’, ‘<strong>spatial</strong> covariate’ or ‘geostatistical covariate’ is a function Z(u) observable<br />

(potentially) at every <strong>spatial</strong> location u ∈ W. Values <strong>of</strong> Z(u) may be available for a f<strong>in</strong>e<br />

grid <strong>of</strong> locations u:<br />

The values <strong>of</strong> a <strong>spatial</strong> function Z(u) may only be observable at some scattered sampl<strong>in</strong>g<br />

locations u. An example is the measurement <strong>of</strong> soil pH at a few sampl<strong>in</strong>g locations. In this case,<br />

the value <strong>of</strong> the covariate Z must be observed for all <strong>po<strong>in</strong>t</strong>s xi <strong>of</strong> the <strong>po<strong>in</strong>t</strong> pattern x, <strong>and</strong> must<br />

also be observed at some other ‘non-data’ or ‘background’ locations u ∈ W with u ∈ x.<br />

Alternatively, the covariate <strong>in</strong>formation may consist <strong>of</strong> another <strong>spatial</strong> pattern, such as a<br />

<strong>po<strong>in</strong>t</strong> pattern or a l<strong>in</strong>e segment pattern. The way <strong>in</strong> which this covariate <strong>in</strong>formation enters<br />

the analysis or statistical model depends very much on the context <strong>and</strong> the choice <strong>of</strong> model.<br />

Typically the covariate pattern would be used to def<strong>in</strong>e a surrogate <strong>spatial</strong> function Z, for<br />

example, Z(u) may be the distance from u to the nearest l<strong>in</strong>e segment.<br />

Copyright c○CSIRO 2008<br />

120 130 140 150 160


3 The R system<br />

We will be us<strong>in</strong>g the statistical package R.<br />

3.1 How to obta<strong>in</strong> R<br />

R is free s<strong>of</strong>tware with an open-source licence. You can download it from r-project.org <strong>and</strong><br />

it should be easy to <strong>in</strong>stall on any computer (see the <strong>in</strong>structions at the website).<br />

Books <strong>and</strong> onl<strong>in</strong>e tutorials are available to help you learn to use R.<br />

3.2 How comm<strong>and</strong>s are pr<strong>in</strong>ted <strong>in</strong> the notes<br />

You can run an R session us<strong>in</strong>g either a <strong>po<strong>in</strong>t</strong>-<strong>and</strong>-click <strong>in</strong>terface or a l<strong>in</strong>e-by-l<strong>in</strong>e comm<strong>and</strong><br />

<strong>in</strong>terpreter. In these notes, R comm<strong>and</strong>s are pr<strong>in</strong>ted as they would appear when typed at the<br />

comm<strong>and</strong> l<strong>in</strong>e. So a typical series <strong>of</strong> R comm<strong>and</strong>s looks like this:<br />

> pi/2<br />

> s<strong>in</strong>(pi/2)<br />

> x x<br />

Note that you are not meant to type the > symbol; this is just the prompt for comm<strong>and</strong> <strong>in</strong>put<br />

<strong>in</strong> R. To type the first comm<strong>and</strong>, just type pi/2.<br />

In these notes we will sometimes also pr<strong>in</strong>t the response that R gives to a set <strong>of</strong> comm<strong>and</strong>s.<br />

In the example above, it would look like this:<br />

> pi/2<br />

[1] 1.570796<br />

> s<strong>in</strong>(pi/2)<br />

[1] 1<br />

> x x<br />

[1] 1.414214<br />

If the <strong>in</strong>put is too long, R will break it <strong>in</strong>to several l<strong>in</strong>es, <strong>and</strong> pr<strong>in</strong>t the character + to <strong>in</strong>dicate<br />

that the <strong>in</strong>put cont<strong>in</strong>ues from the previous l<strong>in</strong>e. (You don’t type the +). Also if you type an<br />

expression <strong>in</strong>volv<strong>in</strong>g brackets <strong>and</strong> hit Return before all the open brackets have been closed, then<br />

R will pr<strong>in</strong>t a + <strong>in</strong>dicat<strong>in</strong>g that it expects you to f<strong>in</strong>ish the expression.<br />

> folderol s<strong>in</strong>(folderol * folderol * folderol * folderol * folderol * folderol *<br />

+ folderol * folderol * folderol * folderol)<br />

[1] -0.09132148<br />

17<br />

Copyright c○CSIRO 2008


18 The R system<br />

3.3 Contributed libraries for R<br />

In addition to the basic R system, the R website also <strong>of</strong>fers many add-on modules (‘libraries’ or<br />

‘packages’) contributed by users. These can be downloaded from cran.r-project.org (under<br />

‘Contributed Packages’).<br />

Packages that may be useful for analys<strong>in</strong>g <strong>spatial</strong> data <strong>in</strong>clude:<br />

ads <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> pattern analysis<br />

DCluster detect<strong>in</strong>g clusters <strong>in</strong> <strong>spatial</strong> count data<br />

fields curve <strong>and</strong> function fitt<strong>in</strong>g<br />

geoR model-based geostatistical methods<br />

geoRglm model-based geostatistical methods<br />

GeoXB <strong>in</strong>teractive <strong>spatial</strong> exploratory data analysis<br />

grasp <strong>spatial</strong> prediction<br />

maptools geographical <strong>in</strong>formation systems<br />

rgdal <strong>in</strong>terface to GDAL geographical data analysis<br />

sp base library for some <strong>spatial</strong> data analysis packages<br />

spatclus detect<strong>in</strong>g clusters <strong>in</strong> <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> pattern data<br />

<strong>spatial</strong>Covariance <strong>spatial</strong> covariance for data on grids<br />

<strong>spatial</strong>kernel <strong>in</strong>terpolation <strong>and</strong> segregation <strong>of</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

spatstat Spatial <strong>po<strong>in</strong>t</strong> pattern analysis <strong>and</strong> modell<strong>in</strong>g<br />

spBayes Gaussian <strong>spatial</strong> process MCMC (grid data)<br />

spdep <strong>spatial</strong> statistics for variables observed at fixed sites<br />

spgwr geographically weighted regression<br />

splancs <strong>spatial</strong> <strong>and</strong> space-time <strong>po<strong>in</strong>t</strong> pattern analysis<br />

spsurvey <strong>spatial</strong> survey methods<br />

trip analysis <strong>of</strong> <strong>spatial</strong> trip data<br />

To make use <strong>of</strong> a package, you need to:<br />

1. download the package code (once only) without unpack<strong>in</strong>g;<br />

2. ‘<strong>in</strong>stall’ the package code on your system (once only);<br />

3. ‘load’ the package <strong>in</strong>to your current R session us<strong>in</strong>g the comm<strong>and</strong> library (each time you<br />

start a new R session).<br />

The <strong>in</strong>stallation step is performed automatically us<strong>in</strong>g R, not by manually unpack<strong>in</strong>g the code.<br />

Installation is usually a very easy process.<br />

Instructions on how to <strong>in</strong>stall a package are given atcran.r-project.org. If you are runn<strong>in</strong>g<br />

W<strong>in</strong>dows, first start an R session. Then try the pull-down menu item Packages — Install<br />

packages. If this menu item is available, then you will be able to download <strong>and</strong> <strong>in</strong>stall any<br />

desired packages by simply select<strong>in</strong>g the package name from the pulldown list. If this menu item<br />

is not available (for <strong>in</strong>ternet security reasons), you can manually download packages by go<strong>in</strong>g<br />

to the CRAN website under Contributed packages -- W<strong>in</strong>dows b<strong>in</strong>aries <strong>and</strong> download<strong>in</strong>g<br />

the desired zip files <strong>of</strong> W<strong>in</strong>dows b<strong>in</strong>ary files. To perform step 2, start an R session <strong>and</strong> use the<br />

menu item Packages — Install from local zip files to <strong>in</strong>stall.<br />

If you are runn<strong>in</strong>g L<strong>in</strong>ux, step 1 is performed manually by go<strong>in</strong>g to the CRAN website under<br />

Contributed Packages <strong>and</strong> download<strong>in</strong>g the tar filepackagename.tar.gz. Step 2 is performed<br />

by issu<strong>in</strong>g the comm<strong>and</strong> R CMD INSTALL packagename.tar.gz.<br />

Copyright c○CSIRO 2008


4 Introduction to spatstat<br />

4.1 The spatstat package<br />

Spatstat is a contributed R package for analys<strong>in</strong>g <strong>spatial</strong> data, written by Adrian Baddeley <strong>and</strong><br />

Rolf Turner. Current versions <strong>of</strong> spatstat deal ma<strong>in</strong>ly with <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> <strong>in</strong> two<br />

dimensions. The package supports<br />

• creation, manipulation <strong>and</strong> plott<strong>in</strong>g <strong>of</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

• exploratory data analysis<br />

• simulation <strong>of</strong> <strong>po<strong>in</strong>t</strong> process models<br />

• parametric model-fitt<strong>in</strong>g<br />

• hypothesis tests, residual plots, diagnostics<br />

Spatstat is one <strong>of</strong> the largest contributed packages available for R, with over 300 user-level<br />

functions <strong>and</strong> a 500-page manual. It has its own web doma<strong>in</strong>, www.spatstat.org, <strong>of</strong>fer<strong>in</strong>g<br />

<strong>in</strong>formation about the package.<br />

Spatstat can be downloaded from cran.r-project.org (under ‘Contributed packages —<br />

spatstat’). To <strong>in</strong>stall spatstat you will also need to download the packages mgcv <strong>and</strong> sm.<br />

4.2 Please acknowledge spatstat<br />

If you use spatstat for research that leads to publications, it would be much appreciated if<br />

you could acknowledge spatstat <strong>in</strong> your publications, preferably cit<strong>in</strong>g [4]. Citations help us<br />

to justify the expenditure <strong>of</strong> time <strong>and</strong> effort on ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g <strong>and</strong> develop<strong>in</strong>g the package.<br />

4.3 Gett<strong>in</strong>g started<br />

Here is a quick demonstration <strong>of</strong> spatstat <strong>in</strong> action. You can follow the demonstration by<br />

typ<strong>in</strong>g the comm<strong>and</strong>s <strong>in</strong>to R.<br />

To beg<strong>in</strong> any analysis us<strong>in</strong>g spatstat, first start the R system, <strong>and</strong> type<br />

> library(spatstat)<br />

The response will be someth<strong>in</strong>g like this:<br />

> library(spatstat)<br />

This is mgcv 1.3-20<br />

spatstat 1.14-5<br />

Type ’help(spatstat)’ for <strong>in</strong>formation<br />

The pr<strong>in</strong>tout shows that, before load<strong>in</strong>g spatstat, the system has loaded the package mgcv<br />

that is required by spatstat. Then it loads spatstat, show<strong>in</strong>g the version number <strong>of</strong> the<br />

package.<br />

For a list <strong>of</strong> the comm<strong>and</strong>s available <strong>in</strong> spatstat, type<br />

> help(spatstat)<br />

To get <strong>in</strong>formation on a particular comm<strong>and</strong>, type help(comm<strong>and</strong>).<br />

To ga<strong>in</strong> an impression <strong>of</strong> what is available <strong>in</strong> spatstat, you can run the package demonstration<br />

by typ<strong>in</strong>g demo(spatstat).<br />

19<br />

Copyright c○CSIRO 2008


20 Introduction to spatstat<br />

4.4 Inspect<strong>in</strong>g data<br />

For our first demonstration, we’ll use one <strong>of</strong> the st<strong>and</strong>ard <strong>po<strong>in</strong>t</strong> pattern datasets that is <strong>in</strong>stalled<br />

with the package. The ‘Swedish P<strong>in</strong>es’ dataset represent the positions <strong>of</strong> 71 trees <strong>in</strong> a forest plot<br />

9.6 by 10.0 metres.<br />

> data(swedishp<strong>in</strong>es)<br />

To avoid typ<strong>in</strong>g ‘swedishp<strong>in</strong>es’ all the time, let us copy the data to another dataset with a<br />

shorter name:<br />

> X plot(X)<br />

> X<br />

X<br />

Simply typ<strong>in</strong>g the name <strong>of</strong> the dataset gives you some basic <strong>in</strong>formation:<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 71 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 96] x [0, 100] units (one unit = 0.1 metres)<br />

Let’s study the <strong>in</strong>tensity (density <strong>of</strong> <strong>po<strong>in</strong>t</strong>s) <strong>in</strong> this <strong>po<strong>in</strong>t</strong> pattern. For a few basic summary<br />

statistics, type<br />

> summary(X)<br />

Planar <strong>po<strong>in</strong>t</strong> pattern: 71 <strong>po<strong>in</strong>t</strong>s<br />

Average <strong>in</strong>tensity 0.0074 <strong>po<strong>in</strong>t</strong>s per square unit (one unit = 0.1 metres)<br />

W<strong>in</strong>dow: rectangle = [0, 96] x [0, 100] units<br />

W<strong>in</strong>dow area = 9600 square units<br />

Unit <strong>of</strong> length: 0.1 metres<br />

The coord<strong>in</strong>ates are <strong>in</strong> decimetres (0.1 metre), so the average <strong>in</strong>tensity is 0.0074 trees per<br />

square decimetre or 0.74 trees per square metre.<br />

To get an impression <strong>of</strong> local <strong>spatial</strong> variations <strong>in</strong> <strong>in</strong>tensity, we can plot a kernel estimate <strong>of</strong><br />

<strong>in</strong>tensity:<br />

Copyright c○CSIRO 2008


4.5 Exploratory data analysis 21<br />

> plot(density(X, 10))<br />

density(X, 10)<br />

0.002 0.004 0.006 0.008 0.01 0.012 0.014<br />

where 10 is my chosen value for the st<strong>and</strong>ard deviation <strong>of</strong> the Gaussian smooth<strong>in</strong>g kernel.<br />

If you prefer a contour plot,<br />

> contour(density(X, 10), axes = FALSE)<br />

0.011<br />

0.004<br />

0.003<br />

0.01<br />

0.009<br />

0.008<br />

0.005<br />

0.004<br />

0.009<br />

0.01<br />

0.006<br />

density(X, 10)<br />

0.005<br />

0.007<br />

0.005<br />

0.006<br />

0.007<br />

0.004<br />

0.008<br />

0.009<br />

0.01<br />

0.006<br />

0.007<br />

0.011<br />

0.012<br />

0.005<br />

0.009<br />

0.009<br />

0.01<br />

0.013<br />

0.015<br />

0.014<br />

The contours are labelled <strong>in</strong> density units <strong>of</strong> “trees per square decimetre”.<br />

4.5 Exploratory data analysis<br />

Spatstat is designed to support all the st<strong>and</strong>ard types <strong>of</strong> exploratory data analysis for <strong>po<strong>in</strong>t</strong><br />

<strong>patterns</strong>.<br />

One example is quadrat count<strong>in</strong>g. The study region is divided <strong>in</strong>to rectangles (‘quadrats’) <strong>of</strong><br />

equal size, <strong>and</strong> the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> each rectangle is counted.<br />

> Q Q<br />

x<br />

y [0,24] (24,48] (48,72] (72,96]<br />

(66.7,100] 7 3 6 5<br />

(33.3,66.7] 5 9 7 7<br />

[0,33.3] 4 3 6 9<br />

Copyright c○CSIRO 2008


22 Introduction to spatstat<br />

> plot(X)<br />

> plot(Q, add = TRUE, cex = 2)<br />

X<br />

7 3 6 5<br />

5 9 7 7<br />

4 3 6 9<br />

Another example is Ripley’s K function. I’ll expla<strong>in</strong> more about the K function later. For<br />

now, we’ll just demonstrate how easy it is to compute <strong>and</strong> plot it. To compute the K function<br />

for a <strong>po<strong>in</strong>t</strong> pattern X, type Kest(X). This returns an object which can be plotted.<br />

> K plot(K)<br />

K(r)<br />

0 500 1000 1500<br />

0 5 10 15 20<br />

K<br />

r (one unit = 0.1 metres)<br />

4.6 Multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

A marked <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> which the marks are a categorical variable is usually called a multitype<br />

<strong>po<strong>in</strong>t</strong> pattern. The ‘types’ are the different values or levels <strong>of</strong> the mark variable.<br />

Here is the famous Lans<strong>in</strong>g Woods dataset record<strong>in</strong>g the positions <strong>of</strong> 2251 trees <strong>of</strong> 6 different<br />

species (hickories, maples, red oaks, white oaks, black oaks <strong>and</strong> miscellaneous trees).<br />

> data(lans<strong>in</strong>g)<br />

> lans<strong>in</strong>g<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 2251 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = blackoak hickory maple misc redoak<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [0, 1] units (one unit = 924 feet)<br />

Copyright c○CSIRO 2008


4.6 Multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> 23<br />

> summary(lans<strong>in</strong>g)<br />

Marked planar <strong>po<strong>in</strong>t</strong> pattern: 2251 <strong>po<strong>in</strong>t</strong>s<br />

Average <strong>in</strong>tensity 2250 <strong>po<strong>in</strong>t</strong>s per square unit (one unit = 924 feet)<br />

*Pattern conta<strong>in</strong>s duplicated <strong>po<strong>in</strong>t</strong>s*<br />

Multitype:<br />

frequency proportion <strong>in</strong>tensity<br />

blackoak 135 0.0600 135<br />

hickory 703 0.3120 703<br />

maple 514 0.2280 514<br />

misc 105 0.0466 105<br />

redoak 346 0.1540 346<br />

whiteoak 448 0.1990 448<br />

W<strong>in</strong>dow: rectangle = [0, 1] x [0, 1] units<br />

W<strong>in</strong>dow area = 1 square unit<br />

Unit <strong>of</strong> length: 924 feet<br />

> plot(lans<strong>in</strong>g)<br />

blackoak hickory maple misc redoak whiteoak<br />

1 2 3 4 5 6<br />

lans<strong>in</strong>g<br />

In this plot, each type <strong>of</strong> <strong>po<strong>in</strong>t</strong> (i.e. each species <strong>of</strong> tree) is represented by a different plot<br />

symbol. The last l<strong>in</strong>e <strong>of</strong> output above expla<strong>in</strong>s the encod<strong>in</strong>g: black oak is coded as symbol 1<br />

(open circle) <strong>and</strong> so on.<br />

An alternative way to plot these data is to split them <strong>in</strong>to 6 <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, each pattern<br />

conta<strong>in</strong><strong>in</strong>g the trees <strong>of</strong> one species. This is done us<strong>in</strong>g split:<br />

> plot(split(lans<strong>in</strong>g))<br />

Copyright c○CSIRO 2008


24 Introduction to spatstat<br />

split(lans<strong>in</strong>g)<br />

blackoak hickory maple<br />

misc redoak whiteoak<br />

The result <strong>of</strong> split(lans<strong>in</strong>g) is a list <strong>of</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. The names <strong>of</strong> the list entries are the<br />

names <strong>of</strong> the types (<strong>in</strong> this case "blackoak","hickory", etc). To extract one <strong>of</strong> these <strong>patterns</strong>,<br />

e.g. the hickories,<br />

> hick plot(hick)<br />

hick<br />

Copyright c○CSIRO 2008


4.7 Installed datasets 25<br />

4.7 Installed datasets<br />

For reference, here is a list <strong>of</strong> the st<strong>and</strong>ard <strong>po<strong>in</strong>t</strong> pattern datasets that are supplied with the<br />

<strong>in</strong>stallation <strong>of</strong> spatstat:<br />

name description marks covariates w<strong>in</strong>dow<br />

amacr<strong>in</strong>e Hughes’ rabbit amacr<strong>in</strong>e cells 2 types ·<br />

anemones Upton-F<strong>in</strong>gleton sea anemones diameter ·<br />

ants Harkness-Isham ant nests 2 species 2 zones convex poly<br />

bei Tropical ra<strong>in</strong>forest trees · topography<br />

betacells Wässle et al. cat ret<strong>in</strong>al ganglia 2 types ·<br />

bramblecanes Bramble Canes 3 ages ·<br />

cells Crick-Ripley biological cells · ·<br />

chorley Chorley-South Ribble cancers case/control · irregular<br />

copper Queensl<strong>and</strong> copper deposits · fault l<strong>in</strong>es<br />

demopat artificial data 2 types · irregular<br />

f<strong>in</strong>p<strong>in</strong>es F<strong>in</strong>nish P<strong>in</strong>es diameter ·<br />

hamster Aherne’s hamster tumour data 2 types ·<br />

humberside Humberside child leukaemia case/control · irregular<br />

japanesep<strong>in</strong>es Japanese P<strong>in</strong>es · ·<br />

lans<strong>in</strong>g Lans<strong>in</strong>g Woods 6 species ·<br />

longleaf Longleaf P<strong>in</strong>e trees diameter ·<br />

murchison Murchison gold deposits · fault l<strong>in</strong>es irregular<br />

nbfires New Brunswick fires several · irregular<br />

nztrees Mark-Esler-Ripley NZ trees · ·<br />

ponderosa Getis-Frankl<strong>in</strong> Ponderosa p<strong>in</strong>es · ·<br />

redwood Strauss-Ripley redwood sapl<strong>in</strong>gs · ·<br />

redwoodfull Strauss redwood map (full set) · 2 zones<br />

simdat Simulated <strong>po<strong>in</strong>t</strong> pattern · ·<br />

spruces Spruce trees <strong>in</strong> Saxony diameter ·<br />

swedishp<strong>in</strong>es Str<strong>and</strong>-Ripley Swedish p<strong>in</strong>es · ·<br />

urkiola Urkiola Woods, Spa<strong>in</strong> 2 species · irregular<br />

The symbol <strong>in</strong>dicates that the w<strong>in</strong>dow for the pattern is a rectangle.<br />

To flick through a nice display <strong>of</strong> all these datasets, type demo(data).<br />

To access one <strong>of</strong> these datasets, type data(name) where name is the name listed above. To<br />

see <strong>in</strong>formation about the dataset, type help(name). To plot the dataset, type plot(name).<br />

4.8 Po<strong>in</strong>t-<strong>and</strong>-click on the screen<br />

There is a graphical <strong>in</strong>terface which allows you to draw a <strong>po<strong>in</strong>t</strong> pattern on the screen. Type<br />

> X plot(X)<br />

Copyright c○CSIRO 2008


26 Introduction to spatstat<br />

PART II. DATA TYPES<br />

In Part II <strong>of</strong> the workshop, we look at the different types <strong>of</strong> <strong>spatial</strong> data (<strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, w<strong>in</strong>dows,<br />

pixel images, etc) <strong>and</strong> how to manipulate them <strong>in</strong> spatstat.<br />

Copyright c○CSIRO 2008


5 Objects, classes <strong>and</strong> methods <strong>in</strong> R<br />

The tutorial examples above have used some <strong>of</strong> the ‘object-oriented’ features <strong>of</strong> R. It is very<br />

useful to know a little about how these work.<br />

5.1 Classes <strong>in</strong> R<br />

R is an ‘object-oriented’ language. A dataset with some k<strong>in</strong>d <strong>of</strong> structure on it (e.g. a cont<strong>in</strong>gency<br />

table, a time series, a <strong>po<strong>in</strong>t</strong> pattern) is treated as a s<strong>in</strong>gle ‘object’.<br />

For example, R <strong>in</strong>cludes a datasetsunspots which is a time series conta<strong>in</strong><strong>in</strong>g monthly sunspot<br />

counts from 1749 to 1983. This dataset can be manipulated as if it were a s<strong>in</strong>gle object:<br />

> plot(sunspots)<br />

> summary(sunspots)<br />

> X class(sunspots)<br />

[1] "ts"<br />

St<strong>and</strong>ard operations, such as pr<strong>in</strong>t<strong>in</strong>g, plott<strong>in</strong>g, or calculat<strong>in</strong>g the sample mean, are def<strong>in</strong>ed<br />

separately for each class <strong>of</strong> object.<br />

For example, typ<strong>in</strong>g plot(sunspots) <strong>in</strong>vokes the generic comm<strong>and</strong> plot. Now sunspots is<br />

an object <strong>of</strong> class "ts" represent<strong>in</strong>g a time series, <strong>and</strong> there is a special “method” for plott<strong>in</strong>g<br />

time series, called plot.ts. So the system executes plot.ts(sunspots). It is said that the plot<br />

comm<strong>and</strong> is “dispatched” to the method plot.ts. The plot method for time series produces a<br />

display that is sensible for time series, with axes properly annotated.<br />

Tip: to f<strong>in</strong>d out how to modify the plot for an object <strong>of</strong> class "foo", consult<br />

help(plot.foo) rather than help(plot).<br />

5.2 Classes <strong>in</strong> spatstat<br />

To h<strong>and</strong>le <strong>po<strong>in</strong>t</strong> pattern datasets <strong>and</strong> related data, the spatstat package def<strong>in</strong>es the follow<strong>in</strong>g<br />

classes <strong>of</strong> objects:<br />

• ppp: planar <strong>po<strong>in</strong>t</strong> pattern<br />

• ow<strong>in</strong>: <strong>spatial</strong> region (‘observation w<strong>in</strong>dow’)<br />

• im: pixel image<br />

• psp: pattern <strong>of</strong> l<strong>in</strong>e segments<br />

• tess: tessellation<br />

27<br />

Copyright c○CSIRO 2008


28 Objects, classes <strong>and</strong> methods <strong>in</strong> R<br />

Rectangular w<strong>in</strong>dow<br />

(class ow<strong>in</strong>)<br />

Po<strong>in</strong>t pattern (class ppp)<br />

Polygonal w<strong>in</strong>dow<br />

(class ow<strong>in</strong>)<br />

Pixel image (class im)<br />

B<strong>in</strong>ary mask w<strong>in</strong>dow<br />

(class ow<strong>in</strong>)<br />

L<strong>in</strong>e segment pattern (class psp) Tessellation (class tess)<br />

Most <strong>of</strong> the functionality <strong>in</strong> spatstat works on such objects. To use this functionality, you’ll<br />

need to read your raw data <strong>in</strong>to R <strong>and</strong> then convert it <strong>in</strong>to an object <strong>of</strong> the appropriate format.<br />

In particular spatstat has methods for plot, pr<strong>in</strong>t <strong>and</strong> summary for each <strong>of</strong> these classes.<br />

For example, the plot method for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, plot.ppp, ensures that the x <strong>and</strong> y scales<br />

are equal, <strong>and</strong> does various other th<strong>in</strong>gs that are sensible when plott<strong>in</strong>g a <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> pattern<br />

rather than just a list <strong>of</strong> (x,y) pairs.<br />

Copyright c○CSIRO 2008<br />

120 130 140 150 160


5.2 Classes <strong>in</strong> spatstat 29<br />

> data(humberside)<br />

> plot(humberside)<br />

humberside<br />

Exercise 1 F<strong>in</strong>d out how to modify the comm<strong>and</strong> plot(swedishp<strong>in</strong>es) so that the title reads<br />

“Swedish P<strong>in</strong>es data” <strong>and</strong> the <strong>po<strong>in</strong>t</strong>s are represented by plus-signs <strong>in</strong>stead <strong>of</strong> circles.<br />

When you type pr<strong>in</strong>t(swedishp<strong>in</strong>es) or just swedishp<strong>in</strong>es, this <strong>in</strong>vokes the generic comm<strong>and</strong>pr<strong>in</strong>t,<br />

which dispatches to the methodpr<strong>in</strong>t.ppp, which pr<strong>in</strong>ts some sensible <strong>in</strong>formation<br />

about the <strong>po<strong>in</strong>t</strong> pattern swedishp<strong>in</strong>es at the term<strong>in</strong>al.<br />

> swedishp<strong>in</strong>es<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 71 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 96] x [0, 100] units (one unit = 0.1 metres)<br />

The generic comm<strong>and</strong> summary is meant to provide basic summary statistics for a dataset.<br />

When you type summary(swedishp<strong>in</strong>es) this is dispatched to the method summary.ppp, which<br />

computes a sensible set <strong>of</strong> summary statistics for a <strong>po<strong>in</strong>t</strong> pattern, <strong>and</strong> pr<strong>in</strong>ts them at the term<strong>in</strong>al.<br />

> summary(swedishp<strong>in</strong>es)<br />

Planar <strong>po<strong>in</strong>t</strong> pattern: 71 <strong>po<strong>in</strong>t</strong>s<br />

Average <strong>in</strong>tensity 0.0074 <strong>po<strong>in</strong>t</strong>s per square unit (one unit = 0.1 metres)<br />

W<strong>in</strong>dow: rectangle = [0, 96] x [0, 100] units<br />

W<strong>in</strong>dow area = 9600 square units<br />

Unit <strong>of</strong> length: 0.1 metres<br />

The comm<strong>and</strong> density is also generic. It is normally used to compute a kernel density<br />

estimate <strong>of</strong> a probability distribution from a vector <strong>of</strong> numbers. (This “default method” is<br />

called density.default.) But there is also a method for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, so that when you type<br />

density(swedishp<strong>in</strong>es), this is dispatched to density.ppp which computes a two-dimensional<br />

kernel estimate <strong>of</strong> the <strong>in</strong>tensity function.<br />

> plot(density(swedishp<strong>in</strong>es, sigma = 10))<br />

Copyright c○CSIRO 2008


30 Objects, classes <strong>and</strong> methods <strong>in</strong> R<br />

density(swedishp<strong>in</strong>es, sigma = 10)<br />

0.002 0.006 0.01 0.014<br />

To see a list <strong>of</strong> all methods available <strong>in</strong> R for a particular generic function such as plot:<br />

> methods(plot)<br />

To see a list <strong>of</strong> all methods that are available for a particular class such as ppp:<br />

> methods(class = "ppp")<br />

[1] [.ppp [ plot(cells)<br />

Copyright c○CSIRO 2008


5.3 Return values 31<br />

There’s still a return value from the function, which can be captured by assign<strong>in</strong>g the result<br />

to a variable:<br />

> a a<br />

NULL<br />

Tip: Many plott<strong>in</strong>g comm<strong>and</strong>s return a value which is useful if you want to annotate<br />

the plot. In spatstat the function plot.ppp plots a <strong>po<strong>in</strong>t</strong> pattern <strong>and</strong> returns<br />

<strong>in</strong>formation about the encod<strong>in</strong>g <strong>of</strong> the marks. After plott<strong>in</strong>g a multitype pattern, to<br />

make a nice legend for the plot, save the result <strong>of</strong> the plot call <strong>and</strong> pass it to the<br />

legend comm<strong>and</strong>:<br />

> data(lans<strong>in</strong>g)<br />

> a legend(-0.25, 0.5, names(a), pch = a)<br />

blackoak<br />

hickory<br />

maple<br />

misc<br />

redoak<br />

whiteoak<br />

lans<strong>in</strong>g<br />

Tip: To f<strong>in</strong>d out the format <strong>of</strong> the output returned by a particular function fun,<br />

type help(fun) <strong>and</strong> read the section headed ‘Value’.<br />

5.3.2 Return<strong>in</strong>g an object<br />

A function which performs a complicated analysis <strong>of</strong> your data will typically return an object<br />

belong<strong>in</strong>g to a special class. This is a convenient way to h<strong>and</strong>le calculations that yield large or<br />

complicated output. It enables you to store the result for later use, <strong>and</strong> provides methods for<br />

h<strong>and</strong>l<strong>in</strong>g the result.<br />

Many <strong>of</strong> the functions <strong>in</strong> spatstat return an object <strong>of</strong> a special class. For example, the<br />

value returned by density.ppp is a pixel image (an object <strong>of</strong> class "im"). This is effectively a<br />

large matrix, giv<strong>in</strong>g the values <strong>of</strong> the kernel estimate <strong>of</strong> <strong>in</strong>tensity at each <strong>po<strong>in</strong>t</strong> <strong>in</strong> a f<strong>in</strong>e regular<br />

grid <strong>of</strong> locations.<br />

Copyright c○CSIRO 2008


32 Objects, classes <strong>and</strong> methods <strong>in</strong> R<br />

> Z Z<br />

real-valued pixel image<br />

100 x 100 pixel array (ny, nx)<br />

enclos<strong>in</strong>g rectangle: [0, 96] x [0, 100] units (one unit = 0.1 metres)<br />

The class <strong>of</strong> pixel images <strong>in</strong> spatstat has methods for pr<strong>in</strong>t, summary, plot <strong>and</strong> so on.<br />

> summary(Z)<br />

real-valued pixel image<br />

100 x 100 pixel array (ny, nx)<br />

enclos<strong>in</strong>g rectangle: [0, 96] x [0, 100] units<br />

dimensions <strong>of</strong> each pixel: 0.96 x 1 units<br />

(one unit = 0.1 metres)<br />

Image is def<strong>in</strong>ed on the full rectangular grid<br />

Frame area = 9600 square units<br />

Pixel values :<br />

range = [0.00188947243195950,0.0155470858797917]<br />

<strong>in</strong>tegral = 71.3036909843861<br />

mean = 0.00742746781087355<br />

Another example is the comm<strong>and</strong> Kest which estimates Ripley’s K-function. The value<br />

returned by Kest is an object <strong>of</strong> class "fv" (‘function value table’) conta<strong>in</strong><strong>in</strong>g the estimated<br />

values <strong>of</strong> K(r), obta<strong>in</strong>ed us<strong>in</strong>g several different estimators, for a range <strong>of</strong> r values. This class<br />

has methods for pr<strong>in</strong>t, plot <strong>and</strong> so on.<br />

> u u<br />

Function value object (class ’fv’)<br />

for the function r -> K(r)<br />

Entries:<br />

id label description<br />

-- ----- ----------r<br />

r distance argument r<br />

theo Kpois(r) theoretical Poisson K(r)<br />

border Kbord(r) border-corrected estimate <strong>of</strong> K(r)<br />

trans Ktrans(r) translation-corrected estimate <strong>of</strong> K(r)<br />

iso Kiso(r) Ripley isotropic correction estimate <strong>of</strong> K(r)<br />

--------------------------------------<br />

Default plot formula:<br />

. ~ r<br />

Recommended range <strong>of</strong> argument r: [0, 24]<br />

Unit <strong>of</strong> length: 0.1 metres<br />

> plot(u)<br />

Copyright c○CSIRO 2008


K(r)<br />

0 500 1000 1500<br />

0 5 10 15 20<br />

u<br />

r (one unit = 0.1 metres)<br />

6 Po<strong>in</strong>t <strong>patterns</strong> <strong>in</strong> spatstat<br />

To analyse your own <strong>po<strong>in</strong>t</strong> pattern data <strong>in</strong> spatstat, you’ll need to read the raw data <strong>in</strong>to R<br />

<strong>and</strong> convert them <strong>in</strong>to an object <strong>of</strong> class "ppp". This tutorial gives one basic recipe.<br />

6.1 Basic recipe<br />

In many cases, the observation w<strong>in</strong>dow is a rectangle. The follow<strong>in</strong>g steps will then be sufficient.<br />

1. store the x <strong>and</strong> y coord<strong>in</strong>ates for the <strong>po<strong>in</strong>t</strong>s <strong>in</strong> two vectors x <strong>and</strong> y.<br />

2. if there are marks attached to the <strong>po<strong>in</strong>t</strong>s, store the correspond<strong>in</strong>g marks <strong>in</strong> a vector m.<br />

(Note: only a s<strong>in</strong>gle mark value per <strong>po<strong>in</strong>t</strong> is allowed; multivariate marks are not supported.<br />

But we’re work<strong>in</strong>g on it.)<br />

3. create the <strong>po<strong>in</strong>t</strong> pattern object by<br />

> ppp(x, y, xrange, yrange)<br />

or, if there are marks,<br />

> ppp(x, y, xrange, yrange, marks = m)<br />

where xrange, yrange are vectors <strong>of</strong> length 2 giv<strong>in</strong>g the x <strong>and</strong> y dimensions <strong>of</strong> the rectangular<br />

w<strong>in</strong>dow.<br />

The value returned by the function ppp is an object <strong>of</strong> class "ppp" represent<strong>in</strong>g a <strong>po<strong>in</strong>t</strong><br />

pattern.<br />

If the w<strong>in</strong>dow is not a rectangle, then you need to use a comm<strong>and</strong> like<br />

> ppp(x, y, w<strong>in</strong>dow = W)<br />

where W is a w<strong>in</strong>dow object. See Section 7.5 for details on how to do this.<br />

33<br />

Copyright c○CSIRO 2008


34 Po<strong>in</strong>t <strong>patterns</strong> <strong>in</strong> spatstat<br />

Enter<strong>in</strong>g coord<strong>in</strong>ate data<br />

Suppose we have recorded the x,y coord<strong>in</strong>ates <strong>of</strong> 25 <strong>po<strong>in</strong>t</strong>s that lie <strong>in</strong> a rectangle [0,2] × [0,1].<br />

They can be entered <strong>in</strong>to R <strong>in</strong> various ways, for example by typ<strong>in</strong>g them directly:<br />

> x y P plot(P)<br />

> P<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 25 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 2] x [0, 1] units<br />

Copyright c○CSIRO 2008<br />

P


6.1 Basic recipe 35<br />

Marked <strong>po<strong>in</strong>t</strong> pattern<br />

Mark values may have any atomic type: numeric, <strong>in</strong>teger, character, logical, or complex. For<br />

example, let’s take a vector <strong>of</strong> real numbers:<br />

> m Q Q<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 25 <strong>po<strong>in</strong>t</strong>s<br />

marks are numeric, <strong>of</strong> type ’double’<br />

w<strong>in</strong>dow: rectangle = [0, 2] x [0, 1] units<br />

> plot(Q)<br />

0 10 20 30 40<br />

0.00000000 0.04323888 0.08647777 0.12971665 0.17295553<br />

Q<br />

The last l<strong>in</strong>e <strong>of</strong> output is the return value from plot(Q), which <strong>in</strong>dicates the scale used to plot<br />

the marks. The mark value 10 was plotted as a circle <strong>of</strong> radius 0.0432.<br />

Categorical marks<br />

When the mark is a categorical variable, we have a multitype <strong>po<strong>in</strong>t</strong> pattern. The ‘types’ are the<br />

different levels <strong>of</strong> the mark variable. The mark values should be stored as a ‘factor’ <strong>in</strong> R.<br />

For example, let’s attach r<strong>and</strong>om marks to the pattern, tak<strong>in</strong>g two possible values Yes <strong>and</strong><br />

No with equal probability.<br />

> m m YN YN<br />

Copyright c○CSIRO 2008


36 Po<strong>in</strong>t <strong>patterns</strong> <strong>in</strong> spatstat<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 25 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = No Yes<br />

w<strong>in</strong>dow: rectangle = [0, 2] x [0, 1] units<br />

> plot(YN)<br />

No Yes<br />

1 2<br />

YN<br />

If the marks are <strong>in</strong>tended to be a categorical variable, ensure that m is stored as<br />

a ‘factor’.<br />

The last l<strong>in</strong>e <strong>of</strong> output <strong>in</strong>dicates how the marks were plotted: the mark No was plotted as<br />

symbol 1 (circle) <strong>and</strong> mark Yes was plotted as symbol 2 (triangle).<br />

Notice that the factor levels have been re-sorted alphabetically (by default). This is one <strong>of</strong><br />

the common slip-ups with factors <strong>in</strong> R. To stipulate a different order<strong>in</strong>g <strong>of</strong> the levels,<br />

> m YN YN<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 25 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = Yes No<br />

w<strong>in</strong>dow: rectangle = [0, 2] x [0, 1] units<br />

Tip: whenever you create a factor, check that the factor levels are as you <strong>in</strong>tended,<br />

us<strong>in</strong>g levels(x).<br />

Other ways <strong>of</strong> add<strong>in</strong>g marks to a <strong>po<strong>in</strong>t</strong> pattern will be described <strong>in</strong> Section 25.<br />

6.2 Check<strong>in</strong>g data<br />

It is prudent to check for quirks <strong>in</strong> the data.<br />

• Pr<strong>in</strong>t out the coord<strong>in</strong>ate values <strong>and</strong> marks to check for errors <strong>in</strong> data entry, <strong>and</strong> to determ<strong>in</strong>e<br />

whether the coord<strong>in</strong>ates have been rounded.<br />

• Duplicated <strong>po<strong>in</strong>t</strong>s are surpris<strong>in</strong>gly common <strong>in</strong> data files (i.e. where two records <strong>in</strong> the file<br />

refer to the same (x,y) location). Once you have entered the coord<strong>in</strong>ates <strong>in</strong>to R as a twocolumn<br />

matrix or a data frame D say, you can check for duplication us<strong>in</strong>g the comm<strong>and</strong><br />

any(duplicated(D)). If your data are already <strong>in</strong> the form <strong>of</strong> a <strong>po<strong>in</strong>t</strong> pattern X, you can<br />

also type any(duplicated(X)) to detect duplication. To remove duplicated <strong>po<strong>in</strong>t</strong>s, type<br />

Y


6.3 Units 37<br />

• Plott<strong>in</strong>g the <strong>po<strong>in</strong>t</strong> pattern is always wise. Look for unexpected <strong>patterns</strong>, <strong>and</strong> <strong>po<strong>in</strong>t</strong>s that<br />

lie outside the w<strong>in</strong>dow.<br />

• On a plot <strong>of</strong> a <strong>po<strong>in</strong>t</strong> pattern X, you can identify an <strong>in</strong>dividual <strong>po<strong>in</strong>t</strong> by typ<strong>in</strong>g plot(X);<br />

identify(X) then click<strong>in</strong>g on the <strong>po<strong>in</strong>t</strong>.<br />

The function ppp automatically checks for duplicated <strong>po<strong>in</strong>t</strong>s, <strong>and</strong> for <strong>po<strong>in</strong>t</strong>s that lie outside<br />

the specified w<strong>in</strong>dow.<br />

6.3 Units<br />

A <strong>po<strong>in</strong>t</strong> pattern X may <strong>in</strong>clude <strong>in</strong>formation about the units <strong>of</strong> length <strong>in</strong> which the x <strong>and</strong> y<br />

coord<strong>in</strong>ates are recorded. This <strong>in</strong>formation is optional; it merely enables the package to pr<strong>in</strong>t<br />

better reports <strong>and</strong> to annotate the axes <strong>in</strong> plots.<br />

If the x <strong>and</strong> y coord<strong>in</strong>ates <strong>in</strong> the <strong>po<strong>in</strong>t</strong> pattern P were recorded <strong>in</strong> metres, type<br />

> unitname(P) unitname(P) unitname(P)


38 Po<strong>in</strong>t <strong>patterns</strong> <strong>in</strong> spatstat<br />

The package help file help(spatstat) lists all the available options.<br />

Note that it is a st<strong>and</strong>ard nam<strong>in</strong>g convention <strong>in</strong> R that, for a class "foo", there should<br />

be a ‘creator’ function foo that creates objects <strong>of</strong> this class from raw numerical data, <strong>and</strong> a<br />

‘converter’ function as.foo that converts data from other formats <strong>in</strong>to objects <strong>of</strong> class "foo".<br />

We adhere to this convention <strong>in</strong> spatstat:<br />

Class Creator Converter<br />

"ppp" ppp as.ppp<br />

"ow<strong>in</strong>" ow<strong>in</strong> as.ow<strong>in</strong><br />

"im" im as.im<br />

More alternatives for us<strong>in</strong>g ppp will be covered <strong>in</strong> Section 7.5.<br />

Copyright c○CSIRO 2008


7 W<strong>in</strong>dows <strong>in</strong> spatstat<br />

Many comm<strong>and</strong>s <strong>in</strong> spatstat require us to specify a w<strong>in</strong>dow, study region or doma<strong>in</strong>. It will<br />

be h<strong>and</strong>y to know more about w<strong>in</strong>dows <strong>in</strong> spatstat.<br />

An object <strong>of</strong> class "ow<strong>in</strong>" (“observation w<strong>in</strong>dow”) represents a region or w<strong>in</strong>dow <strong>in</strong> twodimensional<br />

space. The w<strong>in</strong>dow may be<br />

• a rectangle;<br />

• a polygon or polygons, with polygonal holes; or<br />

• an irregular shape represented by a b<strong>in</strong>ary pixel image mask.<br />

Rectangular w<strong>in</strong>dow<br />

Polygonal w<strong>in</strong>dow B<strong>in</strong>ary mask w<strong>in</strong>dow<br />

Objects <strong>of</strong> this class are created by the function ow<strong>in</strong>. There are methods for pr<strong>in</strong>t<strong>in</strong>g <strong>and</strong><br />

plott<strong>in</strong>g w<strong>in</strong>dows, <strong>and</strong> numerous geometrical operations.<br />

7.1 Mak<strong>in</strong>g w<strong>in</strong>dows<br />

7.1.1 Rectangular w<strong>in</strong>dow<br />

To create a rectangular w<strong>in</strong>dow, type<br />

> ow<strong>in</strong>(xrange, yrange)<br />

where xrange, yrange are vectors <strong>of</strong> length 2 giv<strong>in</strong>g the x <strong>and</strong> y dimensions, respectively, <strong>of</strong><br />

the rectangle.<br />

> ow<strong>in</strong>(c(0, 3), c(1, 2))<br />

w<strong>in</strong>dow: rectangle = [0, 3] x [1, 2] units<br />

For a square w<strong>in</strong>dow you can also use square:<br />

> square(5)<br />

w<strong>in</strong>dow: rectangle = [0, 5] x [0, 5] units<br />

7.1.2 Circular w<strong>in</strong>dow<br />

For a circular w<strong>in</strong>dow use disc:<br />

> W


40 W<strong>in</strong>dows <strong>in</strong> spatstat<br />

7.1.3 Polygonal w<strong>in</strong>dow<br />

Spatstat supports polygonal w<strong>in</strong>dows <strong>of</strong> arbitrary shape <strong>and</strong> topology. That is, the boundary<br />

<strong>of</strong> the w<strong>in</strong>dow may consist <strong>of</strong> one or more closed polygonal curves, which do not <strong>in</strong>tersect<br />

themselves or each other. The w<strong>in</strong>dow may have ‘holes’. Type<br />

> ow<strong>in</strong>(poly = p)<br />

or<br />

> ow<strong>in</strong>(poly = p, xrange, yrange)<br />

to create a polygonal w<strong>in</strong>dow. The argument poly=p <strong>in</strong>dicates that the w<strong>in</strong>dow is polygonal<br />

<strong>and</strong> its boundary is given by the dataset p. Note we must use the “name=value” syntax to give<br />

the argument poly. The arguments xrange <strong>and</strong> yrange are optional here; if they are absent,<br />

the x <strong>and</strong> y dimensions <strong>of</strong> the bound<strong>in</strong>g rectangle will be computed from the polygon.<br />

If the w<strong>in</strong>dow boundary is a s<strong>in</strong>gle polygon, then p should be a list with components x <strong>and</strong> y<br />

giv<strong>in</strong>g the coord<strong>in</strong>ates <strong>of</strong> the vertices <strong>of</strong> the w<strong>in</strong>dow boundary, traversed anticlockwise. For<br />

example, the triangle with corners (0,0), (1,0) <strong>and</strong> (0,1) is created by<br />

> Z plot(Z)<br />

Z<br />

Note that polygons should not be closed, i.e. the last vertex should not equal the first<br />

vertex. The same convention is used <strong>in</strong> the st<strong>and</strong>ard plott<strong>in</strong>g function polygon().<br />

If the w<strong>in</strong>dow boundary consists <strong>of</strong> several separate polygons, then p should be a list, each<br />

<strong>of</strong> whose components p[[i]] is a list with components x <strong>and</strong> y describ<strong>in</strong>g one <strong>of</strong> the polygons.<br />

The vertices <strong>of</strong> each polygon should be traversed anticlockwise for external boundaries <strong>and</strong><br />

clockwise for <strong>in</strong>ternal boundaries (holes). For example, the follow<strong>in</strong>g creates a triangle<br />

with a square hole.<br />

> Z plot(Z)<br />

Copyright c○CSIRO 2008


7.1 Mak<strong>in</strong>g w<strong>in</strong>dows 41<br />

Z<br />

Notice that the first boundary polygon is traversed anticlockwise <strong>and</strong> the second clockwise,<br />

because it is a hole.<br />

It is <strong>of</strong>ten useful to plot a polygonal w<strong>in</strong>dow with l<strong>in</strong>e shad<strong>in</strong>g:<br />

> plot(Z, hatch = TRUE)<br />

7.1.4 B<strong>in</strong>ary mask<br />

Z<br />

A w<strong>in</strong>dow may be def<strong>in</strong>ed by a discrete pixel approximation. Type<br />

ow<strong>in</strong>(mask=m, xrange, yrange)<br />

to create the w<strong>in</strong>dow object. Heremshould be a matrix with logical entries; it will be <strong>in</strong>terpreted<br />

as a b<strong>in</strong>ary pixel image whose entries are TRUE where the correspond<strong>in</strong>g pixel belongs to the<br />

w<strong>in</strong>dow.<br />

The rectangle with dimensionsxrange, yrange is divided <strong>in</strong>to equal rectangular pixels. The<br />

correspondence between matrix <strong>in</strong>dicesm[i,j] <strong>and</strong> cartesian coord<strong>in</strong>ates is slightly idiosyncratic:<br />

the rows <strong>of</strong> m correspond to the y coord<strong>in</strong>ate, <strong>and</strong> the columns to the x coord<strong>in</strong>ate. The entry<br />

m[i,j] is TRUE if the <strong>po<strong>in</strong>t</strong> (xx[j],yy[i]) (sic) belongs to the w<strong>in</strong>dow, where xx, yy are<br />

vectors <strong>of</strong> pixel coord<strong>in</strong>ates equally spaced over xrange <strong>and</strong> yrange respectively. The length <strong>of</strong><br />

xx is ncol(m) while the length <strong>of</strong> yy is nrow(m).<br />

In some GIS applications the study region will be given as a b<strong>in</strong>ary pixel image. A safe<br />

strategy is to dump the data from the GIS system to a text file, <strong>and</strong> read the text file <strong>in</strong>to R<br />

us<strong>in</strong>g scan. Then reformat it as a matrix, <strong>and</strong> use ow<strong>in</strong> to create the w<strong>in</strong>dow object.<br />

To convert a rectangle or polygonal w<strong>in</strong>dow to a b<strong>in</strong>ary mask, use as.mask.<br />

Copyright c○CSIRO 2008


42 W<strong>in</strong>dows <strong>in</strong> spatstat<br />

> Z W plot(W)<br />

W<br />

7.2 Convert<strong>in</strong>g from GIS formats<br />

There is a wide variety <strong>of</strong> s<strong>of</strong>tware packages for h<strong>and</strong>l<strong>in</strong>g <strong>spatial</strong> data, especially Geographical<br />

Information Systems (GIS). These packages use many different formats to represent <strong>spatial</strong> data.<br />

Typically spatstat does not support these formats: this would not be good s<strong>of</strong>tware design.<br />

Specialised R packages exist for h<strong>and</strong>l<strong>in</strong>g different <strong>spatial</strong> data file formats. The most useful<br />

ones are rgdal,shapefiles <strong>and</strong> maptools. These packages will make it possible for you to read<br />

your data from a file <strong>in</strong>to an R session. The rgdal package has the most functionality, but can<br />

sometimes be difficult to <strong>in</strong>stall, as it requires <strong>in</strong>stallation <strong>of</strong> an external library on your system.<br />

The packages shapefiles <strong>and</strong> maptools have no such difficulty.<br />

The package sp provides generic support for <strong>spatial</strong> data types <strong>in</strong> R. It enables you to convert<br />

between different representations <strong>of</strong> your data <strong>in</strong> R.<br />

The usual procedure for convert<strong>in</strong>g <strong>spatial</strong> data is:<br />

1. read your data file <strong>in</strong>to R us<strong>in</strong>g a package designed specifically for that file format (e.g.<br />

shapefiles for ESRI shapefiles), convert<strong>in</strong>g it <strong>in</strong>to an R dataset;<br />

2. convert this R dataset <strong>in</strong>to a generic format used by sp;<br />

3. convert the generic sp format to the required spatstat format, us<strong>in</strong>g sp.<br />

For example, if your w<strong>in</strong>dow (<strong>spatial</strong> region) is supplied as an “ESRI shapefile” with a name<br />

like myfile.shp, then type the follow<strong>in</strong>g:<br />

> library(maptools)<br />

> S library(sp)<br />

> SP W


7.3 Functions that return a w<strong>in</strong>dow 43<br />

> S SP P elev W plot(W)<br />

W<br />

The result W is a w<strong>in</strong>dow.<br />

The accompany<strong>in</strong>g dataset bei.extra$grad is a pixel image <strong>of</strong> the slope (gradient) <strong>of</strong> the<br />

terra<strong>in</strong>. To f<strong>in</strong>d the subset where altitude is below 140 <strong>and</strong> slope exceeds 0.1,<br />

> grad V plot(V)<br />

Copyright c○CSIRO 2008


44 W<strong>in</strong>dows <strong>in</strong> spatstat<br />

7.4 Operations on w<strong>in</strong>dows<br />

V<br />

Basic methods for the class "ow<strong>in</strong>" <strong>in</strong>clude<br />

pr<strong>in</strong>t.ow<strong>in</strong> pr<strong>in</strong>t short description <strong>of</strong> a w<strong>in</strong>dow<br />

summary.ow<strong>in</strong> pr<strong>in</strong>t detailed summary <strong>of</strong> a w<strong>in</strong>dow<br />

plot.ow<strong>in</strong> plot a w<strong>in</strong>dow<br />

Numerous geometrical operations are implemented for w<strong>in</strong>dow objects. They <strong>in</strong>clude:<br />

area.ow<strong>in</strong> compute w<strong>in</strong>dow’s area<br />

diameter compute w<strong>in</strong>dow’s diameter<br />

<strong>in</strong>tersect.ow<strong>in</strong> <strong>in</strong>tersection <strong>of</strong> two w<strong>in</strong>dows<br />

union.ow<strong>in</strong> union <strong>of</strong> two w<strong>in</strong>dows<br />

bound<strong>in</strong>g.box F<strong>in</strong>d a tight bound<strong>in</strong>g box for the w<strong>in</strong>dow<br />

complement.ow<strong>in</strong> swap <strong>in</strong>side <strong>and</strong> outside<br />

rotate rotate w<strong>in</strong>dow<br />

shift translate w<strong>in</strong>dow<br />

aff<strong>in</strong>e apply aff<strong>in</strong>e transformation<br />

rescale change scale <strong>and</strong> adjust units<br />

as.mask convert to b<strong>in</strong>ary image mask<br />

dilate.ow<strong>in</strong> morphological dilation<br />

erode.ow<strong>in</strong> morphological erosion<br />

eroded.areas compute areas <strong>of</strong> eroded w<strong>in</strong>dows<br />

<strong>in</strong>side.ow<strong>in</strong> determ<strong>in</strong>e whether a <strong>po<strong>in</strong>t</strong> is <strong>in</strong>side a w<strong>in</strong>dow<br />

distmap.ow<strong>in</strong> distance transform image<br />

centroid.ow<strong>in</strong> compute centroid (centre <strong>of</strong> mass) <strong>of</strong> w<strong>in</strong>dow<br />

is.subset.ow<strong>in</strong> determ<strong>in</strong>e whether one w<strong>in</strong>dow conta<strong>in</strong>s another<br />

7.5 Creat<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> any w<strong>in</strong>dow<br />

As we saw <strong>in</strong> Section 6.1, the function ppp() will create a <strong>po<strong>in</strong>t</strong> pattern (an object <strong>of</strong> class<br />

"ppp") from raw numerical data <strong>in</strong> R.<br />

Suppose the x,y coord<strong>in</strong>ates <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s <strong>of</strong> the pattern are conta<strong>in</strong>ed <strong>in</strong> vectors x <strong>and</strong> y <strong>of</strong><br />

equal length. Then<br />

ppp(x, y, other.arguments)<br />

will create the <strong>po<strong>in</strong>t</strong> pattern. The ‘other arguments’ must determ<strong>in</strong>e a w<strong>in</strong>dow for the pattern,<br />

<strong>in</strong> one <strong>of</strong> two ways:<br />

Copyright c○CSIRO 2008


7.5 Creat<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> any w<strong>in</strong>dow 45<br />

• the other arguments can be passed to ow<strong>in</strong> to determ<strong>in</strong>e a w<strong>in</strong>dow:<br />

ppp(x, y, xrange, yrange) <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> rectangle<br />

ppp(x, y, poly=p) <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> polygonal w<strong>in</strong>dow<br />

ppp(x, y, poly=p, xrange, yrange) <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> polygonal w<strong>in</strong>dow<br />

ppp(x, y, mask=m, xrange, yrange) <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> b<strong>in</strong>ary mask w<strong>in</strong>dow<br />

• if W is a w<strong>in</strong>dow object (class "ow<strong>in</strong>") then<br />

> ppp(x, y, w<strong>in</strong>dow = W)<br />

will create the <strong>po<strong>in</strong>t</strong> pattern.<br />

You may already have a w<strong>in</strong>dow W (an object <strong>of</strong> class "ow<strong>in</strong>") ready to h<strong>and</strong>, <strong>and</strong> now want<br />

to create a pattern <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> this w<strong>in</strong>dow. For example you may want to put a new <strong>po<strong>in</strong>t</strong><br />

pattern <strong>in</strong>side the w<strong>in</strong>dow <strong>of</strong> an exist<strong>in</strong>g <strong>po<strong>in</strong>t</strong> pattern X; the w<strong>in</strong>dow is accessed as X$w<strong>in</strong>dow,<br />

so type<br />

ppp(x, y, w<strong>in</strong>dow=X$w<strong>in</strong>dow)<br />

Copyright c○CSIRO 2008


46 Manipulat<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

8 Manipulat<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Before proceed<strong>in</strong>g, we need to know more about how to manipulate <strong>and</strong> <strong>in</strong>terrogate <strong>po<strong>in</strong>t</strong> pattern<br />

data.<br />

8.1 Format <strong>of</strong> ppp objects<br />

A <strong>po<strong>in</strong>t</strong> pattern is represented <strong>in</strong> spatstat by an object <strong>of</strong> the class "ppp". This conta<strong>in</strong>s the<br />

coord<strong>in</strong>ates <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s, optional ‘mark’ values attached to the <strong>po<strong>in</strong>t</strong>s, <strong>and</strong> a description <strong>of</strong> the<br />

study region or <strong>spatial</strong> ‘w<strong>in</strong>dow’.<br />

8.1.1 Format<br />

A <strong>po<strong>in</strong>t</strong> pattern object P has the follow<strong>in</strong>g components:<br />

• P$n is the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s (which may be zero).<br />

• P$x is a numeric vector conta<strong>in</strong><strong>in</strong>g the x coord<strong>in</strong>ates <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s. Its length equals P$n<br />

(<strong>and</strong> may be zero).<br />

• P$y is a numeric vector conta<strong>in</strong><strong>in</strong>g the y coord<strong>in</strong>ates <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s. Its length also equals<br />

P$n.<br />

• P$marks conta<strong>in</strong>s the marks. It is either NULL, or a vector <strong>of</strong> length P$n conta<strong>in</strong><strong>in</strong>g the<br />

mark values. The entries <strong>of</strong> P$marks may be <strong>of</strong> any atomic type (character, numeric,<br />

logical, complex).<br />

• P$w<strong>in</strong>dow is an object <strong>of</strong> class"ow<strong>in</strong>" (“observation w<strong>in</strong>dow”) determ<strong>in</strong><strong>in</strong>g the study region<br />

or <strong>spatial</strong> ‘w<strong>in</strong>dow’.<br />

You can extract these components <strong>in</strong>dividually; for example, to make a histogram <strong>of</strong> the<br />

x coord<strong>in</strong>ates just type hist(P$x). However, do not assign values to these components<br />

directly, or you may create <strong>in</strong>consistencies <strong>in</strong> the data which cause spatstat to crash. To<br />

manipulate <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, use the functions provided.<br />

Although a <strong>po<strong>in</strong>t</strong> pattern should be treated as an unordered set, the coord<strong>in</strong>ates are obviously<br />

stored <strong>in</strong> a particular order, <strong>and</strong> can be addressed us<strong>in</strong>g that order.<br />

> data(longleaf)<br />

> x y diameter cb<strong>in</strong>d(x, y, diameter)[1:5, ]<br />

x y diameter<br />

[1,] 200.0 8.8 32.9<br />

[2,] 199.3 10.0 53.5<br />

[3,] 193.6 22.4 68.0<br />

[4,] 167.7 35.6 17.7<br />

[5,] 183.9 45.4 36.9<br />

If the marks are a categorical variable, then P$marks is a factor.<br />

Copyright c○CSIRO 2008


8.2 Operations on ppp objects 47<br />

> data(chorley)<br />

> x y type data.frame(x, y, type)[55:60, ]<br />

x y type<br />

55 355.6 413.9 larynx<br />

56 355.5 413.9 larynx<br />

57 355.7 413.9 larynx<br />

58 355.6 414.1 larynx<br />

59 359.0 417.3 lung<br />

60 353.1 426.9 lung<br />

> is.factor(type)<br />

[1] TRUE<br />

> levels(type)<br />

[1] "larynx" "lung"<br />

> table(type)<br />

type<br />

larynx lung<br />

58 978<br />

8.1.2 A <strong>po<strong>in</strong>t</strong> pattern needs a w<strong>in</strong>dow<br />

Note especially that, when you create a new <strong>po<strong>in</strong>t</strong> pattern object, you need to specify the <strong>spatial</strong><br />

region or w<strong>in</strong>dow <strong>in</strong> which the pattern was observed. In spatstat, the observation w<strong>in</strong>dow is an<br />

<strong>in</strong>tegral part <strong>of</strong> the <strong>po<strong>in</strong>t</strong> pattern. A <strong>po<strong>in</strong>t</strong> pattern dataset consists <strong>of</strong> knowledge about where<br />

<strong>po<strong>in</strong>t</strong>s were not observed, as well as the locations where they were observed. Even someth<strong>in</strong>g as<br />

simple as estimat<strong>in</strong>g the <strong>in</strong>tensity <strong>of</strong> the pattern depends on the w<strong>in</strong>dow <strong>of</strong> observation. It would<br />

be wrong, or at least different, to analyze a <strong>po<strong>in</strong>t</strong> pattern dataset by “guess<strong>in</strong>g” the appropriate<br />

w<strong>in</strong>dow (e.g. by comput<strong>in</strong>g the convex hull <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s). An analogy may be drawn with the<br />

difference between sequential experiments <strong>and</strong> experiments <strong>in</strong> which the sample size is fixed a<br />

priori.<br />

Often, the w<strong>in</strong>dow <strong>of</strong> observation is a rectangle, so this requirement just means that we have<br />

to specify the x <strong>and</strong> y dimensions <strong>of</strong> the rectangle when we create the <strong>po<strong>in</strong>t</strong> pattern. W<strong>in</strong>dows<br />

with a more complicated shape can easily be represented <strong>in</strong> spatstat, as described below.<br />

For situations where the w<strong>in</strong>dow is really unknown, spatstat provides the function ripras<br />

to compute the Ripley-Rasson estimator <strong>of</strong> the w<strong>in</strong>dow, given only the <strong>po<strong>in</strong>t</strong> locations.<br />

8.2 Operations on ppp objects<br />

Directly manipulat<strong>in</strong>g the entries <strong>in</strong>side an object is not safe. It is also unnecessary, because<br />

these manipulations can be performed us<strong>in</strong>g functions or operators.<br />

For <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> (objects <strong>of</strong> class "ppp") there are the follow<strong>in</strong>g operations.<br />

Copyright c○CSIRO 2008


48 Manipulat<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

8.2.1 Extract<strong>in</strong>g subsets<br />

Recall that <strong>in</strong> R the subset operator is [ ]. If x is a vector <strong>of</strong> numbers, then x[s] extracts an<br />

element or subset <strong>of</strong> x. The subset <strong>in</strong>dex s can be<br />

• a positive <strong>in</strong>teger: x[3] means the third element <strong>of</strong> x;<br />

• a vector <strong>of</strong> positive <strong>in</strong>tegers <strong>in</strong>dicat<strong>in</strong>g which elements to extract: x[c(2,4,6)] extracts<br />

the 2nd, 4th <strong>and</strong> 6th elements <strong>of</strong> x;<br />

• a vector <strong>of</strong> negative <strong>in</strong>tegers <strong>in</strong>dicat<strong>in</strong>g which elements not to extract: x[-1] means all<br />

elements <strong>of</strong> x except the first one;<br />

• a vector <strong>of</strong> logical values, <strong>of</strong> the same length as x, with each TRUE entry <strong>of</strong> s <strong>in</strong>dicat<strong>in</strong>g<br />

that the correspond<strong>in</strong>g entry <strong>of</strong> x should be extracted, <strong>and</strong> FALSE <strong>in</strong>dicat<strong>in</strong>g that it should<br />

not be extracted. For example x[x > 3.1] extracts those elements <strong>of</strong> x which are greater<br />

than 3.1.<br />

To extract a subset <strong>of</strong> a <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> spatstat, we also use the subset operator [ ]. If<br />

X is a <strong>po<strong>in</strong>t</strong> pattern then X[s] is also a <strong>po<strong>in</strong>t</strong> pattern, consist<strong>in</strong>g <strong>of</strong> those <strong>po<strong>in</strong>t</strong>s <strong>of</strong> X selected by<br />

the subset <strong>in</strong>dex s, where s can be any <strong>of</strong> the three types listed above, (Recall that the <strong>po<strong>in</strong>t</strong>s<br />

<strong>in</strong> a <strong>po<strong>in</strong>t</strong> pattern object are stored <strong>in</strong> a particular order; this is the order <strong>in</strong> which they are<br />

<strong>in</strong>dexed by s.)<br />

> data(bei)<br />

> bei<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 3604 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 1000] x [0, 500] metres<br />

> bei[1:10]<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 10 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 1000] x [0, 500] metres<br />

It is also possible to extract the subset def<strong>in</strong>ed by a <strong>spatial</strong> region. If X is a <strong>po<strong>in</strong>t</strong> pattern<br />

<strong>and</strong> W is a <strong>spatial</strong> w<strong>in</strong>dow (object <strong>of</strong> class "ow<strong>in</strong>") then X[W] is the <strong>po<strong>in</strong>t</strong> pattern consist<strong>in</strong>g <strong>of</strong><br />

all <strong>po<strong>in</strong>t</strong>s <strong>of</strong> X that lie <strong>in</strong>side W.<br />

> W W<br />

w<strong>in</strong>dow: rectangle = [100, 800] x [100, 400] units<br />

> bei[W]<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 918 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [100, 800] x [100, 400] units<br />

Tip: You may need to put quotes around the subset operator <strong>in</strong> some contexts.<br />

The generic subset operator is [ but the help file is summoned by typ<strong>in</strong>g help("[").<br />

The subset method for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> is called [.ppp but the help file is summoned<br />

by typ<strong>in</strong>g help("[.ppp").<br />

The comm<strong>and</strong> split.ppp allows you to divide a <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong>to sub-<strong>patterns</strong>, <strong>and</strong> the<br />

comm<strong>and</strong> by.ppp allows you to perform an operation on each sub-pattern.<br />

Copyright c○CSIRO 2008


8.2 Operations on ppp objects 49<br />

8.2.2 Fiddl<strong>in</strong>g with marks<br />

To extract the marks from a <strong>po<strong>in</strong>t</strong> pattern, use marks:<br />

> m radii X X<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 62 <strong>po<strong>in</strong>t</strong>s<br />

marks are numeric, <strong>of</strong> type ’double’<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [-1, 0] units<br />

> unmark(X)<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 62 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [-1, 0] units<br />

For a <strong>po<strong>in</strong>t</strong> pattern with real-valued marks, the method cut.ppp for the generic function<br />

cut will divide the range <strong>of</strong> mark values <strong>in</strong>to several discrete b<strong>and</strong>s, yield<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> pattern<br />

with categorical marks:<br />

> Y Y Y<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 62 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = (0,1] (1,10] (10,Inf]<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [-1, 0] units<br />

8.2.3 Chang<strong>in</strong>g scales <strong>and</strong> units<br />

A scalar dilation can be applied us<strong>in</strong>g aff<strong>in</strong>e. For example, the Swedish P<strong>in</strong>es data were<br />

recorded <strong>in</strong> decimetres. To convert the coord<strong>in</strong>ates to metres, we could type<br />

> data(swedishp<strong>in</strong>es)<br />

> X unitname(X) X<br />

Copyright c○CSIRO 2008


50 Manipulat<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

planar <strong>po<strong>in</strong>t</strong> pattern: 71 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 9.6] x [0, 10] metres<br />

The comm<strong>and</strong> rescale performs the same function:<br />

> data(swedishp<strong>in</strong>es)<br />

> X X<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 71 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 9.6] x [0, 10] metres<br />

Beware that this does not change the marks <strong>in</strong> the <strong>po<strong>in</strong>t</strong> pattern. If your marks represent<br />

tree diameter <strong>and</strong> you want to rescale them as well, this must be done by h<strong>and</strong>.<br />

8.2.4 Geometrical transformations<br />

The comm<strong>and</strong>s rotate, shift <strong>and</strong> aff<strong>in</strong>e apply two-dimensional rotation, vector shifts, <strong>and</strong><br />

aff<strong>in</strong>e transformations, respectively.<br />

8.2.5 R<strong>and</strong>om perturbations <strong>of</strong> a <strong>po<strong>in</strong>t</strong> pattern<br />

It is sometimes useful to r<strong>and</strong>omise the data, for example for hypothesis test<strong>in</strong>g. The comm<strong>and</strong><br />

rshift will apply the same r<strong>and</strong>om shift to each <strong>po<strong>in</strong>t</strong>, while rjitter will apply a different<br />

r<strong>and</strong>om shift to each <strong>po<strong>in</strong>t</strong>. The comm<strong>and</strong> quadratresample performs a block resampl<strong>in</strong>g<br />

procedure <strong>in</strong> which the w<strong>in</strong>dow is divided <strong>in</strong>to rectangles <strong>and</strong> these rectangles are r<strong>and</strong>omly<br />

resampled.<br />

8.3 Example<br />

We will use one <strong>of</strong> the st<strong>and</strong>ard <strong>po<strong>in</strong>t</strong> pattern datasets that is <strong>in</strong>stalled with the package. The<br />

NZ trees dataset represent the positions <strong>of</strong> 86 trees <strong>in</strong> a forest plot 153 by 95 feet.<br />

> data(nztrees)<br />

> nztrees<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 86 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 153] x [0, 95] feet<br />

> plot(nztrees)<br />

Copyright c○CSIRO 2008<br />

nztrees


0 2 4 6 8 10<br />

8.3 Example 51<br />

To get an impression <strong>of</strong> local <strong>spatial</strong> variations <strong>in</strong> <strong>in</strong>tensity, we plot a kernel density estimate<br />

<strong>of</strong> <strong>in</strong>tensity.<br />

> contour(density(nztrees, 10), axes = FALSE)<br />

0.01<br />

0.002<br />

0.004<br />

0.008<br />

0.01<br />

density(nztrees, 10)<br />

0.006<br />

0.004<br />

0.004<br />

0.006<br />

0.008 0.008<br />

0.01<br />

0.004<br />

0.012<br />

0.01<br />

0.018<br />

0.014<br />

0.008<br />

The density surface has a steep slope at the top right-h<strong>and</strong> corner <strong>of</strong> the study region.<br />

Look<strong>in</strong>g at the plot <strong>of</strong> the <strong>po<strong>in</strong>t</strong> pattern itself, we can see a cluster <strong>of</strong> trees at the top right.<br />

You may also notice a l<strong>in</strong>e <strong>of</strong> trees at the right-h<strong>and</strong> edge <strong>of</strong> the study region. It looks<br />

as though the study region may have <strong>in</strong>cluded some trees that were planted as a boundary or<br />

avenue. This sticks out like a sore thumb if we plot the x coord<strong>in</strong>ates <strong>of</strong> the trees:<br />

> hist(nztrees$x, nclass = 25)<br />

Histogram <strong>of</strong> nztrees$x<br />

We might want to exclude the right-h<strong>and</strong> boundary from the study region, to focus on the<br />

pattern <strong>of</strong> the rema<strong>in</strong><strong>in</strong>g trees. Let’s say we decide to trim a 5-foot marg<strong>in</strong> <strong>of</strong>f the right-h<strong>and</strong><br />

side.<br />

First we create the new, trimmed study region:<br />

> chopped w<strong>in</strong> chopped chopped<br />

w<strong>in</strong>dow: rectangle = [0, 148] x [0, 95] feet<br />

0.008<br />

Copyright c○CSIRO 2008


52 Manipulat<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

(Notice that chopped is not a <strong>po<strong>in</strong>t</strong> pattern, but simply a rectangle <strong>in</strong> the plane.)<br />

Then, us<strong>in</strong>g the subset operator [.ppp, we simply extract the subset <strong>of</strong> the orig<strong>in</strong>al <strong>po<strong>in</strong>t</strong><br />

pattern that lies <strong>in</strong>side the new w<strong>in</strong>dow:<br />

> nzchop summary(nzchop)<br />

Planar <strong>po<strong>in</strong>t</strong> pattern: 78 <strong>po<strong>in</strong>t</strong>s<br />

Average <strong>in</strong>tensity 0.00555 <strong>po<strong>in</strong>t</strong>s per square foot<br />

W<strong>in</strong>dow: rectangle = [0, 148] x [0, 95] feet<br />

W<strong>in</strong>dow area = 14060 square feet<br />

Unit <strong>of</strong> length: 1 foot<br />

> plot(density(nzchop, 10))<br />

> plot(nzchop, add = TRUE)<br />

density(nzchop, 10)<br />

Remov<strong>in</strong>g the right marg<strong>in</strong> seems to have produced a much more uniform pattern.<br />

8.4 Splitt<strong>in</strong>g <strong>and</strong> comb<strong>in</strong><strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Sometimes it is useful to split a <strong>po<strong>in</strong>t</strong> pattern dataset <strong>in</strong>to several sub-<strong>patterns</strong>, <strong>and</strong> perform<br />

some calculations on each sub-pattern.<br />

8.4.1 Splitt<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong>to sub-<strong>patterns</strong><br />

The powerful R comm<strong>and</strong> split has a method for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. This enables the user to<br />

divide a <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong>to sub-<strong>patterns</strong> us<strong>in</strong>g any suitable criterion.<br />

• If X is a marked <strong>po<strong>in</strong>t</strong> pattern, <strong>and</strong> the marks are a factor, then split(X) separates the<br />

data <strong>po<strong>in</strong>t</strong>s <strong>in</strong>to different <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> accord<strong>in</strong>g to their mark value.<br />

• If Z is a pixel image with factor values, then split(X,Z) separates the data <strong>po<strong>in</strong>t</strong>s <strong>in</strong>to<br />

different <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> accord<strong>in</strong>g to the pixel value <strong>of</strong> Z at each <strong>po<strong>in</strong>t</strong>.<br />

• If Z is a tessellation, then split(X,Z) separates the <strong>po<strong>in</strong>t</strong> pattern X <strong>in</strong>to sub-<strong>patterns</strong><br />

del<strong>in</strong>eated by the tiles <strong>of</strong> Z.<br />

Copyright c○CSIRO 2008<br />

0.005 0.01 0.015 0.02


8.4 Splitt<strong>in</strong>g <strong>and</strong> comb<strong>in</strong><strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> 53<br />

In each case the result is a list <strong>of</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. You can then use the R comm<strong>and</strong> lapply<br />

to perform any desired operation on each element <strong>of</strong> the list. For example, to apply adaptive<br />

estimation <strong>of</strong> <strong>in</strong>tensity to each species <strong>of</strong> tree <strong>in</strong> the Lans<strong>in</strong>g Woods data,<br />

> data(lans<strong>in</strong>g)<br />

> V A plot(as.list<strong>of</strong>(A))<br />

A neater way to operate on sub-<strong>patterns</strong> is to use by.ppp, a method for the R function<br />

by. The call by(X, INDICES=Z, FUN=f) is essentially equivalent to lapply(split(X,Z), f).<br />

It splits the dataset X <strong>in</strong>to sub-<strong>patterns</strong> accord<strong>in</strong>g to Z, then applies the function f to each<br />

sub-pattern. So to apply adaptive estimation <strong>of</strong> <strong>in</strong>tensity to each species <strong>of</strong> tree <strong>in</strong> the Lans<strong>in</strong>g<br />

Woods data,<br />

> data(lans<strong>in</strong>g)<br />

> A plot(A)<br />

8.4.2 Comb<strong>in</strong><strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Any number <strong>of</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> can be comb<strong>in</strong>ed to make a s<strong>in</strong>gle pattern, us<strong>in</strong>g superimpose.<br />

> X Y superimpose(X, Y)<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 30 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [0, 1] units<br />

The argument W, if given, specifies the w<strong>in</strong>dow for the comb<strong>in</strong>ed <strong>po<strong>in</strong>t</strong> pattern.<br />

> superimpose(X, Y, W = square(2))<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 30 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 2] x [0, 2] units<br />

To attach a separate mark to each component pattern, use argument names:<br />

> superimpose(Hooray = X, Boo = Y)<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 30 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = Hooray Boo<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [0, 1] units<br />

Copyright c○CSIRO 2008


54 Pixel images <strong>in</strong> spatstat<br />

9 Pixel images <strong>in</strong> spatstat<br />

An object <strong>of</strong> class "im" represents a pixel image. It specifies a rectangular grid <strong>of</strong> locations<br />

(“pixels”) <strong>in</strong> two dimensional space, <strong>and</strong> a numerical value for each pixel. The pixel values<br />

can be real numbers, <strong>in</strong>tegers, complex numbers, s<strong>in</strong>gle characters or str<strong>in</strong>gs, logical values or<br />

categorical values. A pixel’s value can also be NA, mean<strong>in</strong>g that it is not def<strong>in</strong>ed at that location.<br />

A pixel image represents a <strong>spatial</strong> function Z(u) <strong>in</strong> many different contexts. It may conta<strong>in</strong><br />

experimental data (such as a map <strong>of</strong> terra<strong>in</strong> elevation) or computed values (such as a kernel<br />

estimate <strong>of</strong> <strong>po<strong>in</strong>t</strong> process <strong>in</strong>tensity) or it may be directly obta<strong>in</strong>ed from a camera (such as a<br />

satellite image).<br />

9.1 Creat<strong>in</strong>g a pixel image<br />

9.1.1 Creat<strong>in</strong>g an image from raw data<br />

To create a pixel image from raw data, use im:<br />

> im(mat, xcol, yrow)<br />

where mat is a matrix conta<strong>in</strong><strong>in</strong>g the pixel values. The pixel values could have been generated<br />

by h<strong>and</strong>, or read from a file.<br />

The correspondence between matrix <strong>in</strong>dices mat[i,j] <strong>and</strong> cartesian coord<strong>in</strong>ates is slightly<br />

idiosyncratic: the rows <strong>of</strong>mcorrespond to the y coord<strong>in</strong>ate, <strong>and</strong> the columns to the x coord<strong>in</strong>ate.<br />

The argument xcol is a vector <strong>of</strong> equally-spaced x coord<strong>in</strong>ate values correspond<strong>in</strong>g to the<br />

columns <strong>of</strong> mat, <strong>and</strong> yrow is a vector <strong>of</strong> equally-spaced y coord<strong>in</strong>ate values correspond<strong>in</strong>g to<br />

the rows <strong>of</strong> mat. These vectors determ<strong>in</strong>e the <strong>spatial</strong> position <strong>of</strong> the pixel grid. The length <strong>of</strong><br />

xcol is ncol(mat) while the length <strong>of</strong> yrow is nrow(mat). If mat is not a matrix, it will be<br />

converted <strong>in</strong>to a matrix with nrow(mat) = length(yrow) <strong>and</strong> ncol(mat) = length(xcol).<br />

> vec mat noisy plot(noisy)<br />

noisy<br />

−6 −4 −2 0 2 4 6<br />

For some strange reason, R does not allow matrices with categorical (factor) values. To<br />

create a pixel image with categorical values, leave the pixel values as a vector. The im comm<strong>and</strong><br />

will reshape it:<br />

Copyright c○CSIRO 2008


9.1 Creat<strong>in</strong>g a pixel image 55<br />

> cutvec cutnoise plot(cutnoise)<br />

cutnoise<br />

(−7.09,−1.98] (−1.98,3.14] (3.14,8.25]<br />

Although mat was a matrix, cutvec is a vector, with factor values. F<strong>in</strong>ally cutnoise is a<br />

factor-valued image.<br />

9.1.2 Convert<strong>in</strong>g a function to an image<br />

The comm<strong>and</strong> as.im will convert other types <strong>of</strong> data to a pixel image.<br />

A function f(x,y) can be converted <strong>in</strong>to a pixel image. This makes it easy to create a pixel<br />

image <strong>in</strong> which the pixel values are def<strong>in</strong>ed by an algebraic formula <strong>in</strong> the x <strong>and</strong> y coord<strong>in</strong>ates.<br />

> f w Z


56 Pixel images <strong>in</strong> spatstat<br />

as.im converts other data to a pixel image<br />

density.ppp kernel smooth<strong>in</strong>g <strong>of</strong> <strong>po<strong>in</strong>t</strong> pattern<br />

density.psp kernel smooth<strong>in</strong>g <strong>of</strong> l<strong>in</strong>e segment pattern<br />

distmap.ow<strong>in</strong> distance function <strong>of</strong> w<strong>in</strong>dow<br />

distmap.ppp distance function <strong>of</strong> <strong>po<strong>in</strong>t</strong> pattern<br />

distmap.psp distance function <strong>of</strong> l<strong>in</strong>e segment pattern<br />

setcov geometric covariance function <strong>of</strong> a w<strong>in</strong>dow<br />

predict.ppm fitted <strong>in</strong>tensity <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process model<br />

[.im subset <strong>of</strong> an image (or look up pixel values)<br />

shift.im vector shift <strong>of</strong> image doma<strong>in</strong><br />

rescale.im rescal<strong>in</strong>g <strong>of</strong> image doma<strong>in</strong><br />

eval.im evaluate any expression <strong>in</strong>volv<strong>in</strong>g images<br />

cut.im convert numeric image to factor image<br />

split.im divide pixel image <strong>in</strong>to sub-images<br />

by.im apply function to subsets <strong>of</strong> pixel image<br />

<strong>in</strong>terp.im <strong>spatial</strong> <strong>in</strong>terpolation <strong>of</strong> image<br />

blur <strong>spatial</strong> blurr<strong>in</strong>g <strong>and</strong> extrapolation <strong>of</strong> image<br />

9.2 Inspect<strong>in</strong>g an image<br />

9.2.1 Plott<strong>in</strong>g an image<br />

Methods for plott<strong>in</strong>g an image object <strong>in</strong>clude:<br />

plot.im display as colour image<br />

contour.im contour plot<br />

persp.im perspective plot <strong>of</strong> surface<br />

These are methods for generic functions, so you would typeplot(Z),contour(Z) orpersp(Z)<br />

to display a pixel image Z.<br />

> opa data(redwood)<br />

> D plot(D)<br />

> persp(D)<br />

> contour(D)<br />

> par(opa)<br />

D<br />

0 20 40 60 80 100 120<br />

D<br />

y<br />

D<br />

x<br />

For plot.im, note that the default colour map for image plots <strong>in</strong> R has only 12 colours<br />

<strong>and</strong> can convey a mislead<strong>in</strong>g impression <strong>of</strong> the gradation <strong>of</strong> pixel values <strong>in</strong> the image. Use the<br />

argument col to control the colour map.<br />

Copyright c○CSIRO 2008<br />

−1.0 −0.8 −0.6 −0.4 −0.2 0.0<br />

50<br />

30<br />

10<br />

20<br />

40<br />

20<br />

40<br />

60<br />

90<br />

110<br />

120<br />

100<br />

60<br />

30<br />

70<br />

110<br />

100<br />

80<br />

50<br />

70<br />

D<br />

70<br />

60<br />

70<br />

100<br />

80<br />

110<br />

30<br />

120<br />

40<br />

90<br />

50<br />

40<br />

50<br />

60<br />

70<br />

80


9.2 Inspect<strong>in</strong>g an image 57<br />

> opa plot(Z)<br />

> plot(Z, col = grey(seq(1, 0, length = 512)))<br />

> par(opa)<br />

Z<br />

0 0.5 1 1.5<br />

For persp.im, see also the help for persp.default for the names <strong>of</strong> various arguments to<br />

control the appearance <strong>of</strong> the plot. For example, the view<strong>in</strong>g direction is controlled by the angles<br />

theta <strong>and</strong> phi.<br />

> persp(density(redwood), theta = 30)<br />

density(redwood)<br />

x<br />

density(redwood)<br />

y<br />

Similarly for contour.im, consult also the help file for contour.default to control the<br />

appearance <strong>of</strong> the contours.<br />

For some <strong>in</strong>spir<strong>in</strong>g examples <strong>of</strong> perspective <strong>and</strong> contour plots with beautiful colour schemes<br />

<strong>and</strong> shad<strong>in</strong>g, see the R graphics demonstration by typ<strong>in</strong>g demo(graphics).<br />

Z<br />

0 0.5 1 1.5<br />

Copyright c○CSIRO 2008


58 Pixel images <strong>in</strong> spatstat<br />

9.2.2 Exploratory analysis<br />

To <strong>in</strong>spect an image, the follow<strong>in</strong>g are useful.<br />

as.matrix extract matrix <strong>of</strong> pixel values from image<br />

cut.im convert numeric image to factor image<br />

hist.im histogram <strong>of</strong> pixel values<br />

For an image Z with any type <strong>of</strong> values, plot(cut(Z, 3)) will divide the pixel values <strong>in</strong>to 3<br />

b<strong>and</strong>s, <strong>and</strong> display the image with the 3 b<strong>and</strong>s rendered <strong>in</strong> 3 different colours.<br />

To compute numerical summaries <strong>of</strong> pixel values, like the median or order statistics <strong>of</strong> the<br />

pixel values, extract the pixel values us<strong>in</strong>g as.matrix(Z) then apply the summary operation.<br />

9.3 Manipulat<strong>in</strong>g images<br />

9.3.1 Subsets <strong>of</strong> an image<br />

The subset operator [ has a method for pixel images, [.im:<br />

> X[S]<br />

> X[S, drop = TRUE]<br />

The subset to be extracted is determ<strong>in</strong>ed by the <strong>in</strong>dex argument S.<br />

• If S is a <strong>po<strong>in</strong>t</strong> pattern, or a list(x,y), then the values <strong>of</strong> the pixel image X at these <strong>po<strong>in</strong>t</strong>s<br />

are extracted, <strong>and</strong> returned as a vector.<br />

• If S is a w<strong>in</strong>dow (an object <strong>of</strong> class "ow<strong>in</strong>"), the values <strong>of</strong> the image <strong>in</strong>side this w<strong>in</strong>dow<br />

are extracted. The result is a pixel image if possible, <strong>and</strong> a numeric vector otherwise (see<br />

help("[.im") for details).<br />

• If S is a pixel image with logical values, it is <strong>in</strong>terpreted as a w<strong>in</strong>dow (with TRUE <strong>in</strong>side<br />

the w<strong>in</strong>dow).<br />

The logical argument drop determ<strong>in</strong>es whether pixel values that are undef<strong>in</strong>ed are omitted<br />

(drop = TRUE) or returned as the value NA (drop=FALSE).<br />

See help("[.im") for full details.<br />

The subset operator can be used to look up the value <strong>of</strong> a pixel image at a s<strong>in</strong>gle <strong>po<strong>in</strong>t</strong>:<br />

> data(bei)<br />

> elev elev[list(x = 142, y = 356)]<br />

[1] 147.08<br />

or to display a subregion:<br />

> S plot(elev[S])<br />

Copyright c○CSIRO 2008


9.3 Manipulat<strong>in</strong>g images 59<br />

elev[S]<br />

140.5 141 141.5 142 142.5 143 143.5<br />

This can even be performed <strong>in</strong>teractively, us<strong>in</strong>g the R function locator to click on a <strong>po<strong>in</strong>t</strong><br />

<strong>in</strong> the w<strong>in</strong>dow:<br />

> elev[locator(1)]<br />

9.3.2 Computation with images<br />

The h<strong>and</strong>y function eval.im allows us to perform pixel-by-pixel calculations on an image or on<br />

several compatible images.<br />

If Z is a pixel image, to take the logarithm <strong>of</strong> each pixel value,<br />

> logZ C W V 3)<br />

> U 3, 42, Z))<br />

Other functions which manipulate images <strong>in</strong>clude the follow<strong>in</strong>g:<br />

shift.im vector shift <strong>of</strong> an image<br />

cut.im convert numeric image to factor image<br />

split.im divide pixel image <strong>in</strong>to sub-images<br />

by.im apply function to subsets <strong>of</strong> pixel image<br />

<strong>in</strong>terp.im <strong>spatial</strong>ly <strong>in</strong>terpolate an image<br />

levelset threshold an image (produces a w<strong>in</strong>dow)<br />

solutionset f<strong>in</strong>d the region where a statement is true (produces a w<strong>in</strong>dow)<br />

Copyright c○CSIRO 2008


60 Tessellations<br />

10 Tessellations<br />

A “tessellation” is a division <strong>of</strong> space <strong>in</strong>to non-overlapp<strong>in</strong>g regions (“tiles”).<br />

Tessellation<br />

Tessellations have several uses <strong>in</strong> spatstat. The tessellation may be ‘real’, for example,<br />

a cont<strong>in</strong>ent divided <strong>in</strong>to states or prov<strong>in</strong>ces. The tessellation may be completely artificial, for<br />

example, the rectangular quadrats which we use <strong>in</strong> quadrat count<strong>in</strong>g. Or the tessellation may<br />

be computed from other data, for example, the Dirichlet tessellation def<strong>in</strong>ed by a set <strong>of</strong> <strong>po<strong>in</strong>t</strong>s.<br />

10.1 Creat<strong>in</strong>g a tessellation<br />

An object <strong>of</strong> class "tess" represents a tessellation. Currently spatstat supports three k<strong>in</strong>ds <strong>of</strong><br />

tessellations:<br />

• rectangular tessellations <strong>in</strong> which the tiles are rectangles with sides parallel to the<br />

coord<strong>in</strong>ate axes;<br />

• tile lists, tessellations consist<strong>in</strong>g <strong>of</strong> a list <strong>of</strong> w<strong>in</strong>dows, usually polygonal w<strong>in</strong>dows;<br />

• pixellated tessellations, <strong>in</strong> which space is divided <strong>in</strong>to pixels <strong>and</strong> each tile occupies a<br />

subset <strong>of</strong> the pixel grid.<br />

Copyright c○CSIRO 2008<br />

rectangular list<br />

pixel<br />

a b c d e f g h i j k


10.2 Computed tessellations 61<br />

All three types <strong>of</strong> tessellation can be created by the comm<strong>and</strong> tess.<br />

To create a rectangular tessellation:<br />

> tess(xgrid = xg, ygrid = yg)<br />

where xg <strong>and</strong> yg are vectors <strong>of</strong> coord<strong>in</strong>ates <strong>of</strong> vertical <strong>and</strong> horizontal l<strong>in</strong>es determ<strong>in</strong><strong>in</strong>g a<br />

grid <strong>of</strong> rectangles. Alternatively, if you want to divide a rectangular w<strong>in</strong>dow W <strong>in</strong>to rectangles<br />

<strong>of</strong> equal size, you can type<br />

> quadrats(W, nx, ny)<br />

wherenx,ny are the numbers <strong>of</strong> rectangles <strong>in</strong> the x <strong>and</strong> y directions, respectively. A common<br />

use <strong>of</strong> this comm<strong>and</strong> is to create quadrats for a quadrat-count<strong>in</strong>g method.<br />

To create a tessellation from a list <strong>of</strong> w<strong>in</strong>dows,<br />

> tess(tiles = z)<br />

wherezis a list <strong>of</strong> objects <strong>of</strong> class "ow<strong>in</strong>". The w<strong>in</strong>dows should not be overlapp<strong>in</strong>g; currently<br />

spatstat does not check this. This comm<strong>and</strong> is commonly used when the study region is divided<br />

<strong>in</strong>to adm<strong>in</strong>istrative regions (states, départements, postcodes, counties) <strong>and</strong> the boundaries <strong>of</strong><br />

each sub-region are provided by GIS data files.<br />

To create a tessellation from a pixel image,<br />

> tess(image = Z)<br />

where Z is a pixel image with factor values. Each level <strong>of</strong> the factor represents a different tile<br />

<strong>of</strong> the tessellation. The pixels that have a particular value <strong>of</strong> the factor constitute a tile. This<br />

comm<strong>and</strong> is <strong>of</strong>ten used to separate the l<strong>and</strong>cover types <strong>in</strong> a l<strong>and</strong>cover image (a pixel image <strong>in</strong><br />

which each pixel is labelled by the type <strong>of</strong> vegetation or l<strong>and</strong> use at that location) <strong>in</strong>to different<br />

regions.<br />

The comm<strong>and</strong> as.tess can also be used to convert other types <strong>of</strong> data to a tessellation.<br />

10.2 Computed tessellations<br />

There are two comm<strong>and</strong>s which compute a tessellation from a <strong>po<strong>in</strong>t</strong> pattern.<br />

The comm<strong>and</strong> dirichlet(X) computes the Dirichlet tessellation or Voronoi tessellation <strong>of</strong><br />

the <strong>po<strong>in</strong>t</strong> pattern X. The tile associated with a given <strong>po<strong>in</strong>t</strong> <strong>of</strong> the pattern X is the region <strong>of</strong> space<br />

which is closer to that <strong>po<strong>in</strong>t</strong> than to any other <strong>po<strong>in</strong>t</strong> <strong>of</strong> X. The Dirichlet tiles are polygons. The<br />

comm<strong>and</strong> dirichlet(X) computes these polygons <strong>and</strong> <strong>in</strong>tersects them with the w<strong>in</strong>dow <strong>of</strong> X.<br />

> X plot(dirichlet(X))<br />

Copyright c○CSIRO 2008


62 Tessellations<br />

dirichlet(X)<br />

The comm<strong>and</strong> delaunay(X) computes the Delaunay triangulation <strong>of</strong> the <strong>po<strong>in</strong>t</strong> pattern X.<br />

Strictly speak<strong>in</strong>g this is not a tessellation but a network or graph, formed by jo<strong>in</strong><strong>in</strong>g some <strong>of</strong> the<br />

<strong>po<strong>in</strong>t</strong>s <strong>of</strong> X by straight l<strong>in</strong>es. Two <strong>po<strong>in</strong>t</strong>s <strong>of</strong> X are jo<strong>in</strong>ed if their Dirichlet tiles share a common<br />

edge. The result<strong>in</strong>g network forms a set <strong>of</strong> non-overlapp<strong>in</strong>g triangles. These triangles cover the<br />

convex hull <strong>of</strong> X rather than the entire w<strong>in</strong>dow <strong>of</strong> X.<br />

> plot(delaunay(X))<br />

delaunay(X)<br />

10.3 Operations <strong>in</strong>volv<strong>in</strong>g a tessellation<br />

There are methods for pr<strong>in</strong>t, plot <strong>and</strong> [ for tessellations.<br />

Use the comm<strong>and</strong> tiles to extract a list <strong>of</strong> the tiles <strong>in</strong> a tessellation. The result is a list<br />

<strong>of</strong> w<strong>in</strong>dows ("ow<strong>in</strong>" objects). This can be h<strong>and</strong>y if, for example, you want to compute some<br />

characteristic <strong>of</strong> the tiles <strong>in</strong> a tessellation, such as their areas or diameters:<br />

> X V U unlist(lapply(U, area.ow<strong>in</strong>))<br />

Copyright c○CSIRO 2008


10.3 Operations <strong>in</strong>volv<strong>in</strong>g a tessellation 63<br />

Tile 1 Tile 2 Tile 3 Tile 4 Tile 5 Tile 6 Tile 7<br />

0.08923453 0.18978755 0.06111215 0.11110646 0.07150611 0.05885538 0.05563879<br />

Tile 8 Tile 9 Tile 10<br />

0.19671046 0.08942098 0.07662757<br />

Tessellations can be used to classify the <strong>po<strong>in</strong>t</strong>s <strong>of</strong> a <strong>po<strong>in</strong>t</strong> pattern, <strong>in</strong> split.ppp, cut.ppp<br />

<strong>and</strong> by.ppp. If X is a <strong>po<strong>in</strong>t</strong> pattern <strong>and</strong> V is a tessellation, then<br />

• cut(X,V) attaches marks to the <strong>po<strong>in</strong>t</strong>s <strong>of</strong> X identify<strong>in</strong>g which tile <strong>of</strong> V each <strong>po<strong>in</strong>t</strong> falls <strong>in</strong>to;<br />

• split(X,V) divides the <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong>to sub-<strong>patterns</strong> accord<strong>in</strong>g to the tiles <strong>of</strong> V, <strong>and</strong><br />

returns a list <strong>of</strong> the sub-<strong>patterns</strong>;<br />

• by(X,V,FUN) divides the <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong>to sub-<strong>patterns</strong> accord<strong>in</strong>g to the tiles <strong>of</strong> V, applies<br />

the function FUN to each sub-pattern, <strong>and</strong> returns the results as a list.<br />

> par(mfrow = c(1, 3))<br />

> X plot(X)<br />

> Z plot(Z)<br />

> plot(cut(X, Z))<br />

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16<br />

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16<br />

> par(mfrow = c(1, 1))<br />

> plot(split(X, Z))<br />

X Z cut(X, Z)<br />

Copyright c○CSIRO 2008


64 Tessellations<br />

1 2<br />

5<br />

9<br />

13<br />

6<br />

10<br />

14<br />

split(X, Z)<br />

3<br />

7 8<br />

11<br />

15<br />

If we plot two tessellations on the same <strong>spatial</strong> doma<strong>in</strong>, what we see is another tessellation.<br />

The “<strong>in</strong>tersection” (or “overlay” or “common ref<strong>in</strong>ement”) <strong>of</strong> two tessellations X <strong>and</strong> Y is the<br />

tessellation whose tiles are the <strong>in</strong>tersections between tiles <strong>of</strong> X <strong>and</strong> tiles <strong>of</strong> Y. The comm<strong>and</strong><br />

<strong>in</strong>tersect.tess computes the <strong>in</strong>tersection <strong>of</strong> two tessellations.<br />

> opa plot(X)<br />

> plot(Y)<br />

> plot(<strong>in</strong>tersect.tess(X, Y))<br />

> par(opa)<br />

Copyright c○CSIRO 2008<br />

X Y <strong>in</strong>tersect.tess(X, Y)<br />

4<br />

12<br />

16


10.3 Operations <strong>in</strong>volv<strong>in</strong>g a tessellation 65<br />

PART III. INTENSITY AND RANDOMNESS<br />

F<strong>in</strong>ally we can start work<strong>in</strong>g on statistical methods for analys<strong>in</strong>g <strong>po<strong>in</strong>t</strong> pattern data. Part III<br />

<strong>of</strong> the workshop discusses how to <strong>in</strong>vestigate the <strong>in</strong>tensity <strong>of</strong> a <strong>po<strong>in</strong>t</strong> pattern, <strong>and</strong> how to assess<br />

whether a pattern is completely r<strong>and</strong>om.<br />

Copyright c○CSIRO 2008


66 Methods 1: Investigat<strong>in</strong>g <strong>in</strong>tensity<br />

11 Methods 1: Investigat<strong>in</strong>g <strong>in</strong>tensity<br />

When we analyse numerical data, we <strong>of</strong>ten beg<strong>in</strong> by tak<strong>in</strong>g the sample mean. The analogue <strong>of</strong><br />

the mean or expected value <strong>of</strong> a r<strong>and</strong>om variable is the <strong>in</strong>tensity <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process.<br />

‘Intensity’ is the average density <strong>of</strong> <strong>po<strong>in</strong>t</strong>s (expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s per unit area). Intensity<br />

may be constant (‘uniform’ or ‘homogeneous’) or may vary from location to location<br />

(‘<strong>in</strong>homogeneous’). Investigation <strong>of</strong> the <strong>in</strong>tensity should be one <strong>of</strong> the first steps <strong>in</strong> analys<strong>in</strong>g a<br />

<strong>po<strong>in</strong>t</strong> pattern.<br />

11.1 Uniform <strong>in</strong>tensity<br />

11.1.1 Theory<br />

If the <strong>po<strong>in</strong>t</strong> process X is homogeneous, then for any sub-region B <strong>of</strong> two-dimensional space, the<br />

expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> B is proportional to the area <strong>of</strong> B:<br />

E[N(X ∩ B)] = λarea(B)<br />

<strong>and</strong> the constant <strong>of</strong> proportionality λ is the <strong>in</strong>tensity. Intensity units are numbers per unit area<br />

(length −2 ). If we know that a <strong>po<strong>in</strong>t</strong> process is homogeneous, then the empirical density <strong>of</strong> <strong>po<strong>in</strong>t</strong>s,<br />

λ = n(x)<br />

area(W)<br />

is an unbiased estimator <strong>of</strong> the true <strong>in</strong>tensity λ.<br />

11.1.2 Implementation <strong>in</strong> spatstat<br />

To compute the estimator λ <strong>in</strong> spatstat, use summary.ppp:<br />

> data(bei)<br />

> summary(bei)<br />

Planar <strong>po<strong>in</strong>t</strong> pattern: 3604 <strong>po<strong>in</strong>t</strong>s<br />

Average <strong>in</strong>tensity 0.00721 <strong>po<strong>in</strong>t</strong>s per square metre<br />

W<strong>in</strong>dow: rectangle = [0, 1000] x [0, 500] metres<br />

W<strong>in</strong>dow area = 5e+05 square metres<br />

Unit <strong>of</strong> length: 1 metre<br />

The estimated <strong>in</strong>tensity is λ = 0.00721 <strong>po<strong>in</strong>t</strong>s per square metre. To extract this <strong>in</strong>tensity<br />

value, type<br />

> lamb lamb<br />

[1] 0.007208<br />

Copyright c○CSIRO 2008


11.2 Inhomogeneous <strong>in</strong>tensity 67<br />

11.2 Inhomogeneous <strong>in</strong>tensity<br />

11.2.1 Theory<br />

In general the <strong>in</strong>tensity <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process will vary from place to place. Assume that the<br />

expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> a small region <strong>of</strong> area du around a location u is equal to<br />

λ(u)du. Then λ(u) is the “<strong>in</strong>tensity function” <strong>of</strong> the process, satisfy<strong>in</strong>g<br />

<br />

E[N(X ∩ B)] = λ(u)du<br />

for all regions B.<br />

More generally there could be s<strong>in</strong>gular concentrations <strong>of</strong> <strong>in</strong>tensity (e.g. earthquake epicentres<br />

may be concentrated along a fault l<strong>in</strong>e) so that an <strong>in</strong>tensity function does not exist. Then we<br />

speak <strong>of</strong> the “<strong>in</strong>tensity measure” Λ def<strong>in</strong>ed by<br />

B<br />

Λ(B) = E[N(X ∩ B)]<br />

for each B ⊂ R 2 , assum<strong>in</strong>g the expectation is f<strong>in</strong>ite.<br />

If it is suspected that the <strong>in</strong>tensity may be <strong>in</strong>homogeneous, the <strong>in</strong>tensity function or <strong>in</strong>tensity<br />

measure can be estimated nonparametrically by techniques such as quadrat count<strong>in</strong>g <strong>and</strong> kernel<br />

smooth<strong>in</strong>g.<br />

In quadrat count<strong>in</strong>g, the w<strong>in</strong>dow W is divided <strong>in</strong>to subregions (‘quadrats’) B1,... ,Bm <strong>of</strong><br />

equal area. We count the numbers <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> each quadrat, nj = n(x ∩ Bj) for j =<br />

1,... ,m. These are unbiased estimators <strong>of</strong> the correspond<strong>in</strong>g <strong>in</strong>tensity measure values Λ(Bj).<br />

The usual kernel estimator <strong>of</strong> the <strong>in</strong>tensity function is<br />

λ(u) = e(u)<br />

n<br />

κ(u − xi), (1)<br />

i=1<br />

where κ(u) is the kernel (an arbitrary probability density) <strong>and</strong><br />

e(u) −1 <br />

= κ(u − v)dv (2)<br />

is an edge effect bias correction. Clearly λ(u) is an unbiased estimator <strong>of</strong><br />

λ ∗ <br />

(u) = e(u) κ(u − v)λ(v)dv,<br />

W<br />

W<br />

a smoothed version <strong>of</strong> the true <strong>in</strong>tensity function λ(u). The choice <strong>of</strong> smooth<strong>in</strong>g kernel κ <strong>in</strong>volves<br />

a trade<strong>of</strong>f between bias <strong>and</strong> variance.<br />

Intensity can also be estimated us<strong>in</strong>g parametric methods, as we expla<strong>in</strong> <strong>in</strong> Section 13.<br />

11.2.2 Implementation <strong>in</strong> spatstat<br />

Quadrat count<strong>in</strong>g is performed <strong>in</strong> spatstat by the function quadratcount.<br />

> quadratcount(bei, nx = 4, ny = 2)<br />

x<br />

y [0,250] (250,500] (500,750] (750,1e+03]<br />

(250,500] 666 677 130 481<br />

[0,250] 544 165 643 298<br />

Copyright c○CSIRO 2008


68 Methods 1: Investigat<strong>in</strong>g <strong>in</strong>tensity<br />

> Q plot(bei, cex = 0.5, pch = "+")<br />

> plot(Q, add = TRUE, cex = 2)<br />

bei<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+ + +<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + ++<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+ +++ +<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

++ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+++<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+ + +<br />

++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++++<br />

++ + ++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+ +<br />

+ +<br />

+ +<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

++ + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + ++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ + ++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

++ +<br />

+ +<br />

+<br />

+ +<br />

++<br />

+ +<br />

+++<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++++<br />

+<br />

+ + ++<br />

+ +<br />

+<br />

+<br />

+<br />

++ + +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

+ + + +<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ + ++<br />

+<br />

++++++ ++++<br />

+ + + + +<br />

+ +<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

++++<br />

+<br />

++ +<br />

+<br />

++ ++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+++ ++<br />

++ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ + + +<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+ +<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+++<br />

+<br />

+ +<br />

+ + +<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++++<br />

+<br />

+ + +<br />

++ + ++ +<br />

+<br />

++ +<br />

+ + +<br />

+<br />

++<br />

+<br />

+ +<br />

+ +<br />

++<br />

+<br />

++<br />

+<br />

+ ++ +<br />

+ +<br />

+<br />

+ + + +<br />

++ +++ ++<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++ + +<br />

++<br />

+<br />

+ +<br />

+ ++ + + ++<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+++<br />

+<br />

++<br />

+<br />

+ + + + ++ +<br />

+ +<br />

+<br />

+<br />

+ + + +<br />

++<br />

++<br />

++ +<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ ++ +<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

++<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + ++<br />

+<br />

+ +<br />

+<br />

+++<br />

+ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+ + +<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

++<br />

+<br />

++<br />

+ +<br />

+++<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

++<br />

+<br />

+<br />

+ ++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

++ ++ + + +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+ +<br />

++<br />

+<br />

+ +++<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

++ +<br />

++ ++<br />

+<br />

+<br />

+ ++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+++<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+ +<br />

+ +++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+ + +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ ++<br />

+<br />

++<br />

+ +<br />

+ +<br />

++<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+ + +<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ ++ +<br />

+ + +<br />

++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ ++<br />

+++<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+ +<br />

+ +<br />

++<br />

+ ++<br />

+<br />

+<br />

+ +<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

++ +<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

++ +<br />

+ ++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ + + ++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +++<br />

+ + +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+++<br />

+ +<br />

++<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+ +<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

++<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + + ++ +<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++<br />

+ ++<br />

+ +<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++ + +<br />

+<br />

+ +++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

+ ++ +<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ + + +<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+ + +<br />

+ + +<br />

+<br />

+++<br />

++ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

++ + ++ +<br />

+<br />

+<br />

+<br />

+<br />

++ ++<br />

+<br />

+<br />

+ +<br />

++ ++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ + + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

++ +<br />

+ +<br />

++<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+++ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +++<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ + +<br />

+ +<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ + ++<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+ +<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

337 608 162 73 105 268<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + ++ +<br />

422 49 17 52 128 146<br />

231 134 92 406 310 64<br />

The value returned byquadratcount is an object belong<strong>in</strong>g to the special class"quadratcount".<br />

We have used the plot method for this class to get the display above.<br />

Kernel density (or <strong>in</strong>tensity) estimation us<strong>in</strong>g an isotropic Gaussian kernel is implemented<br />

<strong>in</strong> spatstat by the function density.ppp, a method for the generic comm<strong>and</strong> density.<br />

> den plot(den)<br />

> plot(bei, add = TRUE, cex = 0.5)<br />

den<br />

The value returned by density.ppp is a pixel image (object <strong>of</strong> class "im"). This class has<br />

methods for pr<strong>in</strong>t, summary, plot, contour (contour plots), persp (perspective plots) <strong>and</strong> so<br />

on.<br />

> persp(den)<br />

Copyright c○CSIRO 2008<br />

0.005 0.01 0.015 0.02


0 100 200 300 400 500<br />

11.2 Inhomogeneous <strong>in</strong>tensity 69<br />

den<br />

y<br />

> contour(den)<br />

0.012<br />

0.014<br />

0.012<br />

0.016<br />

0.02<br />

0.014<br />

0.01<br />

0.018<br />

0.006<br />

0.008<br />

0.002<br />

den<br />

x<br />

den<br />

0.004<br />

0.012<br />

0.002<br />

0.006<br />

0.01<br />

0.014<br />

0.016<br />

0.018<br />

0.008<br />

Alternatively, there is an adaptive estimator <strong>of</strong> <strong>in</strong>tensity which uses a fraction f <strong>of</strong> the<br />

data to construct a Dirichlet tessellation, then forms an <strong>in</strong>tensity estimate that is constant <strong>in</strong><br />

each tile <strong>of</strong> the tessellation:<br />

> aden plot(aden, ma<strong>in</strong> = "Adaptive <strong>in</strong>tensity")<br />

> plot(bei, add = TRUE, cex = 0.5)<br />

0.008<br />

0.006<br />

0.002<br />

0.004<br />

Copyright c○CSIRO 2008


70 Methods 1: Investigat<strong>in</strong>g <strong>in</strong>tensity<br />

Adaptive <strong>in</strong>tensity<br />

The value returned by adaptive.density is also a pixel image (object <strong>of</strong> class "im").<br />

11.3 Quadrats determ<strong>in</strong>ed by a covariate<br />

In quadrat count<strong>in</strong>g methods, any choice <strong>of</strong> quadrats is permissible. From a theoretical view<strong>po<strong>in</strong>t</strong>,<br />

the quadrats do not have to be rectangles <strong>of</strong> equal area, <strong>and</strong> could be regions <strong>of</strong> any<br />

shape.<br />

Quadrat count<strong>in</strong>g is more useful if we choose the quadrats <strong>in</strong> a mean<strong>in</strong>gful way. One way to<br />

do this is to def<strong>in</strong>e the quadrats us<strong>in</strong>g covariate <strong>in</strong>formation.<br />

For example, the tropical ra<strong>in</strong>forest <strong>po<strong>in</strong>t</strong> pattern dataset bei comes with an extra set <strong>of</strong><br />

covariate data bei.extra, which conta<strong>in</strong>s a pixel image <strong>of</strong> terra<strong>in</strong> elevation bei.extra$elev<br />

<strong>and</strong> a pixel image <strong>of</strong> terra<strong>in</strong> slope bei.extra$grad. It might be useful to split the study region<br />

<strong>in</strong>to several sub-regions accord<strong>in</strong>g to the terra<strong>in</strong> slope.<br />

> data(bei)<br />

> Z b Zcut V plot(V)<br />

> plot(bei, add = TRUE, pch = "+")<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+ +++<br />

+ +<br />

++<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++ +<br />

+ +<br />

+<br />

+ + + +<br />

++<br />

+<br />

++<br />

+ + +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ + +<br />

+ +<br />

++ + +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ + +<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ + + +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++ +<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+ + ++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+ +<br />

+ +<br />

+<br />

++<br />

+<br />

++<br />

+<br />

++<br />

+++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +++<br />

+ + +<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+++<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++<br />

++<br />

++ + + ++<br />

+<br />

+<br />

+<br />

++<br />

+ ++ +<br />

+<br />

+ ++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

++++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

++ +++<br />

++++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+++ + ++<br />

+<br />

++<br />

++<br />

+<br />

+ +<br />

+++<br />

+<br />

+ ++<br />

++<br />

+<br />

+++<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+ + +<br />

+<br />

++<br />

+<br />

+<br />

+ + ++ ++<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

++<br />

+<br />

+ ++ +++<br />

+++ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +++ + +<br />

+ +<br />

+ +<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++ +<br />

+<br />

++ +<br />

++ +<br />

+ +<br />

+<br />

+<br />

+++<br />

+++ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+ +<br />

++<br />

+ + +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ ++<br />

++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++ +<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

++<br />

+ +<br />

+++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ ++ +<br />

+ +<br />

+++<br />

+<br />

+<br />

+++<br />

+<br />

+ +<br />

++<br />

++<br />

+<br />

+<br />

++<br />

+ ++<br />

+<br />

+ ++ + + ++<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+ ++ +<br />

+ ++<br />

++ + +<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ ++<br />

++++<br />

+<br />

+ + ++<br />

+<br />

+++ +<br />

+<br />

+ + +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+++<br />

++<br />

+<br />

++ +<br />

+ + ++<br />

++<br />

++<br />

++<br />

+ +<br />

+ +<br />

++ +<br />

+ ++ +<br />

+<br />

+<br />

+ +++<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+++<br />

+ + + + +<br />

+<br />

++ + ++++<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ ++<br />

+ + +<br />

+<br />

+ ++ ++<br />

+ +<br />

+ + +<br />

+ + +++++ +<br />

+<br />

+ +<br />

+<br />

++ +<br />

+<br />

+<br />

+ +<br />

+ +++<br />

+<br />

++<br />

+ +++<br />

+ ++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

++<br />

+++<br />

+ ++<br />

++<br />

+<br />

+<br />

+++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

++<br />

++<br />

+ +<br />

++<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ ++<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+ ++++<br />

++<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+ +<br />

++<br />

+ +++<br />

++<br />

+<br />

++++<br />

+ +<br />

++<br />

+<br />

++<br />

+<br />

++<br />

+++ +<br />

+<br />

++<br />

+ +<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+ ++<br />

+ +++<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+++<br />

+<br />

+ +<br />

+ ++<br />

++<br />

++ +<br />

+<br />

+ +<br />

+++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ + + ++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ + +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++<br />

+ ++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+ + +++<br />

+++<br />

+ +<br />

+++ ++<br />

+ +<br />

++<br />

+<br />

+ ++<br />

+ ++ +<br />

+<br />

++<br />

++<br />

+ +<br />

+<br />

+ ++<br />

+<br />

+++<br />

++<br />

+++<br />

+<br />

+<br />

+ + ++<br />

+<br />

+ ++ +<br />

+ + ++<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+ +<br />

+<br />

+++<br />

+<br />

++++<br />

+ +<br />

+<br />

+ + ++<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

++<br />

+ ++ +<br />

++<br />

+<br />

+++ + +<br />

+<br />

+++<br />

+<br />

+++++ +<br />

+ + ++<br />

+<br />

++<br />

+ + ++ + +<br />

+<br />

++<br />

++<br />

+<br />

+<br />

++<br />

+ +<br />

+ +<br />

+ + + + +<br />

+ + ++<br />

++ +<br />

++<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+ ++<br />

+++<br />

+<br />

+<br />

+ +++<br />

+<br />

+<br />

+ +<br />

++ +<br />

+ ++<br />

+ +<br />

++<br />

++ ++<br />

+<br />

+ + +<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+<br />

+++<br />

+<br />

++ +<br />

+ ++<br />

+<br />

+<br />

+ ++ +++<br />

+<br />

+<br />

++<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++ + ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ +++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++ +++ ++<br />

+ ++<br />

+ + +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+ + + +<br />

++<br />

++ + + +<br />

+ +<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+++<br />

+<br />

+ +<br />

+<br />

+ +<br />

++ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ + +++<br />

+ + +<br />

+ +++ +<br />

+ +<br />

+++<br />

+<br />

+ ++ ++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

++<br />

+ +<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

++<br />

++ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+ ++ ++<br />

+<br />

+<br />

+++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+ +++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ ++ +<br />

+<br />

+ +<br />

+<br />

+ ++<br />

+<br />

++ +<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+++<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

++<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+ + ++<br />

+<br />

++ ++<br />

+<br />

+ + +<br />

++ +<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+ + + ++<br />

+ +<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ + ++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ + + + + + +<br />

+ +<br />

++<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

V<br />

The call to quantile gave us the quartiles <strong>of</strong> the slope values, so the four tiles <strong>in</strong> the<br />

tessellation V have equal area (ignor<strong>in</strong>g discretisation effects). In other words, we have divided<br />

the study region <strong>in</strong>to four zones <strong>of</strong> equal area accord<strong>in</strong>g to the terra<strong>in</strong> slope.<br />

Copyright c○CSIRO 2008<br />

0.01 0.03 0.05<br />

1 2 3 4


11.3 Quadrats determ<strong>in</strong>ed by a covariate 71<br />

We can now use this tessellation to study the <strong>po<strong>in</strong>t</strong> pattern bei. We could <strong>in</strong>voke the<br />

comm<strong>and</strong>s split, cut or by to divide the <strong>po<strong>in</strong>t</strong>s accord<strong>in</strong>g to this tessellation <strong>and</strong> manipulate<br />

the sub-<strong>patterns</strong>.<br />

The comm<strong>and</strong> quadratcount also works with tessellations:<br />

> qb qb<br />

tile<br />

1 2 3 4<br />

271 984 1028 1321<br />

> plot(qb)<br />

1028<br />

984<br />

qb<br />

271<br />

The text annotations show the number <strong>of</strong> trees <strong>in</strong> each region. S<strong>in</strong>ce the four regions have<br />

equal area, the counts should be approximately equal if there is a uniform density <strong>of</strong> trees.<br />

Obviously they are not equal; there appears to be a strong preference for steeper slopes.<br />

1321<br />

1 2 3 4<br />

Copyright c○CSIRO 2008


72 Methods 2: Tests <strong>of</strong> Complete Spatial R<strong>and</strong>omness<br />

12 Methods 2: Tests <strong>of</strong> Complete Spatial R<strong>and</strong>omness<br />

The basic ‘reference’ or ‘benchmark’ model <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process is the uniform Poisson <strong>po<strong>in</strong>t</strong><br />

process <strong>in</strong> the plane with <strong>in</strong>tensity λ, sometimes called Complete Spatial R<strong>and</strong>omness (CSR).<br />

Its basic properties are<br />

• the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> any region A has a Poisson distribution with mean λarea(A)<br />

• given that there are n <strong>po<strong>in</strong>t</strong>s <strong>in</strong>side region A, the locations <strong>of</strong> these <strong>po<strong>in</strong>t</strong>s are i.i.d. <strong>and</strong><br />

uniformly distributed <strong>in</strong>side A<br />

• the contents <strong>of</strong> two disjo<strong>in</strong>t regions A <strong>and</strong> B are <strong>in</strong>dependent.<br />

The uniform Poisson process is <strong>of</strong>ten the ‘null model’ <strong>in</strong> an analysis. For historical reasons,<br />

many applied writers focus on establish<strong>in</strong>g that their data do not conform to a uniform Poisson<br />

process.<br />

12.1 Def<strong>in</strong>ition<br />

The homogeneous Poisson process <strong>of</strong> <strong>in</strong>tensity λ > 0 has the properties<br />

(PP1): the number N(X ∩ B) <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> any region B is a Poisson r<strong>and</strong>om variable;<br />

(PP2): the expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> B is E[N(X ∩ B)] = λ · area(B);<br />

(PP3): if B1,B2 are disjo<strong>in</strong>t sets then N(X∩B1) <strong>and</strong> N(X∩B2) are <strong>in</strong>dependent r<strong>and</strong>om variables;<br />

(PP4): given that N(X ∩ B) = n, the n <strong>po<strong>in</strong>t</strong>s are <strong>in</strong>dependent <strong>and</strong> uniformly distributed <strong>in</strong> B.<br />

The list is redundant; (PP2) <strong>and</strong> (PP3) are sufficient.<br />

This process is <strong>of</strong>ten called “Complete Spatial R<strong>and</strong>omness” (CSR) especially <strong>in</strong> biological<br />

science. Under CSR, <strong>po<strong>in</strong>t</strong>s are <strong>in</strong>dependent <strong>of</strong> each other <strong>and</strong> have the same propensity to be<br />

found at any location.<br />

It is easy to simulate the Poisson process directly by follow<strong>in</strong>g the properties (PP1)–(PP4).<br />

In spatstat, use the comm<strong>and</strong> rpoispp (by convention, r<strong>and</strong>om data generators have names<br />

beg<strong>in</strong>n<strong>in</strong>g with r).<br />

> plot(rpoispp(100))<br />

rpoispp(100)<br />

Copyright c○CSIRO 2008


12.1 Def<strong>in</strong>ition 73<br />

Conceptually, if we discretise a homogeneous Poisson process <strong>in</strong>to <strong>in</strong>f<strong>in</strong>itesimal pixels, the<br />

<strong>in</strong>dicators I are <strong>in</strong>dependent <strong>and</strong> identically distributed, with success probability P {I = 1} =<br />

λdA where dA is the <strong>in</strong>f<strong>in</strong>itesimal area <strong>of</strong> a pixel.<br />

To develop some <strong>in</strong>tuition about completely r<strong>and</strong>om <strong>patterns</strong>, it’s useful to repeat the comm<strong>and</strong><br />

plot(rpoispp(100)) several times (use the up-arrow key to recall the previous comm<strong>and</strong><br />

l<strong>in</strong>e) so that you see several replicates <strong>of</strong> the Poisson process. In particular you will notice that<br />

the <strong>po<strong>in</strong>t</strong>s <strong>in</strong> a homogeneous Poisson process are not ‘uniformly spread’: there are empty gaps<br />

<strong>and</strong> clusters <strong>of</strong> <strong>po<strong>in</strong>t</strong>s.<br />

The comm<strong>and</strong> rpoispp has arguments lambda (the <strong>in</strong>tensity) <strong>and</strong> w<strong>in</strong> (the w<strong>in</strong>dow <strong>in</strong> which<br />

to simulate). The default w<strong>in</strong>dow is the unit square.<br />

> data(letterR)<br />

> plot(rpoispp(100, w<strong>in</strong> = letterR))<br />

rpoispp(100, w<strong>in</strong> = letterR)<br />

If you want to simulate a Poisson process conditionally on a fixed number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s, use the<br />

comm<strong>and</strong> runif<strong>po<strong>in</strong>t</strong>.<br />

> runif<strong>po<strong>in</strong>t</strong>(100)<br />

planar <strong>po<strong>in</strong>t</strong> pattern: 100 <strong>po<strong>in</strong>t</strong>s<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [0, 1] units<br />

Copyright c○CSIRO 2008


74 Methods 2: Tests <strong>of</strong> Complete Spatial R<strong>and</strong>omness<br />

12.2 Quadrat count<strong>in</strong>g tests for CSR<br />

In classical literature, the homogeneous Poisson process (CSR) is usually taken as the appropriate<br />

‘null’ model for a <strong>po<strong>in</strong>t</strong> pattern. Our basic task <strong>in</strong> analys<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> pattern is to f<strong>in</strong>d evidence<br />

aga<strong>in</strong>st CSR.<br />

A classical test for the null hypothesis <strong>of</strong> CSR is the χ 2 test based on quadrat counts. As<br />

expla<strong>in</strong>ed earlier, the w<strong>in</strong>dow W is divided <strong>in</strong>to subregions (‘quadrats’) B1,... ,Bm <strong>of</strong> equal area.<br />

We count the numbers <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> each quadrat, nj = n(x ∩Bj) for j = 1,... ,m. Under<br />

the null hypothesis <strong>of</strong> CSR, the nj are i.i.d. Poisson r<strong>and</strong>om variables with the same expected<br />

value. The Pearson χ 2 goodness-<strong>of</strong>-fit test can be used.<br />

> quadrat.test(nzchop, nx = 3, ny = 2)<br />

Chi-squared test <strong>of</strong> CSR us<strong>in</strong>g quadrat counts<br />

data: nzchop<br />

X-squared = 5.0769, df = 5, p-value = 0.4066<br />

The value returned by quadrat.test is an object <strong>of</strong> class "htest" (the st<strong>and</strong>ard R class<br />

for hypothesis tests). Pr<strong>in</strong>t<strong>in</strong>g the object (as shown above) gives comprehensible output about<br />

the outcome <strong>of</strong> the test. Inspect<strong>in</strong>g the p-value, we see that the test does not reject the null<br />

hypothesis <strong>of</strong> CSR for the (chopped) New Zeal<strong>and</strong> trees data.<br />

The return value quadrat.test also belongs to the special class "quadrat.test". Plott<strong>in</strong>g<br />

the object will display the quadrats, annotated by their observed <strong>and</strong> expected counts <strong>and</strong> the<br />

Pearson residuals (observed counts nj at top left; expected count at top right; Pearson residuals<br />

at bottom).<br />

> M M<br />

Chi-squared test <strong>of</strong> CSR us<strong>in</strong>g quadrat counts<br />

data: nzchop<br />

X-squared = 5.0769, df = 5, p-value = 0.4066<br />

> plot(nzchop)<br />

> plot(M, add = TRUE, cex = 2)<br />

nzchop<br />

9 13 14 13 17 13<br />

−1.1 0.28 1.1<br />

17 13 9 13 12 13<br />

1.1 −1.1 −0.28<br />

The p-value can also be extracted by<br />

Copyright c○CSIRO 2008


12.3 Critique 75<br />

> M$p.value<br />

[1] 0.4065648<br />

12.3 Critique<br />

S<strong>in</strong>ce this k<strong>in</strong>d <strong>of</strong> technique is <strong>of</strong>ten used <strong>in</strong> the applied literature, a few comments are appropriate.<br />

The ma<strong>in</strong> critique <strong>of</strong> the quadrat test approach is the lack <strong>of</strong> <strong>in</strong>formation. This is a goodness<strong>of</strong>-fit<br />

test <strong>in</strong> which the alternative hypothesis H1 is simply the negation <strong>of</strong> H0, that is, the<br />

alternative is that “the process is not a homogeneous Poisson process”. A <strong>po<strong>in</strong>t</strong> process may<br />

fail to satisfy properties (PP1)–(PP4) either because it violates (PP2) by hav<strong>in</strong>g non-uniform<br />

<strong>in</strong>tensity, or because it violates (PP3)–(PP4) by exhibit<strong>in</strong>g dependence between <strong>po<strong>in</strong>t</strong>s. There<br />

are too many types <strong>of</strong> departure from H0.<br />

The usual justification for the classical χ 2 goodness-<strong>of</strong>-fit test is to assume that the counts<br />

are <strong>in</strong>dependent, <strong>and</strong> derive a test <strong>of</strong> the null hypothesis that all counts have the same expected<br />

value. Invok<strong>in</strong>g it here is slightly naive, s<strong>in</strong>ce the <strong>in</strong>dependence <strong>of</strong> counts is also open to question<br />

here.<br />

Indeed we can also turn th<strong>in</strong>gs around <strong>and</strong> view the χ 2 test as a test <strong>of</strong> the Poisson distributional<br />

properties (PP2)–(PP3) assum<strong>in</strong>g that the <strong>in</strong>tensity is uniform. The Pearson χ 2 test<br />

statistic<br />

X 2 <br />

j<br />

=<br />

(nj − n/m) 2<br />

n/m<br />

(where n = <br />

j nj is the total number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s) co<strong>in</strong>cides, up to a constant factor, with the<br />

sample variance-to-mean ratio <strong>of</strong> the counts nj, which is <strong>of</strong>ten <strong>in</strong>terpreted as a measure <strong>of</strong><br />

over/under-dispersion <strong>of</strong> the counts nj assum<strong>in</strong>g they have constant mean.<br />

The power <strong>of</strong> the quadrat test depends on the size <strong>of</strong> quadrats, <strong>and</strong> falls to zero for quadrats<br />

which are either very large or very small. The power also depends on the alternative hypothesis,<br />

<strong>in</strong> particular on the ‘<strong>spatial</strong> scale’ <strong>of</strong> any departures from the assumptions <strong>of</strong> constant <strong>in</strong>tensity<br />

<strong>and</strong> <strong>in</strong>dependence <strong>of</strong> <strong>po<strong>in</strong>t</strong>s. The choice <strong>of</strong> quadrat size carries an implicit assumption about the<br />

<strong>spatial</strong> scale.<br />

12.4 Kolmogorov-Smirnov test <strong>of</strong> CSR<br />

Typically a more powerful test <strong>of</strong> CSR is the Kolmogorov-Smirnov test <strong>in</strong> which we compare<br />

the observed <strong>and</strong> expected distributions <strong>of</strong> the values <strong>of</strong> some function T.<br />

We specify a real-valued function T(x,y) def<strong>in</strong>ed at all locations (x,y) <strong>in</strong> the w<strong>in</strong>dow. We<br />

evaluate this function at each <strong>of</strong> the data <strong>po<strong>in</strong>t</strong>s. Then we compare this empirical distribution<br />

<strong>of</strong> values <strong>of</strong> T with the predicted distribution <strong>of</strong> values <strong>of</strong> T under CSR, us<strong>in</strong>g the classical<br />

Kolmogorov-Smirnov test.<br />

In spatstat the <strong>spatial</strong> Kolmogorov-Smirnov test is performed by kstest. This function is<br />

generic. The method for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, kstest.ppp, performs the Kolmogorov-Smirnov test<br />

for CSR.<br />

If X is the data <strong>po<strong>in</strong>t</strong> pattern, then<br />

> kstest(X, fun)<br />

performs the test, where fun is a function(x,y) <strong>in</strong> the R language.<br />

For example, let’s consider thenzchop data <strong>and</strong> choose the function T to be the x coord<strong>in</strong>ate,<br />

T(x,y) = x. This means we are simply compar<strong>in</strong>g the observed <strong>and</strong> expected distributions <strong>of</strong><br />

the x coord<strong>in</strong>ate.<br />

Copyright c○CSIRO 2008


76 Methods 2: Tests <strong>of</strong> Complete Spatial R<strong>and</strong>omness<br />

> kstest(nzchop, function(x, y) {<br />

+ x<br />

+ })<br />

Spatial Kolmogorov-Smirnov test <strong>of</strong> CSR<br />

data: covariate ’function(x, y) { x}’ evaluated at <strong>po<strong>in</strong>t</strong>s <strong>of</strong> ’nzchop’<br />

<strong>and</strong> transformed to uniform distribution under CSR<br />

D = 0.0717, p-value = 0.7913<br />

alternative hypothesis: two-sided<br />

The result <strong>of</strong> kstest is an object <strong>of</strong> class "htest" (the st<strong>and</strong>ard R class for hypothesis<br />

tests) <strong>and</strong> also <strong>of</strong> class "kstest" so that it can be pr<strong>in</strong>ted <strong>and</strong> plotted. The pr<strong>in</strong>t method<br />

(demonstrated above) reports <strong>in</strong>formation about the hypothesis test such as the p-value. The<br />

plot method displays the observed <strong>and</strong> expected distribution functions.<br />

> KS plot(KS)<br />

> pval


12.5 Us<strong>in</strong>g covariate data 77<br />

that the density <strong>of</strong> trees depends on terra<strong>in</strong> slope. To test this formally we can divide the region<br />

<strong>in</strong>to irregular quadrats accord<strong>in</strong>g to the terra<strong>in</strong> slope, <strong>and</strong> apply the χ 2 test. The comm<strong>and</strong><br />

quadrat.test accepts a tessellation <strong>and</strong> uses the tiles <strong>of</strong> the tessellation as the quadrats:<br />

> data(bei)<br />

> Z b Zcut V quadrat.test(bei, tess = V)<br />

Chi-squared test <strong>of</strong> CSR us<strong>in</strong>g quadrat counts<br />

data: bei<br />

X-squared = 661.8402, df = 3, p-value < 2.2e-16<br />

Because <strong>of</strong> the large counts <strong>in</strong> these regions, we can probably ignore concerns about <strong>in</strong>dependence,<br />

<strong>and</strong> conclude that the trees are not uniform <strong>in</strong> their <strong>in</strong>tensity.<br />

A more powerful test (if that were needed!) is the Kolmogorov-Smirnov test us<strong>in</strong>g the slope<br />

covariate:<br />

> KS plot(KS)<br />

> KS<br />

Spatial Kolmogorov-Smirnov test <strong>of</strong> CSR<br />

data: covariate ’Z’ evaluated at <strong>po<strong>in</strong>t</strong>s <strong>of</strong> ’bei’<br />

<strong>and</strong> transformed to uniform distribution under CSR<br />

D = 0.1871, p-value < 2.2e-16<br />

alternative hypothesis: two-sided<br />

probability<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

Spatial Kolmogorov−Smirnov test <strong>of</strong> CSR<br />

based on distribution <strong>of</strong> covariate ’Z’<br />

p−value= 0<br />

0.00 0.05 0.10 0.15 0.20 0.25 0.30<br />

Z<br />

Copyright c○CSIRO 2008


78 Methods 2: Tests <strong>of</strong> Complete Spatial R<strong>and</strong>omness<br />

The Kolmogorov-Smirnov test would typically be preferred if the covariateZhas cont<strong>in</strong>uouslyvary<strong>in</strong>g<br />

numerical values. If the covariate is a factor or discrete variable, then the Kolmogorov-<br />

Smirnov test is <strong>in</strong>effective because <strong>of</strong> tied values, <strong>and</strong> the χ 2 test based on quadrat counts would<br />

be preferable.<br />

Copyright c○CSIRO 2008


13 Methods 3: Maximum likelihood for Poisson processes<br />

If we are will<strong>in</strong>g to assume (tentatively) that the <strong>po<strong>in</strong>t</strong>s are <strong>in</strong>dependent, then we can apply<br />

some decent statistical methods to the <strong>in</strong>vestigation <strong>of</strong> the <strong>in</strong>tensity.<br />

13.1 Inhomogeneous Poisson process<br />

The <strong>in</strong>homogeneous Poisson process with <strong>in</strong>tensity function λ(u), u ∈ R 2 , is a modification <strong>of</strong><br />

the homogeneous Poisson process, <strong>in</strong> which properties (PP2) <strong>and</strong> (PP4) above are replaced by<br />

(PP2’): the number N(X ∩ B) <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> a region B has expectation<br />

<br />

E[N(X ∩ B)] = λ(u)du.<br />

(PP4’): given that N(X ∩ B) = n, the n <strong>po<strong>in</strong>t</strong>s are <strong>in</strong>dependent <strong>and</strong> identically distributed, with<br />

common probability density f(u) = λ(u)/I, where I = <br />

B λ(u)du.<br />

This process can also be simulated us<strong>in</strong>g rpoispp us<strong>in</strong>g the same properties. The <strong>in</strong>tensity<br />

argumentlambda can be a constant, a function(x,y) giv<strong>in</strong>g the values <strong>of</strong> the <strong>in</strong>tensity function<br />

at coord<strong>in</strong>ates x,y, or a pixel image conta<strong>in</strong><strong>in</strong>g the <strong>in</strong>tensity values at a grid <strong>of</strong> locations.<br />

> lambda plot(rpoispp(lambda))<br />

rpoispp(lambda)<br />

If we discretise an <strong>in</strong>homogeneous Poisson process, the <strong>in</strong>dicators I are <strong>in</strong>dependent, but<br />

have unequal success probabilities, P {I(u) = 1} = λ(u)dA.<br />

The <strong>in</strong>homogeneous Poisson process is a plausible model for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> under several<br />

scenarios. One is r<strong>and</strong>om th<strong>in</strong>n<strong>in</strong>g: suppose that a homogeneous Poisson process <strong>of</strong> <strong>in</strong>tensity<br />

β is generated, <strong>and</strong> that each <strong>po<strong>in</strong>t</strong> is either deleted or reta<strong>in</strong>ed, <strong>in</strong>dependently <strong>of</strong> other <strong>po<strong>in</strong>t</strong>s.<br />

Suppose the probability <strong>of</strong> reta<strong>in</strong><strong>in</strong>g a <strong>po<strong>in</strong>t</strong> at the location u is p(u). Then the result<strong>in</strong>g process<br />

<strong>of</strong> reta<strong>in</strong>ed <strong>po<strong>in</strong>t</strong>s is <strong>in</strong>homogeneous Poisson, with <strong>in</strong>tensity λ(u) = βp(u).<br />

B<br />

79<br />

Copyright c○CSIRO 2008


80 Methods 3: Maximum likelihood for Poisson processes<br />

Consider, for example, a model <strong>of</strong> plant propagation which assumes that seeds are r<strong>and</strong>omly<br />

dispersed accord<strong>in</strong>g to a Poisson process, <strong>and</strong> seeds r<strong>and</strong>omly germ<strong>in</strong>ate or do not germ<strong>in</strong>ate,<br />

<strong>in</strong>dependently <strong>of</strong> each other, with a germ<strong>in</strong>ation probability that depends on the local soil<br />

conditions. The result<strong>in</strong>g pattern <strong>of</strong> plants is an <strong>in</strong>homogeneous Poisson process.<br />

13.2 Likelihood methods<br />

The log-likelihood for the homogeneous Poisson process with <strong>in</strong>tensity λ is<br />

log L(λ;x) = n(x)log λ − λarea(W) (3)<br />

where n(x) is the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> the dataset x. The maximum likelihood estimator <strong>of</strong> λ is<br />

λ = n(x)<br />

area(W)<br />

which is also an unbiased estimator. The variance <strong>of</strong> λ is var[ λ] = λ/area(W).<br />

Consider an <strong>in</strong>homogeneous Poisson process with <strong>in</strong>tensity function λθ(u) depend<strong>in</strong>g on a<br />

parameter θ. The log-likelihood for θ is<br />

log L(θ;x) =<br />

n<br />

i=1<br />

<br />

log λθ(xi) − λθ(u)du (4)<br />

W<br />

This is a well-behaved likelihood; for example if log λθ(u) is l<strong>in</strong>ear <strong>in</strong> θ, then the log-likelihood<br />

is concave, so there is a unique MLE. However, the MLE θ is not analytically tractable, so it<br />

must be computed us<strong>in</strong>g numerical algorithms such as Newton’s method.<br />

The usual asymptotic theory <strong>of</strong> maximum likelihood applies: under suitable large sample<br />

conditions, the MLE <strong>of</strong> θ is asymptotically normal. If we wish to test CSR, the likelihood ratio<br />

test statistic<br />

R = 2log L( θ)<br />

L( λ)<br />

is asymptotically χ 2 under CSR, <strong>and</strong> this gives an asymptotically optimal test <strong>of</strong> CSR aga<strong>in</strong>st<br />

the alternative <strong>of</strong> an <strong>in</strong>homogeneous Poisson process with <strong>in</strong>tensity λθ(u).<br />

13.3 Fitt<strong>in</strong>g Poisson processes <strong>in</strong> spatstat<br />

Mark Berman <strong>and</strong> Rolf Turner [14] (see also [34, 18, 35]) developed a clever computational device<br />

for f<strong>in</strong>d<strong>in</strong>g the MLE <strong>of</strong> θ by exploit<strong>in</strong>g a formal similarity between the Poisson log-likelihood (4)<br />

<strong>and</strong> that <strong>of</strong> a logl<strong>in</strong>ear Poisson regression.<br />

The Berman-Turner algorithm is implemented <strong>in</strong> spatstat. The <strong>in</strong>tensity function λθ(u)<br />

must be logl<strong>in</strong>ear <strong>in</strong> the parameter θ:<br />

log λθ(u) = θ · S(u) (5)<br />

where S(u) is a real-valued or vector-valued function <strong>of</strong> location u. The form <strong>of</strong> S is arbitrary so<br />

this is not much <strong>of</strong> a restriction. In practice S(u) could be a function <strong>of</strong> the <strong>spatial</strong> coord<strong>in</strong>ates<br />

<strong>of</strong> u, or an observed covariate, or a mixture <strong>of</strong> both. Assum<strong>in</strong>g (5), the log-likelihood (4) is a<br />

convex function <strong>of</strong> θ, so maximum likelihood is well-behaved.<br />

Copyright c○CSIRO 2008


13.3 Fitt<strong>in</strong>g Poisson processes <strong>in</strong> spatstat 81<br />

13.3.1 Model-fitt<strong>in</strong>g function<br />

The fitt<strong>in</strong>g function is called ppm (‘<strong>po<strong>in</strong>t</strong> process model’) <strong>and</strong> is very closely analogous to the<br />

model fitt<strong>in</strong>g functions <strong>in</strong> R such as lm <strong>and</strong> glm. The statistic S(u) is specified by an R language<br />

formula, like the formulas used to specify the systematic relationship <strong>in</strong> a l<strong>in</strong>ear model or<br />

generalised l<strong>in</strong>ear model. The basic syntax is:<br />

> ppm(X, ~trend)<br />

where X is the <strong>po<strong>in</strong>t</strong> pattern dataset, <strong>and</strong> ~trend is an R formula with no left-h<strong>and</strong> side. This<br />

should be viewed as a model with log l<strong>in</strong>k, so the formula ~trend specifies the form <strong>of</strong> the<br />

logarithm <strong>of</strong> the <strong>in</strong>tensity function.<br />

To fit the homogeneous Poisson model:<br />

> ppm(bei, ~1)<br />

Stationary Poisson process<br />

Uniform <strong>in</strong>tensity: 0.007208<br />

To fit an <strong>in</strong>homogeneous Poisson model with an <strong>in</strong>tensity that is log-l<strong>in</strong>ear <strong>in</strong> the cartesian<br />

coord<strong>in</strong>ates, i.e. λθ((x,y)) = exp(θ0 + θ1x + θ2y),<br />

> ppm(bei, ~x + y)<br />

Nonstationary Poisson process<br />

Trend formula: ~x + y<br />

Fitted coefficients for trend formula:<br />

(Intercept) x y<br />

-4.7245290274 -0.0008031288 0.0006496090<br />

Herex<strong>and</strong>yare reserved names that always refer to the cartesian coord<strong>in</strong>ates. In the output,<br />

the ‘fitted coefficients’ are the maximum likelihood estimates <strong>of</strong> θ0,θ1,θ2, the coefficients <strong>of</strong> the<br />

‘l<strong>in</strong>ear predictor’. The fitted <strong>in</strong>tensity function is<br />

λθ((x,y)) = exp (−4.724529 + −0.000803x + 0.00065y) .<br />

To fit an <strong>in</strong>homogeneous Poisson model with an <strong>in</strong>tensity that is log-quadratic <strong>in</strong> the cartesian<br />

coord<strong>in</strong>ates, i.e. such that log λθ((x,y)) is a quadratic <strong>in</strong> x <strong>and</strong> y:<br />

> ppm(bei, ~polynom(x, y, 2))<br />

Nonstationary Poisson process<br />

Trend formula: ~polynom(x, y, 2)<br />

Fitted coefficients for trend formula:<br />

(Intercept) polynom(x, y, 2)[x] polynom(x, y, 2)[y]<br />

-4.275762e+00 -1.609187e-03 -4.895166e-03<br />

polynom(x, y, 2)[x^2] polynom(x, y, 2)[x.y] polynom(x, y, 2)[y^2]<br />

1.625968e-06 -2.836387e-06 1.331331e-05<br />

Copyright c○CSIRO 2008


82 Methods 3: Maximum likelihood for Poisson processes<br />

Essentially any k<strong>in</strong>d <strong>of</strong> model formula can be used, <strong>in</strong>volv<strong>in</strong>g the reserved names x <strong>and</strong> y<br />

<strong>and</strong> any covariates (as we expla<strong>in</strong> later).<br />

To fit a model with constant but unequal <strong>in</strong>tensities on each side <strong>of</strong> the vertical l<strong>in</strong>e x = 500,<br />

the explanatory variable S(u) should be a factor with two levels, Left <strong>and</strong> Right say, tak<strong>in</strong>g<br />

the value Left when the location u is to the left <strong>of</strong> the l<strong>in</strong>e x = 500.<br />

> side ppm(bei, ~side(x))<br />

Nonstationary Poisson process<br />

Trend formula: ~side(x)<br />

Fitted coefficients for trend formula:<br />

(Intercept) side(x)right<br />

-4.8026460 -0.2792705<br />

When factors are <strong>in</strong>volved, the <strong>in</strong>terpretation <strong>of</strong> the coefficients depends on which ‘contrasts’<br />

are <strong>in</strong> force. By default the ‘treatment contrasts’ are assumed. This means that the treatment<br />

effect is taken to be zero for the first level <strong>of</strong> the factor, <strong>and</strong> the estimated treatment effects for<br />

other levels are effectively estimates <strong>of</strong> the difference from the first level. In this case "left"<br />

comes alphabetically before "right", so by default, the first level is "left". The fitted model<br />

is<br />

λθ((x,y)) =<br />

exp(−4.8026) if x < 500<br />

exp(−4.8026 + (−0.2793)) if x ≥ 500<br />

Rather than rely<strong>in</strong>g on such <strong>in</strong>terpretations, it is prudent to use the comm<strong>and</strong> predict to<br />

compute predicted values <strong>of</strong> the model, as expla<strong>in</strong>ed <strong>in</strong> Section 13.4 below.<br />

13.3.2 Models <strong>in</strong>volv<strong>in</strong>g <strong>spatial</strong> covariates<br />

It is also possible to fit an <strong>in</strong>homogeneous Poisson process model with an <strong>in</strong>tensity function that<br />

depends on an observed covariate. Let Z(u) be a covariate that has been measured at every<br />

location u <strong>in</strong> the study w<strong>in</strong>dow. Then Z(u), or any transformation <strong>of</strong> it, can serve as the statistic<br />

S(u) <strong>in</strong> the parametric form (5) for the <strong>in</strong>tensity function.<br />

The <strong>po<strong>in</strong>t</strong> pattern dataset bei is supplied with accompany<strong>in</strong>g covariate data bei.extra.<br />

The covariates are the elevation (altitude) <strong>and</strong> the slope <strong>of</strong> the terra<strong>in</strong> at each location <strong>in</strong> the<br />

w<strong>in</strong>dow, given as two pixel images bei.extra$elev <strong>and</strong> bei.extra$grad.<br />

> data(bei)<br />

> grad plot(grad)<br />

Copyright c○CSIRO 2008


13.3 Fitt<strong>in</strong>g Poisson processes <strong>in</strong> spatstat 83<br />

grad<br />

To fit the <strong>in</strong>homogeneous Poisson model with <strong>in</strong>tensity which is a logl<strong>in</strong>ear function <strong>of</strong> slope,<br />

i.e.<br />

λ(u) = exp(β0 + β1Z(u)) (6)<br />

where β0,β1 are parameters <strong>and</strong> Z(u) is the slope at location u, we type<br />

> ppm(bei, ~slope, covariates = list(slope = grad))<br />

Nonstationary Poisson process<br />

Trend formula: ~slope<br />

Fitted coefficients for trend formula:<br />

(Intercept) slope<br />

-5.390553 5.022021<br />

In the call to ppm, the argumentcovariates should be a list <strong>of</strong> name=value pairs. The names<br />

should match the variables appear<strong>in</strong>g <strong>in</strong> the model formula. The values should be pixel images.<br />

The pr<strong>in</strong>tout <strong>in</strong>cludes the fitted coefficients β0,β1 so the fitted model is<br />

λ(u) = exp(−5.390553 + 5.022021Z(u)). (7)<br />

It might be more appropriate to fit the <strong>in</strong>homogeneous Poisson model with <strong>in</strong>tensity that is<br />

proportional to slope,<br />

λ(u) = βZ(u) (8)<br />

where aga<strong>in</strong> Z(u) is the slope at u. Equivalently<br />

log λ(u) = log β + log Z(u). (9)<br />

There is no coefficient <strong>in</strong> front <strong>of</strong> the term log Z(u) <strong>in</strong> (9), so this term is an ‘<strong>of</strong>fset’. To fit this<br />

model,<br />

> ppm(bei, ~<strong>of</strong>fset(log(slope)), covariates = list(slope = grad))<br />

0 0.05 0.1 0.15 0.2 0.25 0.3<br />

Copyright c○CSIRO 2008


84 Methods 3: Maximum likelihood for Poisson processes<br />

Nonstationary Poisson process<br />

Trend formula: ~<strong>of</strong>fset(log(slope))<br />

Fitted coefficients for trend formula:<br />

(Intercept)<br />

-2.427127<br />

The fitted coefficient is the constant log β appear<strong>in</strong>g <strong>in</strong> (9), so convert<strong>in</strong>g back to the form<br />

(8), the fitted model is<br />

λ(u) = e −2.427127 Z(u) = 0.0883 Z(u).<br />

13.4 Fitted models<br />

The value returned by the model-fitt<strong>in</strong>g function ppm is an object <strong>of</strong> class "ppm" that represents<br />

the fitted model. This is analogous to the fitt<strong>in</strong>g <strong>of</strong> l<strong>in</strong>ear models (lm), generalised l<strong>in</strong>ear models<br />

(glm) <strong>and</strong> so on.<br />

13.4.1 St<strong>and</strong>ard operations<br />

The follow<strong>in</strong>g st<strong>and</strong>ard operations on fitted models <strong>in</strong> R can be applied to <strong>po<strong>in</strong>t</strong> process models<br />

(i.e. these operations have methods for the class "ppm"):<br />

pr<strong>in</strong>t pr<strong>in</strong>t basic <strong>in</strong>formation<br />

summary pr<strong>in</strong>t detailed summary <strong>in</strong>formation<br />

plot plot the fitted <strong>in</strong>tensity<br />

predict compute the fitted <strong>in</strong>tensity<br />

fitted compute the fitted <strong>in</strong>tensity at data <strong>po<strong>in</strong>t</strong>s<br />

update re-fit the model<br />

coef extract the fitted coefficient vector θ<br />

vcov variance-covariance matrix <strong>of</strong> θ<br />

anova analysis <strong>of</strong> deviance<br />

logLik log-likelihood value<br />

formula extract the model formula<br />

terms extract the terms <strong>in</strong> the model<br />

model.matrix compute the design matrix<br />

For <strong>in</strong>formation on these methods, see pr<strong>in</strong>t.ppm, summary.ppm, plot.ppm etc. The follow<strong>in</strong>g<br />

comm<strong>and</strong>s also work on "ppm" objects:<br />

step stepwise model selection<br />

drop1 one step model deletion<br />

AIC Akaike Information Criterion<br />

> fit fit<br />

Nonstationary Poisson process<br />

Trend formula: ~x + y<br />

Fitted coefficients for trend formula:<br />

(Intercept) x y<br />

-4.7245290274 -0.0008031288 0.0006496090<br />

Copyright c○CSIRO 2008


13.4 Fitted models 85<br />

> plot(fit, how = "image", se = FALSE)<br />

> predict(fit, type = "trend")<br />

Fitted trend<br />

real-valued pixel image<br />

50 x 50 pixel array (ny, nx)<br />

enclos<strong>in</strong>g rectangle: [0, 1000] x [0, 500] metres<br />

> predict(fit, type = "cif", ngrid = 256)<br />

real-valued pixel image<br />

256 x 256 pixel array (ny, nx)<br />

enclos<strong>in</strong>g rectangle: [0, 1000] x [0, 500] metres<br />

> coef(fit)<br />

(Intercept) x y<br />

-4.7245290274 -0.0008031288 0.0006496090<br />

> vcov(fit)<br />

(Intercept) x y<br />

(Intercept) 1.854091e-03 -1.491267e-06 -3.528289e-06<br />

x -1.491267e-06 3.437842e-09 1.208410e-14<br />

y -3.528289e-06 1.208410e-14 1.338955e-08<br />

> sqrt(diag(vcov(fit)))<br />

(Intercept) x y<br />

4.305915e-02 5.863311e-05 1.157132e-04<br />

> round(vcov(fit, what = "corr"), 2)<br />

(Intercept) x y<br />

(Intercept) 1.00 -0.59 -0.71<br />

x -0.59 1.00 0.00<br />

y -0.71 0.00 1.00<br />

0.004 0.006 0.008 0.01 0.012<br />

Copyright c○CSIRO 2008


86 Methods 3: Maximum likelihood for Poisson processes<br />

This is the fitted model with <strong>in</strong>tensity function<br />

with the follow<strong>in</strong>g estimates:<br />

i θi var( θi) st<strong>and</strong>ard deviation<br />

0 -4.724529 0.001854091 0.04305915<br />

1 -0.0008031288 3.437842e-09 5.863311e-05<br />

2 0.000649609 1.338955e-08 0.0001157132<br />

λθ((x,y)) = exp (θ0 + θ1x + θ2y) (10)<br />

It is also possible to compute the st<strong>and</strong>ard error <strong>of</strong> the fitted <strong>in</strong>tensity λθ(u) at each location<br />

u, as a pixel image. Use predict(fit, type="se") or plot(fit, se=TRUE).<br />

> SE plot(SE, ma<strong>in</strong> = "st<strong>and</strong>ard error <strong>of</strong> fitted <strong>in</strong>tensity")<br />

st<strong>and</strong>ard error <strong>of</strong> fitted <strong>in</strong>tensity<br />

If the model formula <strong>in</strong>volves transformations <strong>of</strong> the orig<strong>in</strong>al covariates, thenmodel.matrix(fit)<br />

gives the design matrix whose columns conta<strong>in</strong> these transformed covariates, <strong>and</strong>model.images(fit)<br />

gives a list <strong>of</strong> pixel images <strong>of</strong> these transformed covariates.<br />

> fit mo mo<br />

(Intercept) :<br />

real-valued pixel image<br />

100 x 100 pixel array (ny, nx)<br />

enclos<strong>in</strong>g rectangle: [0, 1000] x [0, 500] metres<br />

sqrt(slope) :<br />

real-valued pixel image<br />

100 x 100 pixel array (ny, nx)<br />

enclos<strong>in</strong>g rectangle: [0, 1000] x [0, 500] metres<br />

x :<br />

Copyright c○CSIRO 2008<br />

2e−04 3e−04 4e−04


13.4 Fitted models 87<br />

real-valued pixel image<br />

100 x 100 pixel array (ny, nx)<br />

enclos<strong>in</strong>g rectangle: [0, 1000] x [0, 500] metres<br />

> plot(mo[[2]])<br />

mo[[2]]<br />

It is also possible to plot the ‘effect’ <strong>of</strong> a s<strong>in</strong>gle covariate <strong>in</strong> the model. The comm<strong>and</strong><br />

effectfun computes the <strong>in</strong>tensity <strong>of</strong> the fitted model as a function <strong>of</strong> one <strong>of</strong> its covariates. This<br />

is chiefly useful if the model only has one covariate.<br />

> fit plot(effectfun(fit, "slope"))<br />

lambda<br />

0.005 0.010 0.015 0.020<br />

effectfun(fit, "slope")<br />

0.00 0.05 0.10 0.15 0.20 0.25 0.30<br />

slope<br />

0.1 0.2 0.3 0.4 0.5<br />

Copyright c○CSIRO 2008


88 Methods 3: Maximum likelihood for Poisson processes<br />

13.4.2 Model selection<br />

Analysis <strong>of</strong> deviance for nested Poisson <strong>po<strong>in</strong>t</strong> process models is implemented <strong>in</strong> spatstat as<br />

anova.ppm. The first model should be a sub-model <strong>of</strong> the second.<br />

> fit fitnull anova(fitnull, fit, test = "Chi")<br />

Analysis <strong>of</strong> Deviance Table<br />

Model 1: .mpl.Y ~ 1<br />

Model 2: .mpl.Y ~ slope<br />

Resid. Df Resid. Dev Df Deviance P(>|Chi|)<br />

1 20507 18728.4<br />

2 20506 18346.1 1 382.3 4.018e-85<br />

This effectively performs the likelihood ratio test <strong>of</strong> the null hypothesis <strong>of</strong> a homogeneous<br />

Poisson process (CSR) aga<strong>in</strong>st the alternative <strong>of</strong> an <strong>in</strong>homogeneous Poisson process with <strong>in</strong>tensity<br />

that is a logl<strong>in</strong>ear function <strong>of</strong> the slope covariate (6). The p-value is extremely small,<br />

<strong>in</strong>dicat<strong>in</strong>g rejection <strong>of</strong> CSR <strong>in</strong> favour <strong>of</strong> the alternative. (Please ignore the columns Resid.Df<br />

<strong>and</strong> Resid.Dev which are artefacts <strong>of</strong> the discretisation. Only the deviance difference <strong>and</strong> the<br />

difference <strong>in</strong> degrees <strong>of</strong> freedom are valid.)<br />

Note that st<strong>and</strong>ard Analysis <strong>of</strong> Deviance requires the null hypothesis to be a sub-model <strong>of</strong> the<br />

alternative. Unfortunately the model (8), <strong>in</strong> which <strong>in</strong>tensity is proportional to slope, does not<br />

<strong>in</strong>clude the homogeneous Poisson process as a special case, so we cannot use analysis <strong>of</strong> deviance<br />

to test the null hypothesis <strong>of</strong> homogeneous Poisson aga<strong>in</strong>st the alternative <strong>of</strong> an <strong>in</strong>homogeneous<br />

Poisson with <strong>in</strong>tensity (8).<br />

One possibility here is to use the Akaike Information Criterion AIC for model selection.<br />

> fitprop fitnull AIC(fitprop)<br />

[1] 42496.65<br />

> AIC(fitnull)<br />

[1] 42763.92<br />

The smaller AIC favours the model (8) with <strong>in</strong>tensity is proportional to slope.<br />

Automatic model selection can be performed us<strong>in</strong>g step. By default, this performs stepwise<br />

deletion. Start<strong>in</strong>g from the fitted model, the procedure considers each term <strong>in</strong> the model, <strong>and</strong><br />

determ<strong>in</strong>es whether the term should be deleted (accord<strong>in</strong>g to AIC). The deletion giv<strong>in</strong>g the<br />

biggest improvement <strong>in</strong> AIC is carried out. This is applied recursively until no more terms can<br />

be deleted.<br />

> X fit step(fit)<br />

Copyright c○CSIRO 2008


13.5 Simulat<strong>in</strong>g the fitted model 89<br />

Start: AIC=-707.35<br />

~x + y<br />

Df AIC<br />

- y 1 -709.35<br />

- x 1 -707.83<br />

-707.35<br />

Step: AIC=-709.35<br />

~x<br />

Df AIC<br />

- x 1 -709.83<br />

-709.35<br />

Step: AIC=-709.83<br />

~1<br />

Stationary Poisson process<br />

Uniform <strong>in</strong>tensity: 99<br />

13.5 Simulat<strong>in</strong>g the fitted model<br />

A fitted Poisson model can be simulated automatically us<strong>in</strong>g the function rmh.<br />

> X plot(X, ma<strong>in</strong> = "realisation <strong>of</strong> fitted model")<br />

realisation <strong>of</strong> fitted model<br />

Copyright c○CSIRO 2008


90 Methods 4: check<strong>in</strong>g a fitted Poisson model<br />

14 Methods 4: check<strong>in</strong>g a fitted Poisson model<br />

After fitt<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> process model to a <strong>po<strong>in</strong>t</strong> pattern dataset, we should check that the model is a<br />

good fit (‘goodness-<strong>of</strong>-fit’), <strong>and</strong> that each component assumption <strong>of</strong> the model was appropriate<br />

(‘validation’). This section presents some techniques available for check<strong>in</strong>g a fitted Poisson<br />

model.<br />

Model check<strong>in</strong>g can be either ‘formal’ or ‘<strong>in</strong>formal’. Formal techniques are based on detailed<br />

probabilistic assumptions about the data, <strong>and</strong> allow us to make probabilistic statements about<br />

the outcome. They <strong>in</strong>clude hypothesis tests (χ 2 tests, goodness-<strong>of</strong>-fit tests, Monte Carlo tests)<br />

<strong>and</strong> Bayesian model selection.<br />

In contrast, ‘<strong>in</strong>formal’ tools do not impose assumptions on the data <strong>and</strong> their <strong>in</strong>terpretation<br />

depends on human judgement. A typical example is the residual, def<strong>in</strong>ed for each observation by<br />

(residual) = (observed) - (fitted). If the model is a good fit, then the residuals should<br />

be ‘noise’, centred around zero.<br />

14.1 Goodness-<strong>of</strong>-fit<br />

A goodness-<strong>of</strong>-fit test is a formal test <strong>of</strong> the null hypothesis that the model is true, aga<strong>in</strong>st the<br />

very general alternative that the model is not true.<br />

The χ 2 goodness-<strong>of</strong>-fit test based on quadrat counts can be applied to a fitted Poisson model,<br />

homogeneous or <strong>in</strong>homogeneous. Under the null hypothesis, the quadrat counts are <strong>in</strong>dependent<br />

Poisson variables with different mean values, <strong>and</strong> the means are estimated by the fitted model.<br />

> data(bei)<br />

> fit M M<br />

Chi-squared test <strong>of</strong> fitted model ’fit’ us<strong>in</strong>g quadrat counts<br />

data: data from fit<br />

X-squared = 711.5036, df = 6, p-value < 2.2e-16<br />

If (as <strong>in</strong> this case) the formal goodness-<strong>of</strong>-fit test rejects the fitted model, we would then like<br />

to ga<strong>in</strong> an <strong>in</strong>formal impression <strong>of</strong> the type <strong>of</strong> departure from the model (i.e. <strong>in</strong> what way the<br />

data appear to depart from the predictions <strong>of</strong> the model) so that we may formulate a better<br />

model. To do this we can <strong>in</strong>spect the residual counts.<br />

> plot(bei, pch = ".")<br />

> plot(M, add = TRUE, cex = 1.5, col = "red")<br />

Copyright c○CSIRO 2008


14.2 Validation us<strong>in</strong>g residuals 91<br />

bei<br />

666 601 677 478.5 130 402.8 481 319.7<br />

2.7 9.1 −14 9<br />

544 601.6 165 477.9 643 402.3 298 320.2<br />

−2.3 −14 12 −1.2<br />

The plot displays, for each quadrat, the observed number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s (top left), the predicted<br />

number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s accord<strong>in</strong>g to the model (top right), <strong>and</strong> the Pearson residual (bottom) def<strong>in</strong>ed<br />

by<br />

(observed) − (expected)<br />

Pearson residual = √<br />

expected<br />

If the orig<strong>in</strong>al data were Poisson, this transformation approximately st<strong>and</strong>ardises the residuals<br />

so that they have mean zero <strong>and</strong> variance 1 when the model is true. A Pearson residual <strong>of</strong> −14<br />

is a gross departure from the fitted model.<br />

The Kolmogorov-Smirnov test can also be applied to a fitted Poisson model, with homogeneous<br />

or <strong>in</strong>homogeneous <strong>in</strong>tensity.<br />

> kstest(fit, function(x, y) {<br />

+ y<br />

+ })<br />

Spatial Kolmogorov-Smirnov test <strong>of</strong> <strong>in</strong>homogeneous Poisson process<br />

data: covariate ’function(x, y) { y}’ evaluated at <strong>po<strong>in</strong>t</strong>s <strong>of</strong> ’bei’<br />

<strong>and</strong> transformed to uniform distribution under ’fit’<br />

D = 0.1019, p-value < 2.2e-16<br />

alternative hypothesis: two-sided<br />

This uses the method kstest.ppm for the generic function kstest.<br />

14.2 Validation us<strong>in</strong>g residuals<br />

14.2.1 Residuals<br />

Residuals from the fitted model are an important diagnostic tool <strong>in</strong> other areas <strong>of</strong> applied<br />

statistics, but <strong>in</strong> <strong>spatial</strong> statistics they have only recently been developed ([39, 45], [44, pp.<br />

49–50], [6]).<br />

For a fitted Poisson process model, with fitted <strong>in</strong>tensity λ(u), the predicted number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s<br />

fall<strong>in</strong>g <strong>in</strong> any region B is <br />

B λ(u)du. Hence the residual <strong>in</strong> each region B ⊂ R2 is def<strong>in</strong>ed [6]<br />

to be the observed m<strong>in</strong>us predicted number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> B: [6]<br />

<br />

R(B) = n(x ∩ B) − λ(u)du (11)<br />

B<br />

Copyright c○CSIRO 2008


92 Methods 4: check<strong>in</strong>g a fitted Poisson model<br />

where x is the observed <strong>po<strong>in</strong>t</strong> pattern, n(x ∩ B) the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> x <strong>in</strong> the region B, <strong>and</strong><br />

λ(u) is the <strong>in</strong>tensity <strong>of</strong> the fitted model.<br />

These residuals are closely related to the residuals for quadrat counts that were used above.<br />

Tak<strong>in</strong>g the set B to be one <strong>of</strong> our quadrats, the ‘observed’ quadrat count is n(x ∩ B). The<br />

‘expected’ quadrat count is λarea(B) if the model is CSR, or more generally <br />

B λ(u)du if the<br />

model is an <strong>in</strong>homogeneous Poisson process. Hence the ‘raw residual’ is observed - expected<br />

= n(x ∩ B) − <br />

B λ(u)du.<br />

14.2.2 Residual measure<br />

Equation (11) def<strong>in</strong>es the total residual for any region B, large or small.<br />

Intuitively the residuals can be visualised as an electric charge, with unit positive charge at<br />

each data <strong>po<strong>in</strong>t</strong>, <strong>and</strong> a diffuse negative charge at all other locations u, with density ˆ λ(u). If the<br />

model is true, then these charges should approximately cancel.<br />

If we’d like to visualise this electric charge, one way is to plot the observed <strong>po<strong>in</strong>t</strong>s <strong>and</strong> the<br />

fitted <strong>in</strong>tensity function together:<br />

> data(bei)<br />

> fit plot(predict(fit))<br />

> plot(bei, add = TRUE, pch = "+")<br />

predict(fit)<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ + + +<br />

+ ++<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+ + ++ +<br />

+<br />

+<br />

+ + ++ + + +<br />

+ +<br />

+ + +<br />

+ ++<br />

++ + +<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+ + + +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++ +<br />

+<br />

++<br />

+ +<br />

+ + + + +<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + +++<br />

+<br />

+<br />

+ +<br />

++ +<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

++<br />

++ +<br />

++<br />

+<br />

+<br />

++ + +<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +++<br />

++<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+ ++<br />

++<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++ ++<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

++ + +<br />

+<br />

++<br />

+ ++ +<br />

+<br />

++ +<br />

+ +<br />

+ +<br />

+++<br />

+<br />

+<br />

++<br />

+ + ++++<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++ ++++++<br />

+++<br />

+<br />

+ +<br />

+<br />

+ ++ +<br />

++<br />

+ ++<br />

+<br />

+ +<br />

+++<br />

+<br />

+ ++ +<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

++ +<br />

+ + +<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+++<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++ ++++<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

++ +<br />

+ +<br />

++ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ ++<br />

+++ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+ ++<br />

+ ++ + +<br />

+ +<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+<br />

++<br />

+ +<br />

++ ++<br />

++ +<br />

+ ++<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ + ++<br />

+ +<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+ ++<br />

++<br />

+++<br />

+<br />

++<br />

+<br />

+ ++ + +<br />

+ ++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+++<br />

+ + +++<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++ +<br />

++<br />

++<br />

+ +<br />

++ + + ++<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+ ++<br />

+<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+ +<br />

+ +++<br />

+<br />

++ +<br />

+<br />

+++<br />

+ + +<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

++<br />

++<br />

+ +<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ + ++<br />

+++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+ + +<br />

+ + + +<br />

++++<br />

+<br />

+<br />

+++<br />

+ +<br />

+ ++<br />

++<br />

+ +<br />

+<br />

+ ++<br />

+ +++<br />

+<br />

+<br />

++<br />

+ + +<br />

+++<br />

+ ++<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ ++<br />

++<br />

+ + ++<br />

+<br />

+<br />

+<br />

++++<br />

+ +<br />

+<br />

+<br />

+ +<br />

+++<br />

+<br />

++<br />

+ +++<br />

+<br />

+<br />

+<br />

+ +<br />

+++<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+++<br />

+<br />

+<br />

+++<br />

+<br />

++<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

++ +<br />

++ +++<br />

+<br />

++<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+ + + +<br />

+<br />

+ ++++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

++<br />

+<br />

+<br />

+++<br />

+<br />

++<br />

+<br />

++ +<br />

++<br />

++<br />

+<br />

++<br />

+++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++<br />

+++<br />

+<br />

+++<br />

+ +<br />

+++<br />

+<br />

+ + ++<br />

+<br />

+ +<br />

+<br />

++<br />

++ +<br />

+<br />

+<br />

++<br />

+<br />

+ ++ ++<br />

+++<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++ +<br />

+ +<br />

+<br />

+ ++ ++ ++<br />

+<br />

+++ +<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

++<br />

+ ++<br />

+<br />

+<br />

++ +<br />

++<br />

++<br />

++<br />

++<br />

+++ ++ +<br />

+<br />

+<br />

++++<br />

+<br />

++<br />

+<br />

++<br />

+ +++++<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+++<br />

++<br />

+ ++ + ++ +<br />

++ +<br />

+<br />

+<br />

+<br />

+ + +<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

++++<br />

+<br />

++<br />

+<br />

+ +<br />

+++<br />

+<br />

++++<br />

+<br />

+ + +<br />

+<br />

+<br />

+++<br />

++<br />

+ +<br />

++<br />

+<br />

++<br />

+++<br />

+ +<br />

+++<br />

+<br />

++<br />

+ ++<br />

+<br />

+<br />

+ ++<br />

+ + + ++<br />

++<br />

++++ ++<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+ + ++<br />

++ +<br />

++<br />

+++<br />

+<br />

+<br />

+<br />

+ ++<br />

++<br />

+<br />

+ ++<br />

+++<br />

+<br />

+<br />

+ ++<br />

+ ++ +<br />

++<br />

+ + +<br />

++ +<br />

++ + +<br />

+<br />

++<br />

+ ++<br />

+++ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+++<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+ ++ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ ++<br />

+<br />

+<br />

+<br />

++<br />

++ +<br />

++ + +<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+ +++<br />

+ ++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++<br />

+<br />

++<br />

++ ++ + ++ +<br />

+<br />

+ ++ +<br />

+<br />

++<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

++ +<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

+ +<br />

++ +<br />

++<br />

++ + ++ ++ + +<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

+<br />

++ + + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+++<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+ +<br />

+<br />

++<br />

+ +<br />

+ + +<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

++<br />

++<br />

+ +<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

++<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

++++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++++<br />

+<br />

+<br />

+<br />

+ + +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

++ +<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ +<br />

+ +<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+ +++ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+++<br />

++++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+ +++<br />

+ ++<br />

+ +<br />

++<br />

+ + + + + +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++++<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+ + ++<br />

+<br />

+<br />

+++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

++ +<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+ + +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+ ++<br />

+<br />

+ + +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ ++ +<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+ + + +<br />

+<br />

+ +<br />

+<br />

+<br />

+ +<br />

+<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

+ + +<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+ +<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+ +<br />

+<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+ +<br />

+<br />

++<br />

+<br />

+ +<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++<br />

+<br />

++<br />

+ +<br />

+ + + + +<br />

+ + +<br />

++<br />

+<br />

+<br />

+<br />

+<br />

+<br />

++ +<br />

+<br />

+<br />

Each data <strong>po<strong>in</strong>t</strong> should be visualised as a charge <strong>of</strong> +1, while the colour image <strong>in</strong>dicates a<br />

negative charge density. If the model is true then these positive <strong>and</strong> negative charges should<br />

even out to zero.<br />

14.2.3 Smoothed residuals<br />

A more useful way to visualise the residuals is to smooth them.<br />

> data(bei)<br />

> fitx diagnose.ppm(fitx, which = "smooth")<br />

Copyright c○CSIRO 2008<br />

0.004 0.006 0.008 0.01 0.012


14.2 Validation us<strong>in</strong>g residuals 93<br />

0.002<br />

0.003<br />

0.001<br />

0.006<br />

0.005<br />

0.004<br />

0<br />

−0.001<br />

−0.002<br />

Smoothed raw residuals<br />

−0.003<br />

−0.004<br />

This is an image plot <strong>of</strong> the ‘smoothed residual field’<br />

where λ(u) is the nonparametric, kernel estimate <strong>of</strong> the <strong>in</strong>tensity,<br />

−0.003<br />

−0.002<br />

−0.001<br />

0<br />

0.005<br />

0.006<br />

0.004<br />

0.001<br />

0.003<br />

0.002<br />

0.002<br />

0.003<br />

0.001<br />

0<br />

−0.001<br />

−0.002<br />

s(u) = λ(u) − λ † (u) (12)<br />

n(x) <br />

λ(u) = e(u) κ(u − xi)<br />

i=1<br />

while λ † (u) is a correspond<strong>in</strong>gly-smoothed version <strong>of</strong> the parametric estimate <strong>of</strong> the <strong>in</strong>tensity<br />

accord<strong>in</strong>g to the fitted model,<br />

λ † <br />

(u) = e(u) κ(u − v)λˆ θ (v)dv.<br />

W<br />

Here κ is the smooth<strong>in</strong>g kernel <strong>and</strong> e(u) is the edge correction (2) on page 67. The difference<br />

(12) should be approximately zero if the model is true.<br />

In this example the smoothed residual image conta<strong>in</strong>s a visible trend, suggest<strong>in</strong>g that the<br />

model is <strong>in</strong>appropriate.<br />

14.2.4 Lurk<strong>in</strong>g variable plot<br />

If there is a <strong>spatial</strong> covariate Z(u) that plays an important role <strong>in</strong> the analysis, it may be useful<br />

to display a lurk<strong>in</strong>g variable plot <strong>of</strong> the residuals aga<strong>in</strong>st Z. This is a plot <strong>of</strong> C(z) = R(B(z))<br />

aga<strong>in</strong>st z, where<br />

B(z) = {u ∈ W : Z(u) ≤ z}<br />

is the region <strong>of</strong> space where the covariate value is less than or equal to z.<br />

> grad lurk<strong>in</strong>g(fitx, grad, type = "raw")<br />

Copyright c○CSIRO 2008


94 Methods 4: check<strong>in</strong>g a fitted Poisson model<br />

cumulative raw residuals<br />

−600 −400 −200 0<br />

0.00 0.05 0.10 0.15 0.20 0.25 0.30<br />

covariate<br />

Note that the lurk<strong>in</strong>g variable plot typically starts <strong>and</strong> ends at the horizontal axis, s<strong>in</strong>ce (for<br />

any model with an <strong>in</strong>tercept term) the total residual for the entire w<strong>in</strong>dow W must equal zero.<br />

This is analogous to the fact that the residuals <strong>in</strong> l<strong>in</strong>ear regression sum to zero.<br />

The plot also shows approximate 5% significance b<strong>and</strong>s for the cumulative residual C(x) or<br />

C(y), obta<strong>in</strong>ed from the asymptotic variance under the model.<br />

This plot <strong>in</strong>dicates that the model is grossly <strong>in</strong>adequate; the fitted <strong>in</strong>tensity function fails to<br />

capture the dependence <strong>of</strong> <strong>in</strong>tensity on slope.<br />

It can be helpful to display the derivative C ′ (z), which <strong>of</strong>ten <strong>in</strong>dicates which values <strong>of</strong> z are<br />

associated with a lack <strong>of</strong> fit.<br />

> lurk<strong>in</strong>g(fitx, grad, type = "raw", cumulative = FALSE)<br />

marg<strong>in</strong>al raw residuals<br />

−30000 −10000 0 10000 20000 30000<br />

0.00 0.05 0.10 0.15 0.20 0.25 0.30<br />

Copyright c○CSIRO 2008<br />

covariate


14.2 Validation us<strong>in</strong>g residuals 95<br />

The derivative is estimated us<strong>in</strong>g a smooth<strong>in</strong>g spl<strong>in</strong>e <strong>and</strong> you may need to tweak the smooth<strong>in</strong>g<br />

parameters (argument spl<strong>in</strong>eargs) to get a useful plot. Also the package currently does<br />

not plot significance b<strong>and</strong>s for C ′ (z).<br />

14.2.5 Four-panel plot<br />

If there are no <strong>spatial</strong> covariates, use the comm<strong>and</strong> diagnose.ppm to plot the residuals:<br />

> data(japanesep<strong>in</strong>es)<br />

> fit diagnose.ppm(fit)<br />

cumulative sum <strong>of</strong> raw residuals<br />

−6 −4 −2 0 2 4 6<br />

0 0.2 0.4 0.6 0.8 1<br />

x coord<strong>in</strong>ate<br />

−15<br />

20<br />

−5<br />

5<br />

15<br />

cumulative sum <strong>of</strong> raw residuals<br />

8 6 4 2 0 −2 −4 −6<br />

This comb<strong>in</strong>ation <strong>of</strong> four plots has proved to be a useful quick <strong>in</strong>dication <strong>of</strong> departure from<br />

the trend <strong>in</strong> the model.<br />

The bottom right panel is an image <strong>of</strong> the smoothed residual field.<br />

0<br />

10<br />

−20<br />

−25<br />

0<br />

35<br />

−10<br />

25<br />

20<br />

15<br />

10<br />

5<br />

−15<br />

30<br />

−5<br />

−10<br />

0<br />

10<br />

−5<br />

5<br />

20<br />

30<br />

15<br />

−10<br />

−20<br />

−25<br />

−15<br />

−15<br />

0<br />

5<br />

−5<br />

25<br />

35<br />

0 0.2 0.4 0.6 0.8 1<br />

y coord<strong>in</strong>ate<br />

Copyright c○CSIRO 2008


96 Methods 4: check<strong>in</strong>g a fitted Poisson model<br />

The top left panel is a direct representation <strong>of</strong> the residual ‘charge’, with circles represent<strong>in</strong>g<br />

the data <strong>po<strong>in</strong>t</strong>s (positive residuals) <strong>and</strong> a colour scheme represent<strong>in</strong>g the fitted <strong>in</strong>tensity (negative<br />

residuals). However, it is <strong>of</strong>ten difficult to <strong>in</strong>terpret.<br />

The two other panels are lurk<strong>in</strong>g variables aga<strong>in</strong>st one <strong>of</strong> the cartesian coord<strong>in</strong>ates. For<br />

example, the bottom left panel is a lurk<strong>in</strong>g variable plot for the x-coord<strong>in</strong>ate. Imag<strong>in</strong>e a vertical<br />

l<strong>in</strong>e which sweeps from left to right across the w<strong>in</strong>dow. The progressive total residual to the left<br />

<strong>of</strong> the l<strong>in</strong>e is plotted aga<strong>in</strong>st the position <strong>of</strong> the l<strong>in</strong>e.<br />

In this example, the lurk<strong>in</strong>g variable plot for the y coord<strong>in</strong>ate suggests a lack <strong>of</strong> fit at about<br />

y = 0.15, <strong>and</strong> the image <strong>of</strong> the smoothed residual field suggests an excess <strong>of</strong> positive residuals<br />

at about x = 0.8,y = 0.15, both <strong>in</strong>dicat<strong>in</strong>g that the model underestimates the true <strong>in</strong>tensity <strong>of</strong><br />

<strong>po<strong>in</strong>t</strong>s <strong>in</strong> this vic<strong>in</strong>ity.<br />

14.2.6 Caveats<br />

The residual plots described above are useful for detect<strong>in</strong>g misspecification <strong>of</strong> the trend <strong>in</strong> a fitted<br />

Poisson process model. They are not very useful for check<strong>in</strong>g the <strong>in</strong>dependence assumption, that<br />

is, for check<strong>in</strong>g the properties (PP3)–(PP4) <strong>of</strong> a Poisson process listed on page 72.<br />

Effective diagnostics <strong>of</strong> <strong>in</strong>dependence or dependence between <strong>po<strong>in</strong>t</strong>s <strong>in</strong>clude the K-function<br />

(section 16.4) <strong>and</strong> a Q–Q plot <strong>of</strong> the residuals (section 22.2.3).<br />

Copyright c○CSIRO 2008


14.2 Validation us<strong>in</strong>g residuals 97<br />

PART IV. INTERACTION<br />

Part IV <strong>of</strong> the workshop expla<strong>in</strong>s how to <strong>in</strong>vestigate dependence between the <strong>po<strong>in</strong>t</strong>s <strong>in</strong> a <strong>po<strong>in</strong>t</strong><br />

pattern.<br />

Copyright c○CSIRO 2008


98 Simple models <strong>of</strong> non-Poisson <strong>patterns</strong><br />

15 Simple models <strong>of</strong> non-Poisson <strong>patterns</strong><br />

A <strong>po<strong>in</strong>t</strong> process that is not Poisson can be said to exhibit ‘<strong>in</strong>teraction’ or dependence between<br />

the <strong>po<strong>in</strong>t</strong>s. It’s time to <strong>in</strong>troduce some models for such processes. This section covers simple<br />

models that are derived from the Poisson process, <strong>and</strong> still reta<strong>in</strong> some <strong>of</strong> the tractable features<br />

<strong>of</strong> the Poisson model.<br />

15.1 Poisson cluster processes<br />

In a Poisson cluster process, we beg<strong>in</strong> with a Poisson process Y <strong>of</strong> ‘parent’ <strong>po<strong>in</strong>t</strong>s. Each ‘parent’<br />

<strong>po<strong>in</strong>t</strong> yi ∈ Y then gives rise to a f<strong>in</strong>ite set Zi <strong>of</strong> ‘<strong>of</strong>fspr<strong>in</strong>g’ <strong>po<strong>in</strong>t</strong>s accord<strong>in</strong>g to some stochastic<br />

mechanism. The set compris<strong>in</strong>g all the <strong>of</strong>fspr<strong>in</strong>g <strong>po<strong>in</strong>t</strong>s forms a <strong>po<strong>in</strong>t</strong> process X. Only X is<br />

observed.<br />

parents clusters <strong>of</strong>fspr<strong>in</strong>g<br />

An example is the Matérn cluster process <strong>in</strong> which the parent <strong>po<strong>in</strong>t</strong>s come from a homogeneous<br />

Poisson process with <strong>in</strong>tensity κ, <strong>and</strong> each parent has a Poisson (µ) number <strong>of</strong> <strong>of</strong>fspr<strong>in</strong>g,<br />

<strong>in</strong>dependently <strong>and</strong> uniformly distributed <strong>in</strong> a disc <strong>of</strong> radius r centred around the parent.<br />

The Matérn cluster process can be generated <strong>in</strong> spatstat us<strong>in</strong>g the comm<strong>and</strong> rMatClust.<br />

[By convention, r<strong>and</strong>om data generators <strong>in</strong> R always have names beg<strong>in</strong>n<strong>in</strong>g with r.]<br />

> plot(rMatClust(kappa = 10, r = 0.1, mu = 5))<br />

rMatClust(kappa = 10, r = 0.1, mu = 5)<br />

Other Poisson cluster processes implemented <strong>in</strong> spatstat are<br />

• rThomas: the Thomas process, <strong>in</strong> which each cluster consists <strong>of</strong> a Poisson(µ) number <strong>of</strong><br />

r<strong>and</strong>om <strong>po<strong>in</strong>t</strong>s, each hav<strong>in</strong>g an isotropic Gaussian N(0,σ 2 I) displacement from its parent.<br />

Copyright c○CSIRO 2008


15.2 Cox processes 99<br />

• rGaussPoisson: the Gauss-Poisson process <strong>in</strong> which each cluster is either a s<strong>in</strong>gle <strong>po<strong>in</strong>t</strong><br />

or a pair <strong>of</strong> <strong>po<strong>in</strong>t</strong>s.<br />

• rNeymanScott: the general Neyman-Scott cluster process <strong>in</strong> which the cluster mechanism<br />

is arbitrary.<br />

15.2 Cox processes<br />

A Cox <strong>po<strong>in</strong>t</strong> process is effectively a Poisson process with a r<strong>and</strong>om <strong>in</strong>tensity function. Let Λ(u)<br />

be a r<strong>and</strong>om function with non-negative values, def<strong>in</strong>ed at all locations u ∈ R 2 . Conditional on<br />

Λ, let X be a Poisson process with <strong>in</strong>tensity function Λ. Then X is a Cox process.<br />

A trivial example is the “mixed Poisson” process <strong>in</strong> which we generate a r<strong>and</strong>om variable Λ<br />

<strong>and</strong>, conditional on Λ, generate a uniform Poisson process with <strong>in</strong>tensity Λ. Follow<strong>in</strong>g are three<br />

different realisations <strong>of</strong> this process:<br />

> par(mfrow = c(1, 3))<br />

> for (i <strong>in</strong> 1:3) {<br />

+ lambda


100 Simple models <strong>of</strong> non-Poisson <strong>patterns</strong><br />

Poisson process, the result<strong>in</strong>g process <strong>of</strong> reta<strong>in</strong>ed <strong>po<strong>in</strong>t</strong>s is Poisson. To get a non-Poisson process<br />

we need some k<strong>in</strong>d <strong>of</strong> dependent th<strong>in</strong>n<strong>in</strong>g mechanism.<br />

In Matérn’s Model I, a homogeneous Poisson process Y is first generated. Any <strong>po<strong>in</strong>t</strong> <strong>in</strong> Y<br />

that lies closer than a distance r from the nearest other <strong>po<strong>in</strong>t</strong> <strong>of</strong> Y, is deleted. Thus, pairs <strong>of</strong><br />

close neighbours annihilate each other.<br />

> plot(rMaternI(70, 0.05))<br />

rMaternI(70, 0.05)<br />

In Matérn’s Model II, the <strong>po<strong>in</strong>t</strong>s <strong>of</strong> the homogeneous Poisson process Y are marked by ‘arrival<br />

times’ ti which are <strong>in</strong>dependent <strong>and</strong> uniformly distributed <strong>in</strong> [0,1]. Any <strong>po<strong>in</strong>t</strong> <strong>in</strong> Y that lies<br />

closer than distance r from another <strong>po<strong>in</strong>t</strong> that has an earlier arrival time, is deleted.<br />

> plot(rMaternII(70, 0.05))<br />

rMaternII(70, 0.05)<br />

15.4 Sequential models<br />

In a sequential model, we start with an empty w<strong>in</strong>dow, <strong>and</strong> the <strong>po<strong>in</strong>t</strong>s are placed <strong>in</strong>to the w<strong>in</strong>dow<br />

one-at-a-time, accord<strong>in</strong>g to some criterion.<br />

In Simple Sequential Inhibition, each new <strong>po<strong>in</strong>t</strong> is generated uniformly <strong>in</strong> the w<strong>in</strong>dow <strong>and</strong><br />

<strong>in</strong>dependently <strong>of</strong> preced<strong>in</strong>g <strong>po<strong>in</strong>t</strong>s. If the new <strong>po<strong>in</strong>t</strong> lies closer than r units from an exist<strong>in</strong>g<br />

<strong>po<strong>in</strong>t</strong>, then it is rejected <strong>and</strong> another r<strong>and</strong>om <strong>po<strong>in</strong>t</strong> is generated. The process term<strong>in</strong>ates when<br />

no further <strong>po<strong>in</strong>t</strong>s can be added.<br />

Copyright c○CSIRO 2008


15.4 Sequential models 101<br />

> plot(rSSI(0.05, 200))<br />

rSSI(0.05, 200)<br />

Sequential <strong>po<strong>in</strong>t</strong> processes are the easiest way to generate highly ordered <strong>patterns</strong> with high<br />

<strong>in</strong>tensity.<br />

Copyright c○CSIRO 2008


102 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

16 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Suppose that a <strong>po<strong>in</strong>t</strong> pattern appears to have constant <strong>in</strong>tensity, <strong>and</strong> we wish to assess whether<br />

the pattern is Poisson. The alternative is that the <strong>po<strong>in</strong>t</strong>s are dependent (they exhibit ‘<strong>in</strong>teraction’).<br />

Classical writers suggested a simple trichotomy between ‘<strong>in</strong>dependence’ (the Poisson process),<br />

‘regularity’ (where <strong>po<strong>in</strong>t</strong>s tend to avoid each other), <strong>and</strong> ‘cluster<strong>in</strong>g’ (where <strong>po<strong>in</strong>t</strong>s tend to be<br />

close together). [The concept <strong>of</strong> ‘cluster<strong>in</strong>g’ does not imply that the <strong>po<strong>in</strong>t</strong>s are organised <strong>in</strong>to<br />

identifiable ‘clusters’; merely that they are closer together than expected for a Poisson process.]<br />

16.1 Distances<br />

<strong>in</strong>dependent regular clustered<br />

The classical techniques for <strong>in</strong>vestigat<strong>in</strong>g <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction are distance methods, based on<br />

measur<strong>in</strong>g the distances between <strong>po<strong>in</strong>t</strong>s. Specifically we may consider<br />

• pairwise distances sij = ||xi − xj|| between all dist<strong>in</strong>ct pairs <strong>of</strong> <strong>po<strong>in</strong>t</strong>s xi <strong>and</strong> xj (i = j)<br />

<strong>in</strong> the pattern;<br />

• nearest neighbour distances ti = m<strong>in</strong>j=i sij, the distance from each <strong>po<strong>in</strong>t</strong> xi to its<br />

nearest neighbour;<br />

• empty space distances d(u) = m<strong>in</strong>i ||u−xi||, the distance from a fixed reference location<br />

u <strong>in</strong> the w<strong>in</strong>dow to the nearest data <strong>po<strong>in</strong>t</strong>.<br />

If you need to compute these directly, they are available <strong>in</strong> spatstat us<strong>in</strong>g the functions<br />

pairdist, nndist <strong>and</strong> distmap respectively. If X is a <strong>po<strong>in</strong>t</strong> pattern object,<br />

• pairdist(X) returns the matrix <strong>of</strong> pairwise distances.<br />

• nndist(X) returns the vector <strong>of</strong> nearest neighbour distances.<br />

• distmap(X) returns a pixel image whose pixel values are the empty space distances to the<br />

pattern X measured from every pixel.<br />

> data(cells)<br />

> emp plot(emp, ma<strong>in</strong> = "Empty space distances")<br />

> plot(cells, add = TRUE)<br />

Copyright c○CSIRO 2008


16.2 Empty space distances 103<br />

Empty space distances<br />

0 0.05 0.1 0.15 0.2 0.25<br />

Tip: Quite a useful exploratory tool is the Stienen diagram obta<strong>in</strong>ed by draw<strong>in</strong>g a<br />

circle around each data <strong>po<strong>in</strong>t</strong> <strong>of</strong> diameter equal to its nearest neighbour distance:<br />

> plot(X %mark% (nndist(X)/2), markscale = 1, ma<strong>in</strong> = "Stienen diagram")<br />

Stienen diagram<br />

16.2 Empty space distances<br />

It’s easiest to start by expla<strong>in</strong><strong>in</strong>g the analysis <strong>of</strong> the empty space distances<br />

The distance<br />

d(u,x) = m<strong>in</strong>{||u − xi|| : xi ∈ x}<br />

from a fixed location u ∈ R 2 to the nearest <strong>po<strong>in</strong>t</strong> <strong>in</strong> a <strong>po<strong>in</strong>t</strong> pattern x, is called the ‘empty<br />

space distance’ or ‘void distance’. It can be computed for all locations u on a f<strong>in</strong>e grid, us<strong>in</strong>g<br />

the spatstat function distmap as we saw above.<br />

Copyright c○CSIRO 2008


104 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

16.2.1 Edge effects<br />

It is not easy to <strong>in</strong>terpret a histogram <strong>of</strong> the empty space distances. The empirical distribution <strong>of</strong><br />

the empty space distances depends on the geometry <strong>of</strong> the w<strong>in</strong>dow W as well as on characteristics<br />

<strong>of</strong> the <strong>po<strong>in</strong>t</strong> process X.<br />

Another view<strong>po<strong>in</strong>t</strong> is that the w<strong>in</strong>dow <strong>in</strong>troduces a sampl<strong>in</strong>g bias. Recall that under the<br />

‘st<strong>and</strong>ard model’ (Section 2.3) the <strong>po<strong>in</strong>t</strong> process X extends throughout 2-D space, but is observed<br />

only <strong>in</strong>side W. This leads to bias <strong>in</strong> the distance measurements. Conf<strong>in</strong><strong>in</strong>g observations to a<br />

w<strong>in</strong>dow W implies that the observed distance d(u,x) = d(u,X ∩ W) to the nearest data <strong>po<strong>in</strong>t</strong><br />

<strong>in</strong>side W, may be greater than the true distance d(u,X) to the nearest <strong>po<strong>in</strong>t</strong> <strong>of</strong> the complete<br />

<strong>po<strong>in</strong>t</strong> process X.<br />

16.2.2 Empty space function F<br />

observed true<br />

Ignor<strong>in</strong>g the edge problems for a moment, let us focus on the entire <strong>po<strong>in</strong>t</strong> process X.<br />

Assum<strong>in</strong>g X is stationary (statistically <strong>in</strong>variant under translations), we can def<strong>in</strong>e the cumulative<br />

distribution function <strong>of</strong> the empty space distance<br />

F(r) = P {d(u,X) ≤ r} (13)<br />

where u is an arbitrary reference location. If the process is stationary then this def<strong>in</strong>ition does<br />

not depend on u.<br />

The empirical distribution function <strong>of</strong> the observed empty space distances on a grid <strong>of</strong> loca-<br />

tions uj, j = 1,... ,m,<br />

F ∗ (r) = 1<br />

m<br />

<br />

1 {d(uj,x) ≤ r} (14)<br />

j<br />

is a negatively biased estimator <strong>of</strong> F(r), for reasons expla<strong>in</strong>ed above.<br />

Corrections for this ‘edge effect bias’ are required. Many edge corrections are available.<br />

Typically they are weighted versions <strong>of</strong> the ecdf,<br />

F(r) = <br />

e(uj,r)1 {d(uj,x) ≤ r} (15)<br />

j<br />

where e(u,r) is an edge correction weight designed so that F(r) is unbiased. These corrections<br />

are effectively forms <strong>of</strong> the Horvitz-Thompson estimator <strong>of</strong> survey sampl<strong>in</strong>g fame.<br />

Copyright c○CSIRO 2008


16.2 Empty space distances 105<br />

The edge effect problem can also be regarded as a form <strong>of</strong> censor<strong>in</strong>g (analogous to rightcensor<strong>in</strong>g<br />

<strong>in</strong> survival data), as first <strong>po<strong>in</strong>t</strong>ed out by CSIRO researcher Ge<strong>of</strong>f Laslett [33]. A<br />

counterpart <strong>of</strong> the Kaplan-Meier estimator is available. For further <strong>in</strong>formation see [7].<br />

Thus, assum<strong>in</strong>g that the <strong>po<strong>in</strong>t</strong> process is homogeneous, we are able to compute an unbiased<br />

<strong>and</strong> reasonably accurate estimate <strong>of</strong> the empty space function F def<strong>in</strong>ed by (13).<br />

To <strong>in</strong>terpret this estimate, a useful benchmark is the Poisson process. Notice that d(u,X) > r<br />

if <strong>and</strong> only if there are no <strong>po<strong>in</strong>t</strong>s <strong>of</strong> X <strong>in</strong> the disc b(u,r) <strong>of</strong> radius r centred on u. For a<br />

homogeneous Poisson process <strong>of</strong> <strong>in</strong>tensity λ, the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> b(u,r) is Poisson<br />

with mean µ = λarea(b(u,r)) = λπr 2 , so the probability that there are no <strong>po<strong>in</strong>t</strong>s <strong>in</strong> this region<br />

is exp(−µ) = exp(−λπr 2 ). Hence for a Poisson process<br />

Fpois(r) = 1 − exp(−λπr 2 ). (16)<br />

Typically we compare F(r) with the value <strong>of</strong> Fpois(r) obta<strong>in</strong>ed by plugg<strong>in</strong>g <strong>in</strong> the estimated<br />

<strong>in</strong>tensity ˆ λ = n(x)/area(W). Values F(r) > Fpois(r) suggest that empty space distances <strong>in</strong> the<br />

<strong>po<strong>in</strong>t</strong> pattern are shorter than for a Poisson process, suggest<strong>in</strong>g a regularly space pattern; while<br />

values F(r) < Fpois(r) suggest a clustered pattern.<br />

16.2.3 Implementation <strong>in</strong> spatstat<br />

The functionFest computes estimates <strong>of</strong> F(r) us<strong>in</strong>g several edge corrections, <strong>and</strong> the benchmark<br />

value for the Poisson process.<br />

> data(cells)<br />

> plot(cells)<br />

> Fc Fc<br />

Function value object (class ’fv’)<br />

for the function r -> F(r)<br />

Entries:<br />

id label description<br />

-- ----- ----------r<br />

r distance argument r<br />

theo Fpois(r) theoretical Poisson F(r)<br />

rs Fbord(r) border corrected estimate <strong>of</strong> F(r)<br />

km Fkm(r) Kaplan-Meier estimate <strong>of</strong> F(r)<br />

hazard lambda(r) Kaplan-Meier estimate <strong>of</strong> hazard function lambda(r)<br />

raw Fraw(r) uncorrected estimate <strong>of</strong> F(r)<br />

--------------------------------------<br />

Default plot formula:<br />

. ~ r<br />

Recommended range <strong>of</strong> argument r: [0, 0.085]<br />

Copyright c○CSIRO 2008


106 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

cells<br />

Tip: Don’t use F as a variable name! It’s a reserved word — an abbreviation for<br />

FALSE.<br />

The value returned by Fest is an object <strong>of</strong> class "fv" (“function value table”). This is<br />

effectively a data frame with some extra <strong>in</strong>formation. The pr<strong>in</strong>tout for Fc <strong>in</strong>dicates that the<br />

columns <strong>in</strong> the data frame are named r, theo, rs, km, hazard <strong>and</strong> raw. The first column r<br />

conta<strong>in</strong>s a sequence <strong>of</strong> values <strong>of</strong> the function argument r. The next column theo conta<strong>in</strong>s the<br />

correspond<strong>in</strong>g values <strong>of</strong> F(r) for a homogeneous Poisson process. The columns rs, km <strong>and</strong> raw<br />

conta<strong>in</strong> different estimates <strong>of</strong> the empty space function F, namely the ‘reduced sample’ estimator,<br />

the Kaplan-Meier estimator, <strong>and</strong> the uncorrected empirical distribution function, respectively.<br />

The columnhazard conta<strong>in</strong>s an estimate <strong>of</strong> the hazard rate <strong>of</strong> F, i.e. h(r) = (d/dr)log(1−F(r)),<br />

a by-product <strong>of</strong> the Kaplan-Meier estimate.<br />

> par(pty = "s")<br />

> plot(Fest(cells))<br />

lty col<br />

km 1 1<br />

rs 2 2<br />

theo 3 3<br />

F(r)<br />

0.0 0.2 0.4 0.6 0.8<br />

Fest(cells)<br />

0.00 0.02 0.04 0.06 0.08<br />

Copyright c○CSIRO 2008<br />

r


16.2 Empty space distances 107<br />

This is a call to plot.fv. The pr<strong>in</strong>ted output is the return value from plot.fv, which<br />

expla<strong>in</strong>s the encod<strong>in</strong>g <strong>of</strong> the different function estimates us<strong>in</strong>g the R graphics parameters lty<br />

(l<strong>in</strong>e type) <strong>and</strong> col (l<strong>in</strong>e colour).<br />

You’ll notice that, by default, the uncorrected estimate raw <strong>and</strong> the hazard rate hazard were<br />

not plotted. The choice <strong>of</strong> estimates to be plotted, <strong>and</strong> the style <strong>in</strong> which they are plotted, are<br />

controlled by the second argument to plot.fv, which should be an R language formula <strong>in</strong>volv<strong>in</strong>g<br />

the identifier names r, theo, rs, km, hazard <strong>and</strong> raw. To plot the hazard rate aga<strong>in</strong>st r,<br />

> plot(Fest(cells), hazard ~ r, ma<strong>in</strong> = "Hazard rate <strong>of</strong> F")<br />

hazard<br />

0 20 40 60 80<br />

Hazard rate <strong>of</strong> F<br />

0.00 0.02 0.04 0.06 0.08<br />

r<br />

To plot all the estimates <strong>of</strong> F(r), <strong>in</strong>clud<strong>in</strong>g the uncorrected estimate:<br />

> plot(Fest(cells), cb<strong>in</strong>d(km, rs, raw, theo) ~ r)<br />

km , rs , raw , theo<br />

0.0 0.2 0.4 0.6 0.8<br />

Fest(cells)<br />

0.00 0.02 0.04 0.06 0.08<br />

r<br />

Notice the use <strong>of</strong> cb<strong>in</strong>d to specify several different graphs on the same plot.<br />

To plot the estimates <strong>of</strong> F(r) aga<strong>in</strong>st the Poisson value, <strong>in</strong> the style <strong>of</strong> a P–P plot:<br />

> plot(Fest(cells), cb<strong>in</strong>d(km, rs, theo) ~ theo)<br />

Copyright c○CSIRO 2008


108 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

km , rs , theo<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

Fest(cells)<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

theo<br />

(<strong>in</strong>clud<strong>in</strong>g theo on the left side here gives us the diagonal l<strong>in</strong>e).<br />

The symbol . st<strong>and</strong>s for ‘all recommended estimates <strong>of</strong> the function’. So an abbreviation<br />

for the last comm<strong>and</strong> is<br />

> plot(Fest(cells), . ~ theo)<br />

Transformations can be applied to these function values. For example, to subtract the<br />

theoretical Poisson value from the estimates,<br />

> plot(Fest(cells), . - theo ~ r)<br />

cb<strong>in</strong>d(km, rs, theo) − theo<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

Fest(cells)<br />

0.00 0.02 0.04 0.06 0.08<br />

r<br />

To apply Fisher’s variance stabilis<strong>in</strong>g transformation φ( ˆ F(t)) = s<strong>in</strong> −1 ( ( ˆ F(t))),<br />

> plot(Fest(cells), as<strong>in</strong>(sqrt(.)) ~ r)<br />

Copyright c○CSIRO 2008


16.3 Nearest neighbour distances 109<br />

as<strong>in</strong>(sqrt(cb<strong>in</strong>d(km, rs, theo)))<br />

0.0 0.2 0.4 0.6 0.8 1.0 1.2<br />

Fest(cells)<br />

0.00 0.02 0.04 0.06 0.08<br />

r<br />

16.3 Nearest neighbour distances<br />

For other types <strong>of</strong> distances we encounter similar problems. For the nearest neighbour distances<br />

ti = m<strong>in</strong>j=i ||xi − x|j||, aga<strong>in</strong> it is not easy to <strong>in</strong>terpret a histogram <strong>of</strong> the observed distances.<br />

The empirical distribution <strong>of</strong> the nearest neighbour distances depends on the geometry <strong>of</strong> the<br />

w<strong>in</strong>dow W as well as on characteristics <strong>of</strong> the <strong>po<strong>in</strong>t</strong> process X. Conf<strong>in</strong><strong>in</strong>g observations to a<br />

w<strong>in</strong>dow W implies that the observed nearest-neighbour distances are larger, <strong>in</strong> general, than the<br />

‘true’ nearest neighbour distances <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> the entire <strong>po<strong>in</strong>t</strong> process X. Corrections for this<br />

edge effect bias are required.<br />

16.3.1 G function<br />

Assum<strong>in</strong>g the <strong>po<strong>in</strong>t</strong> process X is stationary, we can def<strong>in</strong>e the cumulative distribution function<br />

<strong>of</strong> the nearest-neighbour distance for a typical <strong>po<strong>in</strong>t</strong> <strong>in</strong> the pattern,<br />

G(r) = P {d(u,X \ {u}) ≤ r | u ∈ X} (17)<br />

where u is an arbitrary location, <strong>and</strong> d(u,X \ {u}) is the shortest distance from u to the <strong>po<strong>in</strong>t</strong><br />

pattern X exclud<strong>in</strong>g u itself. If the process is stationary then this def<strong>in</strong>ition does not depend on<br />

u.<br />

The empirical distribution function <strong>of</strong> the observed nearest-neighbour distances<br />

G ∗ (r) = 1<br />

n(x)<br />

<br />

1 {ti ≤ r} (18)<br />

is a negatively biased estimator <strong>of</strong> G(r), for reasons we expla<strong>in</strong>ed above. Many edge corrections<br />

are available. Typically they are weighted versions <strong>of</strong> the ecdf,<br />

G(r) = <br />

e(xi,r)1 {ti ≤ r} (19)<br />

i<br />

where e(xi,r) is an edge correction weight designed so that G(r) is approximately unbiased. A<br />

counterpart <strong>of</strong> the Kaplan-Meier estimator is also available.<br />

For a homogeneous Poisson <strong>po<strong>in</strong>t</strong> process <strong>of</strong> <strong>in</strong>tensity λ, the nearest-neighbour distance<br />

distribution function is known to be<br />

i<br />

Gpois(r) = 1 − exp(−λπr 2 ). (20)<br />

Copyright c○CSIRO 2008


110 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

This is identical to the empty space function for the Poisson process. Intuitively, because <strong>po<strong>in</strong>t</strong>s<br />

<strong>of</strong> the Poisson process are <strong>in</strong>dependent <strong>of</strong> each other, the knowledge that u is a <strong>po<strong>in</strong>t</strong> <strong>of</strong> X does<br />

not affect any other <strong>po<strong>in</strong>t</strong>s <strong>of</strong> the process, hence G is equivalent to F.<br />

Interpretation <strong>of</strong> G(r) is the reverse <strong>of</strong> F(r). Values G(r) > Gpois(r) suggest that nearest<br />

neighbour distances <strong>in</strong> the <strong>po<strong>in</strong>t</strong> pattern are shorter than for a Poisson process, suggest<strong>in</strong>g a<br />

clustered pattern; while values G(r) < Gpois(r) suggest a regular (<strong>in</strong>hibited) pattern.<br />

The function Gest computes estimates <strong>of</strong> G(r) us<strong>in</strong>g several edge corrections, <strong>and</strong> the benchmark<br />

value for the Poisson process.<br />

> Gc Gc<br />

Function value object (class ’fv’)<br />

for the function r -> G(r)<br />

Entries:<br />

id label description<br />

-- ----- ----------r<br />

r distance argument r<br />

theo Gpois(r) theoretical Poisson G(r)<br />

rs Gbord(r) border corrected estimate <strong>of</strong> G(r)<br />

km Gkm(r) Kaplan-Meier estimate <strong>of</strong> G(r)<br />

hazard lambda(r) Kaplan-Meier estimate <strong>of</strong> hazard function lambda(r)<br />

raw Graw(r) uncorrected estimate <strong>of</strong> G(r)<br />

--------------------------------------<br />

Default plot formula:<br />

. ~ r<br />

Recommended range <strong>of</strong> argument r: [0, 0.15]<br />

> par(pty = "s")<br />

> plot(Gest(cells))<br />

G(r)<br />

0.0 0.2 0.4 0.6 0.8<br />

Gest(cells)<br />

0.00 0.05 0.10 0.15<br />

r<br />

The estimate <strong>of</strong> G(r) suggests strongly that the pattern is regular. Indeed, G(r) is zero for<br />

r ≤ 0.07 which <strong>in</strong>dicates that there are no nearest-neighbour distances shorter than 0.07.<br />

Common ways <strong>of</strong> plott<strong>in</strong>g G <strong>in</strong>clude:<br />

Copyright c○CSIRO 2008


16.4 Pairwise distances <strong>and</strong> the K function 111<br />

G(r) <strong>and</strong> Gpois(r) plotted aga<strong>in</strong>st r plot(Gest(X))<br />

G(r) − Gpois(r) plotted aga<strong>in</strong>st r plot(Gest(X), . - theo ~ r)<br />

G(r) plotted aga<strong>in</strong>st Gpois(r) <strong>in</strong> P–P style plot(Gest(X), . ~ theo)<br />

<strong>and</strong> Fisher’s variance-stabilis<strong>in</strong>g transformation φ(G(t)) = s<strong>in</strong> −1 ( G(t)) applied to the P–P<br />

plot:<br />

> fisher plot(Gest(cells), fisher(.) ~ fisher(theo))<br />

fisher(cb<strong>in</strong>d(km, rs, theo))<br />

0.0 0.5 1.0 1.5<br />

Gest(cells)<br />

0.0 0.5 1.0 1.5<br />

fisher(theo)<br />

16.4 Pairwise distances <strong>and</strong> the K function<br />

The observed pairwise distances sij = ||xi −xj|| <strong>in</strong> the data pattern x constitute a biased sample<br />

<strong>of</strong> pairwise distances <strong>in</strong> the <strong>po<strong>in</strong>t</strong> process, with a bias <strong>in</strong> favour <strong>of</strong> smaller distances. For example,<br />

we can never observe a pairwise distance greater than the diameter <strong>of</strong> the w<strong>in</strong>dow.<br />

Ripley [40] def<strong>in</strong>ed the K-function for a stationary <strong>po<strong>in</strong>t</strong> process so that λK(r) is the expected<br />

number <strong>of</strong> other <strong>po<strong>in</strong>t</strong>s <strong>of</strong> the process with<strong>in</strong> a distance r <strong>of</strong> a typical <strong>po<strong>in</strong>t</strong> <strong>of</strong> the process.<br />

Formally<br />

K(r) = 1<br />

E [n(X ∩ b(u,r) \ {u}) | u ∈ X]. (21)<br />

λ<br />

For a homogeneous Poisson process, <strong>in</strong>tuitively, the knowledge that u is a <strong>po<strong>in</strong>t</strong> <strong>of</strong> X does<br />

not affect the other <strong>po<strong>in</strong>t</strong>s <strong>of</strong> the process, so that X\{u} is conditionally a Poisson process. The<br />

expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s fall<strong>in</strong>g <strong>in</strong> b(u,r) is λπr 2 . Thus for a homogeneous Poisson process<br />

Kpois(r) = πr 2<br />

regardless <strong>of</strong> the <strong>in</strong>tensity.<br />

Numerous estimators <strong>of</strong> K have been proposed. Most <strong>of</strong> them are weighted <strong>and</strong> renormalised<br />

empirical distribution functions <strong>of</strong> the pairwise distances, <strong>of</strong> the general form<br />

K(r) =<br />

1<br />

λ 2 area(W)<br />

(22)<br />

<br />

1 {||xi − xj|| ≤ r}e(xi,xj;r) (23)<br />

i<br />

j=i<br />

where e(u,v,r) is an edge correction weight. The choice <strong>of</strong> estimator does not seem to be very<br />

important, as long as some edge correction is applied.<br />

Copyright c○CSIRO 2008


112 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Aga<strong>in</strong> we usually compare the estimate K(r) with the Poisson K function. Values K(r) > πr 2<br />

suggest cluster<strong>in</strong>g, while K(r) < πr 2 suggests a regular pattern.<br />

In spatstat the function Kest computes several estimates <strong>of</strong> the K-function.<br />

> Gc Gc<br />

Function value object (class ’fv’)<br />

for the function r -> K(r)<br />

Entries:<br />

id label description<br />

-- ----- ----------r<br />

r distance argument r<br />

theo Kpois(r) theoretical Poisson K(r)<br />

border Kbord(r) border-corrected estimate <strong>of</strong> K(r)<br />

trans Ktrans(r) translation-corrected estimate <strong>of</strong> K(r)<br />

iso Kiso(r) Ripley isotropic correction estimate <strong>of</strong> K(r)<br />

--------------------------------------<br />

Default plot formula:<br />

. ~ r<br />

Recommended range <strong>of</strong> argument r: [0, 0.25]<br />

> par(pty = "s")<br />

> plot(Kest(cells))<br />

K(r)<br />

0.00 0.05 0.10 0.15 0.20<br />

Kest(cells)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

In this case, the <strong>in</strong>terpretation <strong>of</strong> all three summary statistics F, G <strong>and</strong> K is the same:<br />

emphatic evidence <strong>of</strong> a regular pattern. It is not always the case that these three summaries<br />

give equivalent messages.<br />

A commonly-used transformation <strong>of</strong> K is the L-function<br />

<br />

K(r)<br />

L(r) =<br />

π<br />

which transforms the Poisson K function to the straight l<strong>in</strong>e Lpois(r) = r, mak<strong>in</strong>g visual assessment<br />

<strong>of</strong> the graph much easier. The square root transformation also approximately stabilises<br />

the variance <strong>of</strong> the estimator, mak<strong>in</strong>g it easier to assess deviations.<br />

Copyright c○CSIRO 2008


16.4 Pairwise distances <strong>and</strong> the K function 113<br />

To compute the estimated L function, use Lest.<br />

> L plot(L, ma<strong>in</strong> = "L function")<br />

L(r)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

L function<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

Another related summary function is the pair correlation function<br />

g(r) = K′ (r)<br />

2πr<br />

where K ′ (r) is the derivative <strong>of</strong> K. The pair correlation is <strong>in</strong> some ways easier to <strong>in</strong>terpret than<br />

either K or L, although it is more difficult to estimate. Roughly speak<strong>in</strong>g, the pair correlation<br />

g(r) is the probability <strong>of</strong> observ<strong>in</strong>g a pair <strong>of</strong> <strong>po<strong>in</strong>t</strong>s separated by a distance r, divided by the<br />

correspond<strong>in</strong>g probability for a Poisson process. This is a non-centred correlation which may<br />

take any nonnegative value. The value g(r) = 1 corresponds to complete r<strong>and</strong>omness; for the<br />

Poisson process the pair correlation is gpois(r) ≡ 1. For other processes, values g(r) > 1 suggest<br />

cluster<strong>in</strong>g or attraction at distance r, while values g(r) < 1 suggest <strong>in</strong>hibition or regularity.<br />

To compute the estimated pair correlation function, use pcf.<br />

> plot(pcf(cells))<br />

g(r)<br />

0.0 0.5 1.0 1.5<br />

pcf(cells)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

Here we have used the method pcf.ppp. This computes a st<strong>and</strong>ard kernel estimate which<br />

performs well except at very small values <strong>of</strong> r. So it is prudent not to read too much <strong>in</strong>to the<br />

behaviour <strong>of</strong> the pcf close to r = 0.<br />

Copyright c○CSIRO 2008


114 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

If you want to try another algebraic transformation <strong>of</strong> a summary function, the transformation<br />

can be computed us<strong>in</strong>g eval.fv. You can also plot algebraic transformations <strong>of</strong> a summary<br />

function us<strong>in</strong>g the ‘plott<strong>in</strong>g formula’ argument to plot.fv. For example, if we have already<br />

computed the K function, we can plot the L function by<br />

> K plot(K, sqrt(./pi) ~ r)<br />

<strong>and</strong> compute the L function us<strong>in</strong>g eval.fv:<br />

> K L K g 1 suggest regularity, <strong>and</strong> J(r) < 1 suggest cluster<strong>in</strong>g.<br />

An appeal<strong>in</strong>g property <strong>of</strong> the J function is that the superposition X• = X1 ∪ X2 <strong>of</strong> two<br />

<strong>in</strong>dependent <strong>po<strong>in</strong>t</strong> processes X1,X2 has J-function<br />

J(t) = λ1<br />

J1(t) +<br />

λ1 + λ2<br />

λ2<br />

J2(t)<br />

λ1 + λ2<br />

where J1,J2 are the J-functions <strong>of</strong> X1,X2 respectively <strong>and</strong> λ1,λ2 are their <strong>in</strong>tensities.<br />

The J function is computed by Jest.<br />

The convenient function allstats efficiently computes the F, G, J <strong>and</strong> K functions for a<br />

dataset. They can be plotted automatically.<br />

> plot(allstats(cells))<br />

Copyright c○CSIRO 2008


16.6 Manipulat<strong>in</strong>g <strong>and</strong> plott<strong>in</strong>g summary functions 115<br />

F(r)<br />

J(r)<br />

0.0 0.2 0.4 0.6 0.8<br />

2 4 6 8<br />

F function<br />

0.00 0.02 0.04 0.06 0.08<br />

r<br />

J function<br />

0.00 0.02 0.04 0.06 0.08<br />

allstats(cells)<br />

G(r)<br />

K(r)<br />

0.0 0.2 0.4 0.6 0.8<br />

0.00 0.05 0.10 0.15 0.20<br />

G function<br />

0.00 0.05 0.10 0.15<br />

r<br />

K function<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

16.6 Manipulat<strong>in</strong>g <strong>and</strong> plott<strong>in</strong>g summary functions<br />

As expla<strong>in</strong>ed above, the summary function comm<strong>and</strong>s Fest, Gest, Kest, Lest, pcf etc. return<br />

a function value table (an object <strong>of</strong> class "fv"). This is a data frame (i.e. it also belongs to<br />

the class "data.frame") with some extra <strong>in</strong>formation. One column <strong>of</strong> the data frame conta<strong>in</strong>s<br />

values <strong>of</strong> the distance argument r, while the other columns conta<strong>in</strong> different estimates <strong>of</strong> the<br />

value <strong>of</strong> the function, or the theoretical value <strong>of</strong> the function under CSR.<br />

The follow<strong>in</strong>g operations are def<strong>in</strong>ed on this class:<br />

pr<strong>in</strong>t.fv pr<strong>in</strong>t a summary description<br />

plot.fv plot the function estimates<br />

as.data.frame strip extra <strong>in</strong>formation (returns a data frame)<br />

$ extract one column (returns a numeric vector)<br />

[.fv extract subset (returns an "fv" object)<br />

with.fv perform calculations with specific columns <strong>of</strong> data frame<br />

eval.fv perform calculation on all columns <strong>of</strong> data frame<br />

stieltjes compute Stieltjes <strong>in</strong>tegral with respect to function<br />

To make life easier, there are several options for manipulat<strong>in</strong>g the function values.<br />

To manipulate or comb<strong>in</strong>e one or more columns <strong>of</strong> the data frame, it is typically easiest to<br />

use the comm<strong>and</strong> with.fv, which is a method for the generic comm<strong>and</strong> with. For example:<br />

> data(redwood)<br />

> K y x


116 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

seedl<strong>in</strong>gs data. For this to work, we have to know that K conta<strong>in</strong>s columns named r, iso <strong>and</strong><br />

theo.<br />

The general syntax is with(X, expr) where X is an "fv" object <strong>and</strong> expr can be any<br />

expression <strong>in</strong>volv<strong>in</strong>g the names <strong>of</strong> columns <strong>of</strong> X. The expression can <strong>in</strong>clude functions, so long<br />

as they are capable <strong>of</strong> operat<strong>in</strong>g on numeric vectors. The expression can also <strong>in</strong>volve the symbol<br />

. represent<strong>in</strong>g “all recommended estimates <strong>of</strong> the function”. Thus:<br />

> L with(Kest(redwood), max(abs(iso - theo)))<br />

[1] 0.04945199<br />

To plot a transformed function, you can also use the plot method. Its second argument is<br />

a formula <strong>in</strong> the R language. The left side <strong>of</strong> the formula represents what curve or curves will<br />

be plotted on the y axis, <strong>and</strong> the right side determ<strong>in</strong>es the x variable for the plot. Thus:<br />

> plot(K, sqrt(./pi) ~ r)<br />

plots the estimates <strong>of</strong> L(r) = K(r), by all the available edge correction methods, aga<strong>in</strong>st<br />

r. The symbol . aga<strong>in</strong> signifies “all recommended estimates <strong>of</strong> the function”. The left h<strong>and</strong> side<br />

<strong>of</strong> the formula may use the comm<strong>and</strong> cb<strong>in</strong>d to <strong>in</strong>dicate that several different curves should be<br />

plotted. For example, to plot only two curves, giv<strong>in</strong>g the isotropic correction estimate <strong>and</strong> the<br />

theoretical value <strong>of</strong> K(r):<br />

> plot(K, cb<strong>in</strong>d(iso, theo) ~ r)<br />

The right-h<strong>and</strong> side can be any expression that evaluates to a numeric vector, <strong>and</strong> the left<br />

h<strong>and</strong> side is any expression that evaluates to a vector or matrix, <strong>of</strong> compatible dimensions.<br />

To manipulate or comb<strong>in</strong>e one or more "fv" objects, use eval.fv. Its argument is an<br />

expression conta<strong>in</strong><strong>in</strong>g the names <strong>of</strong> "fv" objects. For example<br />

> K L K1 K2 DK


16.7 Caveats 117<br />

16.7 Caveats<br />

The use <strong>of</strong> summary functions for analys<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> has become established across wide<br />

areas <strong>of</strong> applied science, follow<strong>in</strong>g Ripley’s <strong>in</strong>fluential paper [40] <strong>and</strong> many subsequent textbooks<br />

[19, 21, 23, 47, 42, 43, 46] until quite recently.<br />

There is a tendency to apply them uncritically <strong>and</strong> exclusively. It’s important to remember<br />

that<br />

1. the functions F, G <strong>and</strong> K are def<strong>in</strong>ed <strong>and</strong> estimated under the assumption that the <strong>po<strong>in</strong>t</strong><br />

process is stationary (homogeneous).<br />

2. these summary functions do not completely characterise the process.<br />

3. if the process is not stationary, deviations between the empirical <strong>and</strong> theoretical functions<br />

(e.g. K <strong>and</strong> Kpois) are not necessarily evidence <strong>of</strong> <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction, s<strong>in</strong>ce they may<br />

also be attributable to variations <strong>in</strong> <strong>in</strong>tensity.<br />

For an example <strong>of</strong> caveat 2, here is a <strong>po<strong>in</strong>t</strong> process constructed by Baddeley <strong>and</strong> Silverman<br />

[10] which has the same K function as the homogeneous Poisson process:<br />

> par(mfrow = c(1, 2))<br />

> X plot(X)<br />

> plot(Kest(X))<br />

X<br />

0.00 0.05 0.10 0.15 0.20<br />

Kest(X)<br />

For an example <strong>of</strong> caveat 3, we generate an <strong>in</strong>homogeneous Poisson pattern <strong>and</strong> apply the<br />

ord<strong>in</strong>ary K function estimator. The result appears to show cluster<strong>in</strong>g, but this is an artefact <strong>of</strong><br />

the <strong>spatial</strong> <strong>in</strong>homogeneity.<br />

> par(mfrow = c(1, 2))<br />

> X plot(X)<br />

> plot(Kest(X))<br />

Copyright c○CSIRO 2008


118 Methods 5: Distance methods for <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Copyright c○CSIRO 2008<br />

X<br />

0.00 0.05 0.10 0.15 0.20 0.25 0.30<br />

Kest(X)


17 Methods 6: simulation envelopes <strong>and</strong> goodness-<strong>of</strong>-fit tests<br />

Although summary statistics such as the K-function are <strong>in</strong>tended primarily for exploratory<br />

purposes, it is also possible to use them as a basis for statistical <strong>in</strong>ference.<br />

17.1 Envelopes <strong>and</strong> Monte Carlo tests<br />

17.1.1 Motivation<br />

In Section 16 we exam<strong>in</strong>ed plots <strong>of</strong> the K-function to judge whether a <strong>po<strong>in</strong>t</strong> pattern dataset is<br />

completely r<strong>and</strong>om. The K-function estimated from the <strong>po<strong>in</strong>t</strong> pattern data, K(r), was compared<br />

graphically with the theoretical K-function for a completely r<strong>and</strong>om pattern, Kpois(r) = πr 2 . In<br />

the toy examples, large discrepancies between K <strong>and</strong> Kpois were observed, <strong>in</strong>dicat<strong>in</strong>g that the<br />

toy examples were not completely r<strong>and</strong>om <strong>patterns</strong>.<br />

However, because <strong>of</strong> r<strong>and</strong>om variability, we will never obta<strong>in</strong> perfect agreement between K<br />

<strong>and</strong> Kpois, even with a completely r<strong>and</strong>om pattern. Try typ<strong>in</strong>g plot(Kest(rpoispp(50))) a<br />

few times to get an idea <strong>of</strong> the <strong>in</strong>herent variability.<br />

The follow<strong>in</strong>g plot shows the K-function estimated from the cells dataset (thick l<strong>in</strong>e), <strong>and</strong><br />

also the K-functions <strong>of</strong> 20 simulated realisations <strong>of</strong> CSR with the same <strong>in</strong>tensity (th<strong>in</strong> l<strong>in</strong>es).<br />

K(r)<br />

0.00 0.05 0.10 0.15<br />

0.00 0.05 0.10 0.15 0.20<br />

r<br />

The next plot shows the upper <strong>and</strong> lower envelopes <strong>of</strong> the simulated K-functions, that is, the<br />

maximum <strong>and</strong> m<strong>in</strong>imum values <strong>of</strong> K(r) for each value <strong>of</strong> r. The region between the envelopes<br />

is shaded.<br />

K(r)<br />

0.00 0.05 0.10 0.15 0.20<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

119<br />

Copyright c○CSIRO 2008


120 Methods 6: simulation envelopes <strong>and</strong> goodness-<strong>of</strong>-fit tests<br />

Clearly, the K-function estimated from thecells data lies outside the typical range <strong>of</strong> values<br />

<strong>of</strong> the K-function for a completely r<strong>and</strong>om pattern.<br />

To conclude formally that there is a ‘significant’ difference between K <strong>and</strong> Kpois, we use<br />

the language <strong>of</strong> hypothesis test<strong>in</strong>g. Our null hypothesis is that the data <strong>po<strong>in</strong>t</strong> pattern is a<br />

realisation <strong>of</strong> complete <strong>spatial</strong> r<strong>and</strong>omness. The alternative hypothesis is that the data pattern<br />

is a realisation <strong>of</strong> another, unspecified <strong>po<strong>in</strong>t</strong> process.<br />

17.1.2 Monte Carlo tests<br />

A Monte Carlo test is a test based on simulations from the null hypothesis. The pr<strong>in</strong>ciple was<br />

orig<strong>in</strong>ated <strong>in</strong>dependently by Barnard [12] <strong>and</strong> Dwass [27]. It was applied <strong>in</strong> <strong>spatial</strong> statistics<br />

by Ripley [40, 42] <strong>and</strong> Besag [15, 16]. See also [28]. Monte Carlo tests are a special case <strong>of</strong><br />

r<strong>and</strong>omisation tests which are commonly used <strong>in</strong> nonparametric statistics.<br />

Suppose the reference curve is the theoretical K function for CSR. Generate M <strong>in</strong>dependent<br />

simulations <strong>of</strong> CSR <strong>in</strong>side the study region W. Compute the estimated K functions for each <strong>of</strong><br />

these realisations, say K (j) (r) for j = 1,... ,M. Obta<strong>in</strong> the <strong>po<strong>in</strong>t</strong>wise upper <strong>and</strong> lower envelopes<br />

<strong>of</strong> these simulated curves,<br />

L(r) = m<strong>in</strong><br />

j<br />

U(r) = max<br />

j<br />

K (j) (r)<br />

K (j) (r).<br />

For any fixed value <strong>of</strong> r, consider the probability that K(r) lies outside the envelope [L(r),U(r)]<br />

for the simulated curves. If the data came from a uniform Poisson process, then K(r) <strong>and</strong><br />

K (1) (r),... , K (M) (r) are statistically equivalent <strong>and</strong> <strong>in</strong>dependent, so this probability is equal<br />

to 2/(M + 1) by symmetry. That is, the test which rejects the null hypothesis <strong>of</strong> a uniform<br />

Poisson process when K(r) lies outside [L(r),U(r)], has exact significance level α = 2/(M + 1).<br />

Instead <strong>of</strong> the <strong>po<strong>in</strong>t</strong>wise maximum <strong>and</strong> m<strong>in</strong>imum, one could use the <strong>po<strong>in</strong>t</strong>wise order statistics<br />

(the <strong>po<strong>in</strong>t</strong>wise kth largest <strong>and</strong> k smallest values) giv<strong>in</strong>g a test <strong>of</strong> exact size α = 2k/(M + 1).<br />

17.1.3 Envelopes <strong>in</strong> spatstat<br />

In spatstat the function envelope computes the <strong>po<strong>in</strong>t</strong>wise envelopes.<br />

> data(cells)<br />

> E E<br />

Po<strong>in</strong>twise critical envelopes for K(r)<br />

Obta<strong>in</strong>ed from 39 simulations <strong>of</strong> simulations <strong>of</strong> CSR<br />

Significance level <strong>of</strong> <strong>po<strong>in</strong>t</strong>wise Monte Carlo test: 2/40 = 0.05<br />

Data: cells<br />

Function value object (class ’fv’)<br />

for the function r -> K(r)<br />

Entries:<br />

id label description<br />

-- ----- ----------r<br />

r distance argument r<br />

obs obs(r) function value for data pattern<br />

Copyright c○CSIRO 2008


17.1 Envelopes <strong>and</strong> Monte Carlo tests 121<br />

theo theo(r) theoretical value for CSR<br />

lo lo(r) lower <strong>po<strong>in</strong>t</strong>wise envelope <strong>of</strong> simulations<br />

hi hi(r) upper <strong>po<strong>in</strong>t</strong>wise envelope <strong>of</strong> simulations<br />

--------------------------------------<br />

Default plot formula:<br />

. ~ r<br />

Recommended range <strong>of</strong> argument r: [0, 0.25]<br />

> plot(E, ma<strong>in</strong> = "<strong>po<strong>in</strong>t</strong>wise envelopes")<br />

K(r)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

<strong>po<strong>in</strong>t</strong>wise envelopes<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

For example if r had been fixed at r = 0.10 we would have rejected the null hypothesis <strong>of</strong><br />

CSR at the 5% level. The value M = 39 is the smallest to yield a two-sided test with significance<br />

level 5%.<br />

Tip: A common <strong>and</strong> dangerous mistake is to mis<strong>in</strong>terpret the simulation envelopes<br />

as “confidence <strong>in</strong>tervals” around ˆ K. They cannot be <strong>in</strong>terpreted as a measure <strong>of</strong><br />

accuracy <strong>of</strong> the estimated K function! They are the critical values for a test <strong>of</strong> the<br />

hypothesis that K(r) = πr 2 .<br />

The value returned by envelope is an object <strong>of</strong> class "fv" that can be manipulated <strong>in</strong><br />

the usual way: you can plot it, transform it, extract columns, <strong>and</strong> so on (see Section 16.6 on<br />

page 115).<br />

17.1.4 Simultaneous Monte Carlo test<br />

Note that the theory <strong>of</strong> the Monte Carlo test, as presented above, requires that r be fixed <strong>in</strong><br />

advance. If we plot the envelope <strong>and</strong> check whether the empirical K function ever w<strong>and</strong>ers<br />

Copyright c○CSIRO 2008


122 Methods 6: simulation envelopes <strong>and</strong> goodness-<strong>of</strong>-fit tests<br />

outside the envelope, this is equivalent to choos<strong>in</strong>g the value <strong>of</strong> r <strong>in</strong> a data-dependent way, <strong>and</strong><br />

the true significance level is higher (less ‘significant’).<br />

To avoid this problem we can construct simultaneous critical b<strong>and</strong>s which have the property<br />

that, under H0, the probability that K ever w<strong>and</strong>ers outside the critical b<strong>and</strong>s is exactly 5%.<br />

One simple way to achieve this is to compute, for each estimate K(r), its maximum deviation<br />

from the Poisson K function, D = maxr | K(r) − Kpois(r)|. This is computed for each <strong>of</strong> the M<br />

simulated datasets, <strong>and</strong> the maximum value Dmax obta<strong>in</strong>ed. Then the upper <strong>and</strong> lower limits<br />

are<br />

L(r) = πr 2 − Dmax<br />

U(r) = πr 2 + Dmax.<br />

The estimated K function for the data transgresses these limits if <strong>and</strong> only if the D-value for<br />

the data exceeds Dmax. Under H0 this occurs with probability 1/(M + 1). Thus, a test <strong>of</strong> size<br />

5% is obta<strong>in</strong>ed by tak<strong>in</strong>g M = 19.<br />

> E plot(E, ma<strong>in</strong> = "global envelopes")<br />

K(r)<br />

−0.05 0.00 0.05 0.10 0.15 0.20 0.25<br />

global envelopes<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

A more powerful test is obta<strong>in</strong>ed if we (approximately) stabilise the variance, by us<strong>in</strong>g the<br />

L function <strong>in</strong> place <strong>of</strong> K.<br />

> E plot(E, ma<strong>in</strong> = "global envelopes <strong>of</strong> L(r)")<br />

Copyright c○CSIRO 2008


17.1 Envelopes <strong>and</strong> Monte Carlo tests 123<br />

L(r)<br />

−0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30<br />

global envelopes <strong>of</strong> L(r)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

17.1.5 Envelopes for any fitted model<br />

In the explanation above, we assumed that the null hypothesis was CSR (complete <strong>spatial</strong><br />

r<strong>and</strong>omness, a uniform Poisson process). In fact the Monte Carlo test<strong>in</strong>g rationale can be<br />

applied to any <strong>po<strong>in</strong>t</strong> process model serv<strong>in</strong>g as a null hypothesis. We simply have to generate<br />

simulated realisations from the null hypothesis, <strong>and</strong> compute the summary function for each<br />

simulated realisation.<br />

To simulate from a fitted <strong>po<strong>in</strong>t</strong> process model (object <strong>of</strong> class "ppm"), call the envelope<br />

function, giv<strong>in</strong>g the fitted model as the first argument <strong>of</strong> envelope. Then the simulated <strong>patterns</strong><br />

will be generated accord<strong>in</strong>g to this fitted model. The orig<strong>in</strong>al data <strong>po<strong>in</strong>t</strong> pattern, to which the<br />

model was fitted, is stored <strong>in</strong> the fitted model object; the orig<strong>in</strong>al data are extracted <strong>and</strong> the<br />

summary function for the data is also computed.<br />

The follow<strong>in</strong>g code fits an <strong>in</strong>homogeneous Poisson process to the Beilschmiedia pattern, then<br />

generates simulation envelopes <strong>of</strong> the L function by simulat<strong>in</strong>g from the fitted <strong>in</strong>homogeneous<br />

Poisson model.<br />

> data(bei)<br />

> fit E plot(E, ma<strong>in</strong> = "envelope for <strong>in</strong>homogeneous Poisson")<br />

L(r)<br />

0 20 40 60 80 100 120<br />

envelope for <strong>in</strong>homogeneous Poisson<br />

0 20 40 60 80 100 120<br />

r (metres)<br />

Copyright c○CSIRO 2008


124 Methods 6: simulation envelopes <strong>and</strong> goodness-<strong>of</strong>-fit tests<br />

17.1.6 Envelopes based on any simulation procedure<br />

Envelopes can also be computed us<strong>in</strong>g any user-specified procedure to generate the simulated<br />

realisations. This allows us to perform r<strong>and</strong>omisation tests, for example.<br />

The simulation procedure should be encoded as an R expression, which will be evaluated<br />

each time a simulation is required. For example if we type<br />

> sim data(cells)<br />

> e E plot(E, ma<strong>in</strong> = "envelope with fixed n")<br />

L(r)<br />

−0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30<br />

envelope with fixed n<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

17.1.7 Envelopes based on a set <strong>of</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Envelopes can also be computed from a user-supplied list <strong>of</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, <strong>in</strong>stead <strong>of</strong> the<br />

simulated <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> generated by a chosen simulation procedure.<br />

This improves efficiency <strong>and</strong> consistency if, for example, we are go<strong>in</strong>g to calculate the envelopes<br />

<strong>of</strong> several different summary statistics.<br />

> data(cells)<br />

> SimPatList for (i <strong>in</strong> 1:1000) SimPatList[[i]] EK Ep


18 Methods 7: model-fitt<strong>in</strong>g us<strong>in</strong>g summary statistics<br />

Summary statistics can also be used to fit models to data.<br />

In the ‘method <strong>of</strong> moments’ we estimate a parameter θ by solv<strong>in</strong>g<br />

Eθ[S(X)] = S(x)<br />

where S(x) is the observed value <strong>of</strong> a statistic S for our data x, <strong>and</strong> the left side is the theoretical<br />

mean <strong>of</strong> S for the model governed by parameter θ.<br />

The analogue for <strong>po<strong>in</strong>t</strong> process models is to fit the model by match<strong>in</strong>g a summary statistic<br />

such as the K function to its theoretical value under the model.<br />

18.0.8 Cluster processes<br />

In a precious few cases, the K function <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process is known exactly, as an analytic<br />

expression <strong>in</strong> terms <strong>of</strong> the model parameters. These happy cases <strong>in</strong>clude many Neyman-Scott<br />

cluster processes. For example, the K-function <strong>of</strong> the Thomas process with parameters θ =<br />

(κ,µ,σ) is<br />

Kθ(r) = πr 2 + 1 r2<br />

(1 − exp(−<br />

κ 4σ2)). (26)<br />

We can use this to fit a Thomas model to data. We determ<strong>in</strong>e the values <strong>of</strong> the parameters<br />

θ = (κ,µ,σ) to achieve the best match between Kθ(r) <strong>and</strong> the estimated K-function <strong>of</strong> the data,<br />

K(r). The best match is determ<strong>in</strong>ed by m<strong>in</strong>imis<strong>in</strong>g the discrepancy between the two functions<br />

over some range [a,b]:<br />

b <br />

<br />

D(θ) = K(r) q − Kθ(r) q<br />

<br />

<br />

dr (27)<br />

a<br />

where 0 ≤ a < b, <strong>and</strong> where p,q > 0 are <strong>in</strong>dices. This method was orig<strong>in</strong>ally advocated by Peter<br />

Diggle <strong>and</strong> collaborators, <strong>and</strong> is now known as the method <strong>of</strong> m<strong>in</strong>imum contrast. See [23].<br />

The comm<strong>and</strong> kppm fits cluster <strong>po<strong>in</strong>t</strong> process models by the method <strong>of</strong> m<strong>in</strong>imum contrast.<br />

To fit the Thomas model to the redwood data:<br />

> data(redwood)<br />

> fit fit<br />

Stationary cluster <strong>po<strong>in</strong>t</strong> process model<br />

Fitted to <strong>po<strong>in</strong>t</strong> pattern dataset ’redwood’<br />

Cluster model: Thomas process<br />

Fitted parameters:<br />

kappa sigma mu<br />

23.55389789 0.04704965 2.63226071<br />

p<br />

125<br />

Copyright c○CSIRO 2008


126 Methods 7: model-fitt<strong>in</strong>g us<strong>in</strong>g summary statistics<br />

> plot(fit)<br />

K(r)<br />

0.00 0.05 0.10 0.15 0.20<br />

fit<br />

K−function<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

The plot shows the theoretical K function <strong>of</strong> the fitted Thomas process (fit), three nonparametric<br />

estimates <strong>of</strong> the K function (iso, trans, border) <strong>and</strong> the Poisson K function<br />

(theo).<br />

At present, the only cluster process models that can be fitted us<strong>in</strong>g kppm are the Thomas<br />

process <strong>and</strong> the Matérn cluster process. To fit the Matérn cluster process to the redwood data,<br />

> fitM plot(simulate(fit, nsim = 4))<br />

The comm<strong>and</strong> simulate is generic; here we have used the method simulate.kppm.<br />

18.1 Other models with known K function<br />

Apart from cluster processes, there are certa<strong>in</strong> other <strong>po<strong>in</strong>t</strong> process models for which the Kfunction<br />

is known as a function <strong>of</strong> the model parameters. M<strong>in</strong>imum contrast methods are also<br />

available for these models.<br />

One special case is the log-Gaussian Cox processes described <strong>in</strong> detail <strong>in</strong> [37]. To fit a<br />

log-Gaussian Cox process with exponential covariance function to the redwood data:<br />

> fit fit<br />

M<strong>in</strong>imum contrast fit (object <strong>of</strong> class "m<strong>in</strong>confit")<br />

Model: log-Gaussian Cox process<br />

Fitted by match<strong>in</strong>g theoretical K function to Kest(K)<br />

Parameters fitted by m<strong>in</strong>imum contrast ($par):<br />

sigma2 alpha<br />

1.0485493 0.0997963<br />

Copyright c○CSIRO 2008


18.2 Generic algorithm for m<strong>in</strong>imum contrast 127<br />

Derived parameters <strong>of</strong> log-Gaussian Cox process ($modelpar):<br />

sigma2 alpha mu<br />

1.0485493 0.0997963 3.6028597<br />

Converged successfully after 145 iterations.<br />

Doma<strong>in</strong> <strong>of</strong> <strong>in</strong>tegration: [ 0 , 0.25 ]<br />

Exponents: p= 2, q= 0.25<br />

The second argument to lgcp.estK gives <strong>in</strong>itial values for the model parameters σ 2 <strong>and</strong> α.<br />

The result <strong>of</strong> lgcp.estK is an object <strong>of</strong> class m<strong>in</strong>confit (represent<strong>in</strong>g a ‘m<strong>in</strong>imum contrast<br />

fit’). There are methods for pr<strong>in</strong>t<strong>in</strong>g <strong>and</strong> plott<strong>in</strong>g the fit. Simulation <strong>of</strong> these models has not<br />

yet been implemented <strong>in</strong> spatstat.<br />

18.2 Generic algorithm for m<strong>in</strong>imum contrast<br />

The comm<strong>and</strong> m<strong>in</strong>contrast is a generic fitt<strong>in</strong>g algorithm for the method <strong>of</strong> m<strong>in</strong>imum contrast.<br />

It can be used <strong>in</strong> any context where the theoretical function can be computed exactly from the<br />

model parameters. A basic call to m<strong>in</strong>contrast is:<br />

> m<strong>in</strong>contrast(observed, theoretical, starpar)<br />

where observed is an object <strong>of</strong> class "fv" conta<strong>in</strong><strong>in</strong>g the summary function calculated from<br />

the data; theoretical is a function which returns the theoretical value <strong>of</strong> the summary function<br />

for a given parameter value; <strong>and</strong> startpar is a vector <strong>of</strong> <strong>in</strong>itial values <strong>of</strong> the model parameters.<br />

For details, see the help file for m<strong>in</strong>contrast.<br />

18.2.1 Monte Carlo<br />

For the vast majority <strong>of</strong> <strong>po<strong>in</strong>t</strong> process models, the true K function Kθ(r) is not known analytically<br />

<strong>in</strong> terms <strong>of</strong> the parameter θ. In pr<strong>in</strong>ciple we could use Monte Carlo simulation to determ<strong>in</strong>e<br />

an approximation to Kθ(r), for any given θ, by generat<strong>in</strong>g a large number <strong>of</strong> simulated realisations<br />

<strong>of</strong> the process with parameter θ, comput<strong>in</strong>g the estimated K-function for each realisation,<br />

<strong>and</strong> tak<strong>in</strong>g the <strong>po<strong>in</strong>t</strong>wise sample average. It’s possible to do this <strong>in</strong> spatstat us<strong>in</strong>g the generic<br />

algorithm m<strong>in</strong>contrast. Details are not given here as it is rather fiddly at present, <strong>and</strong> will<br />

change soon.<br />

Copyright c○CSIRO 2008


128 Methods 8: adjust<strong>in</strong>g for <strong>in</strong>homogeneity<br />

19 Methods 8: adjust<strong>in</strong>g for <strong>in</strong>homogeneity<br />

If a <strong>po<strong>in</strong>t</strong> pattern is known or suspected to be <strong>spatial</strong>ly <strong>in</strong>homogeneous, then our statistical<br />

analysis <strong>of</strong> the pattern should take account <strong>of</strong> this <strong>in</strong>homogeneity.<br />

19.1 Inhomogeneous K function<br />

There is a modification <strong>of</strong> the K function that applies to <strong>in</strong>homogeneous processes [2]. If λ(u)<br />

is the true <strong>in</strong>tensity function <strong>of</strong> the <strong>po<strong>in</strong>t</strong> process X, then the idea is that each <strong>po<strong>in</strong>t</strong> xi will be<br />

weighted by wi = 1/λ(xi).<br />

The <strong>in</strong>homogeneous K-function is def<strong>in</strong>ed as<br />

K<strong>in</strong>hom(r) = E<br />

⎡<br />

⎣ 1<br />

λ(u)<br />

<br />

xj∈X<br />

1<br />

λ(xj) 1 {0 < ||u − xj|| ≤ r}<br />

⎤<br />

<br />

<br />

u ∈ X⎦<br />

(28)<br />

<br />

assum<strong>in</strong>g that this does not depend on location u. Thus, λ(u)K(r) is the expected total ‘weight’<br />

<strong>of</strong> all r<strong>and</strong>om <strong>po<strong>in</strong>t</strong>s with<strong>in</strong> a distance r <strong>of</strong> the <strong>po<strong>in</strong>t</strong> u, where the ‘weight’ <strong>of</strong> a <strong>po<strong>in</strong>t</strong> xi is 1/λ(xi).<br />

If the process is actually homogeneous, then λ(u) is constant <strong>and</strong> K<strong>in</strong>hom(r) reduces to the<br />

usual K function (21).<br />

It turns out that, for an <strong>in</strong>homogeneous Poisson process with <strong>in</strong>tensity function λ(u), the<br />

<strong>in</strong>homogeneous K function is<br />

exactly as for the homogeneous case.<br />

K<strong>in</strong>hom, pois(r) = πr 2<br />

The st<strong>and</strong>ard estimators <strong>of</strong> K can be extended to the <strong>in</strong>homogeneous K function:<br />

K<strong>in</strong>hom(r) =<br />

1<br />

area(W)<br />

<br />

i<br />

j=i<br />

(29)<br />

1 {||xi − xj|| ≤ r}<br />

λ(xi) e(xi,xj;r) (30)<br />

λ(xj)<br />

where e(u,v,r) is an edge correction weight as before, <strong>and</strong> λ(u) is an estimate <strong>of</strong> the <strong>in</strong>tensity<br />

function λ(u).<br />

There rema<strong>in</strong>s the question <strong>of</strong> how to estimate the <strong>in</strong>tensity function λ(u). It is usually<br />

advisable to obta<strong>in</strong> the <strong>in</strong>tensity estimate λ(u) by fitt<strong>in</strong>g a parametric model, to avoid overfitt<strong>in</strong>g.<br />

Here is an example for the tropical ra<strong>in</strong>forest data, us<strong>in</strong>g the covariate data to suggest a model<br />

for the <strong>in</strong>tensity.<br />

> data(bei)<br />

> fit lam Ki plot(Ki, ma<strong>in</strong> = "Inhomogeneous K function")<br />

Copyright c○CSIRO 2008


0 10000 20000 30000 40000 50000<br />

19.1 Inhomogeneous K function 129<br />

Inhomogeneous K function<br />

The plot suggests that, even after account<strong>in</strong>g for dependence on altitude <strong>and</strong> slope, the trees<br />

still appear to be clustered.<br />

The <strong>in</strong>tensity function λ(u) could also be estimated by kernel smooth<strong>in</strong>g the <strong>po<strong>in</strong>t</strong> pattern<br />

data. However, notice that the estimator (30) <strong>of</strong> the <strong>in</strong>homogeneous K function depends on<br />

the estimated <strong>in</strong>tensity values at the data <strong>po<strong>in</strong>t</strong>s, λ(xi). These are positively biased estimates<br />

<strong>of</strong> the true values λ(xi). In order to avoid bias, the value λ(xi) should be estimated by kernel<br />

smooth<strong>in</strong>g <strong>of</strong> the <strong>po<strong>in</strong>t</strong> pattern with the <strong>po<strong>in</strong>t</strong> xi deleted. This “leave-one-out” estimator is<br />

implemented <strong>in</strong> K<strong>in</strong>hom <strong>and</strong> is <strong>in</strong>voked when the argument lambda is not given:<br />

> Ki2 plot(Ki2, ma<strong>in</strong> = "K<strong>in</strong>hom us<strong>in</strong>g leave-one-out")<br />

lty col<br />

bord.modif 1 1<br />

border 2 2<br />

theo 3 3<br />

(the smooth<strong>in</strong>g parameter σ can also be controlled.)<br />

The <strong>in</strong>homogeneous analogue <strong>of</strong> the L-function is def<strong>in</strong>ed by<br />

<br />

L<strong>in</strong>hom(r) =<br />

K<strong>in</strong>hom(r)<br />

This can be computed us<strong>in</strong>g L<strong>in</strong>hom. For an <strong>in</strong>homogeneous Poisson process, L<strong>in</strong>hom(r) ≡ r.<br />

The <strong>in</strong>homogeneous analogue <strong>of</strong> the pair correlation function can be def<strong>in</strong>ed, similarly to the<br />

homogeneous case, as<br />

g<strong>in</strong>hom(r) = K′ <strong>in</strong>hom (r)<br />

.<br />

2πr<br />

It has the same <strong>in</strong>terpretation, namely, that g<strong>in</strong>hom(r) is the probability <strong>of</strong> observ<strong>in</strong>g a pair <strong>of</strong><br />

<strong>po<strong>in</strong>t</strong>s at certa<strong>in</strong> locations separated by a distance r, divided by the correspond<strong>in</strong>g probability<br />

for a Poisson process <strong>of</strong> the same (<strong>in</strong>homogeneous) <strong>in</strong>tensity.<br />

The <strong>in</strong>homogeneous pair correlation function is currently computed by call<strong>in</strong>g K<strong>in</strong>hom followed<br />

by pcf.fv (which does numerical differentiation):<br />

> g


130 Methods 8: adjust<strong>in</strong>g for <strong>in</strong>homogeneity<br />

19.2 Inhomogeneous cluster models<br />

The <strong>in</strong>homogeneous Poisson process was described <strong>in</strong> Section 13.1. We can also <strong>in</strong>troduce <strong>spatial</strong><br />

<strong>in</strong>homogeneity <strong>in</strong>to any <strong>of</strong> the non-Poisson models described <strong>in</strong> Section 15.<br />

In the case <strong>of</strong> Poisson cluster processes (Section 15.1) we can <strong>in</strong>troduce <strong>in</strong>homogeneity <strong>in</strong><br />

either the parent process or the <strong>of</strong>fspr<strong>in</strong>g processes.<br />

To make the parents <strong>in</strong>homogeneous, we simply generate the parent <strong>po<strong>in</strong>t</strong>s from an <strong>in</strong>homogeneous<br />

Poisson process with some <strong>in</strong>tensity function κ(u).<br />

To make the clusters <strong>in</strong>homogeneous, we use a clever construction by Waagepetersen [49].<br />

For a parent <strong>po<strong>in</strong>t</strong> at location (x0,y0), the <strong>of</strong>fspr<strong>in</strong>g are generated from a Poisson process with<br />

<strong>in</strong>tensity β(x,y) = µ(x,y)f(x − x0,y − y0), where f(u,v) is either the Gaussian probability<br />

density (for the Thomas process) or the uniform probability density <strong>in</strong> a disc (for the Matérn<br />

cluster process), <strong>and</strong> µ(x,y) is the reference or modulat<strong>in</strong>g <strong>in</strong>tensity. The number <strong>of</strong> <strong>of</strong>fspr<strong>in</strong>g<br />

from a given parent (x0,y0) is a Poisson r<strong>and</strong>om variable with mean<br />

<br />

<br />

B(x0,y0) = β(x,y)dxdy = f(x − x0,y − y0)µ(x,y)dxdy.<br />

The simulation algorithms rMatClust <strong>and</strong> rThomas allow these options. If the parent <strong>in</strong>tensity<br />

parameter kappa is given as a function(x,y) or a pixel image, then the parents are<br />

Poisson with <strong>in</strong>homogeneous <strong>in</strong>tensity kappa. If the <strong>of</strong>fspr<strong>in</strong>g mean parameter mu is given as a<br />

function(x,y) or a pixel image, then this determ<strong>in</strong>es an <strong>in</strong>homogeneous reference density for<br />

the clusters.<br />

> Z plot(rMatClust(10, 0.05, Z))<br />

rMatClust(10, 0.05, Z)<br />

19.3 Fitt<strong>in</strong>g <strong>in</strong>homogeneous models by m<strong>in</strong>imum contrast<br />

M<strong>in</strong>imum contrast methods can be applied to <strong>in</strong>homogeneous <strong>po<strong>in</strong>t</strong> process models.<br />

In pr<strong>in</strong>ciple we could fit any model (homogeneous or <strong>in</strong>homogeneous) by the method <strong>of</strong><br />

m<strong>in</strong>imum contrast us<strong>in</strong>g any summary statistic. However, the method works best when we<br />

have an exact formula for the true value <strong>of</strong> the summary function for the model, expressed as a<br />

function <strong>of</strong> the parameters <strong>of</strong> the model.<br />

Copyright c○CSIRO 2008


19.3 Fitt<strong>in</strong>g <strong>in</strong>homogeneous models by m<strong>in</strong>imum contrast 131<br />

Waagepetersen [49] <strong>po<strong>in</strong>t</strong>ed out that, if we take a Thomas process or Matérn cluster process<br />

(or <strong>in</strong> general a Neyman-Scott process) with homogeneous parent <strong>in</strong>tensity κ <strong>and</strong> <strong>in</strong>homogeneous<br />

cluster reference density µ(u), then the overall <strong>in</strong>tensity <strong>of</strong> the process is<br />

λ(u) = κµ(u)<br />

<strong>and</strong> the <strong>in</strong>homogeneous K-function is the same as it would be if µ were constant.<br />

Thus, we can fit a Thomas process or Matérn cluster process with <strong>in</strong>homogeneous clusters<br />

as follows:<br />

1. estimate the <strong>in</strong>homogeneous <strong>in</strong>tensity λ(u) <strong>of</strong> the process.<br />

2. derive an estimate <strong>of</strong> the <strong>in</strong>homogeneous K-function.<br />

3. use the method <strong>of</strong> m<strong>in</strong>imum contrast to estimate the parent <strong>in</strong>tensity κ <strong>and</strong> the cluster<br />

scale parameter (Gaussian st<strong>and</strong>ard deviation or disc radius), exactly as we would <strong>in</strong> the<br />

homogeneous case.<br />

The comm<strong>and</strong> kppm performs this algorithm us<strong>in</strong>g a parametric model for the trend:<br />

> data(bei)<br />

> fit fit<br />

Inhomogeneous cluster <strong>po<strong>in</strong>t</strong> process model<br />

Fitted to <strong>po<strong>in</strong>t</strong> pattern dataset ’bei’<br />

Trend formula:~elev + grad<br />

Fitted coefficients for trend formula:<br />

(Intercept) elev grad<br />

-8.55862210 0.02140987 5.84104065<br />

Cluster model: Thomas process<br />

Fitted parameters:<br />

kappa sigma<br />

0.0004290453 5.4110425537<br />

In this example,kppm first estimates the <strong>in</strong>tensity by fitt<strong>in</strong>g the modelppm(bei, ~elev+grad, covariate<br />

Then predict.ppm is used to compute the predicted <strong>in</strong>tensity at the data <strong>po<strong>in</strong>t</strong>s, <strong>and</strong> this is<br />

passed to K<strong>in</strong>hom to calculate the <strong>in</strong>homogeneous K function. The parameters <strong>of</strong> the Thomas<br />

process are estimated from the <strong>in</strong>homogeneous K function by m<strong>in</strong>imum contrast.<br />

The result <strong>of</strong> kppm can be pr<strong>in</strong>ted, plotted <strong>and</strong> simulated as before.<br />

Copyright c○CSIRO 2008


132 Gibbs models<br />

20 Gibbs models<br />

One way to construct a statistical model (<strong>in</strong> any field <strong>of</strong> statistics) is to write down its probability<br />

density. Advantages <strong>of</strong> do<strong>in</strong>g this are:<br />

• the functional form <strong>of</strong> the density reflects its probabilistic properties.<br />

• terms or factors <strong>in</strong> the density <strong>of</strong>ten have an <strong>in</strong>terpretation as ‘components’ <strong>of</strong> the model.<br />

• it is easy to <strong>in</strong>troduce terms that represent the dependence <strong>of</strong> the model on covariates, etc.<br />

This approach is useful provided the density can be written down, <strong>and</strong> provided the density<br />

is tractable.<br />

Spatial <strong>po<strong>in</strong>t</strong> process models that are constructed by writ<strong>in</strong>g down their probability densities<br />

are called ‘Gibbs processes’. Good references on Gibbs <strong>po<strong>in</strong>t</strong> processes are [47, 20].<br />

20.1 Probability densities<br />

It is possible to def<strong>in</strong>e probability densities for <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> processes that live <strong>in</strong>side a bounded<br />

w<strong>in</strong>dow W.<br />

The probability density will be a function f(x) def<strong>in</strong>ed for each f<strong>in</strong>ite configuration x =<br />

{x1,... ,xn} <strong>of</strong> <strong>po<strong>in</strong>t</strong>s xi ∈ W for any n ≥ 0. Notice that the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s n is not fixed,<br />

<strong>and</strong> may be zero. Apart from this peculiarity, probability densities for <strong>po<strong>in</strong>t</strong> processes behave<br />

much like probability densities <strong>in</strong> more familiar contexts.<br />

That’s all you need to know for applications. If you’re <strong>in</strong>terested <strong>in</strong> the mathematical<br />

technicalities, read on; otherwise, skip to section 20.2.<br />

A <strong>po<strong>in</strong>t</strong> process X <strong>in</strong>side W is def<strong>in</strong>ed to have probability density f if <strong>and</strong> only if, for any<br />

nonnegative <strong>in</strong>tegrable function h,<br />

E[h(X)] = e −|W | −|W |<br />

h(∅)f(∅) + e<br />

∞<br />

n=1<br />

<br />

1<br />

· · · h({x1,...,xn})f({x1,...,xn})dx1 · · · dxn<br />

n! W W<br />

(31)<br />

where |W | denotes the area <strong>of</strong> W.<br />

In particular, the probability that X conta<strong>in</strong>s exactly n <strong>po<strong>in</strong>t</strong>s is<br />

pn = P{n(X) = n} =<br />

e−|W |<br />

n!<br />

<br />

W<br />

<br />

· · · f({x1,... ,xn})dx1 · · · dxn<br />

W<br />

for n ≥ 1 <strong>and</strong> p0 = P{n(X) = 0} = e −|W | f(∅). Given that there are exactly n <strong>po<strong>in</strong>t</strong>s, the<br />

conditional jo<strong>in</strong>t density <strong>of</strong> the locations x1,... ,xn is f({x1,... ,xn})/pn.<br />

20.2 Poisson processes<br />

The uniform Poisson process with <strong>in</strong>tensity 1 has probability density f(x) ≡ 1.<br />

The uniform Poisson process <strong>in</strong> W with <strong>in</strong>tensity λ has probability density<br />

f(x) = α λ n(x)<br />

where n(x) is the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> the configuration x, <strong>and</strong> the constant α is<br />

Copyright c○CSIRO 2008<br />

α = e (1−λ)|W | .<br />

(32)


20.3 Pairwise <strong>in</strong>teraction models 133<br />

The <strong>in</strong>homogeneous Poisson process <strong>in</strong> W with <strong>in</strong>tensity function λ(u) has probability density<br />

where the constant α is<br />

f(x) = α<br />

n<br />

λ(xi). (33)<br />

i=1<br />

<br />

α = exp (1 − λ(u))du .<br />

W<br />

The densities (32) <strong>and</strong> (33) are products <strong>of</strong> terms associated with <strong>in</strong>dividual <strong>po<strong>in</strong>t</strong>s xi. This<br />

reflects the conditional <strong>in</strong>dependence property (PP4) <strong>of</strong> the Poisson process.<br />

20.3 Pairwise <strong>in</strong>teraction models<br />

In order to construct <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> processes which exhibit <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction (stochastic<br />

dependence between <strong>po<strong>in</strong>t</strong>s), we need to <strong>in</strong>troduce terms <strong>in</strong> the density that depend on more<br />

than one <strong>po<strong>in</strong>t</strong>. The simplest are pairwise <strong>in</strong>teraction models, which have probability densities<br />

<strong>of</strong> the form<br />

⎡ ⎤ ⎡<br />

n(x) <br />

f(x) = α ⎣ b(xi) ⎦ ⎣ <br />

⎤<br />

c(xi,xj) ⎦ (34)<br />

i=1<br />

where α is a normalis<strong>in</strong>g constant, b(u), u ∈ W is the ‘first order’ term, <strong>and</strong> c(u,v), u,v ∈ W<br />

is the ‘second order’ or ‘pairwise <strong>in</strong>teraction’ term. The pairwise <strong>in</strong>teraction term <strong>in</strong>troduces<br />

dependence between <strong>po<strong>in</strong>t</strong>s. The <strong>in</strong>teraction function must be symmetric, c(u,v) = c(v,u). In<br />

pr<strong>in</strong>ciple we are free to choose any functions b <strong>and</strong> c, provided the result<strong>in</strong>g density is <strong>in</strong>tegrable<br />

(the right side <strong>of</strong> (31) should be f<strong>in</strong>ite when h ≡ 1).<br />

20.3.1 Hard core process<br />

If we take b(u) ≡ β <strong>and</strong><br />

c(u,v) =<br />

i r<br />

0 if ||u − v|| ≤ r<br />

where ||u − v|| denotes the distance between u <strong>and</strong> v, <strong>and</strong> r > 0 is a fixed distance, then the<br />

density becomes<br />

<br />

αβn(x) if ||xi − xj|| > r for all i = j<br />

f(x) =<br />

0 otherwise<br />

This is the density <strong>of</strong> the Poisson process <strong>of</strong> <strong>in</strong>tensity β <strong>in</strong> W conditioned on the event that no<br />

two <strong>po<strong>in</strong>t</strong>s <strong>of</strong> the pattern lie closer than r units apart. It is known as the (classical) hard core<br />

process.<br />

(35)<br />

Copyright c○CSIRO 2008


134 Gibbs models<br />

20.3.2 Strauss process<br />

Hard core process<br />

Generalis<strong>in</strong>g the hard core process, suppose we take b(u) ≡ β <strong>and</strong><br />

c(u,v) =<br />

where γ is a parameter. Then the density becomes<br />

1 if ||u − v|| > r<br />

γ if ||u − v|| ≤ r<br />

f(x) = αβ n(x) γ s(x)<br />

where s(x) is the number <strong>of</strong> pairs <strong>of</strong> dist<strong>in</strong>ct <strong>po<strong>in</strong>t</strong>s <strong>in</strong> x that lie closer than r units apart.<br />

The parameter γ controls the ‘strength’ <strong>of</strong> <strong>in</strong>teraction between <strong>po<strong>in</strong>t</strong>s. If γ = 1 the model<br />

reduces to a Poisson process with <strong>in</strong>tensity β. If γ = 0 the model is a hard core process. For<br />

values 0 < γ < 1, the process exhibits <strong>in</strong>hibition (negative association) between <strong>po<strong>in</strong>t</strong>s.<br />

Strauss(γ = 0.2) Strauss(γ = 0.7)<br />

For γ > 1, the density (37) is not <strong>in</strong>tegrable. Hence the Strauss process is def<strong>in</strong>ed only for<br />

0 ≤ γ ≤ 1 <strong>and</strong> is a model for <strong>in</strong>hibition between <strong>po<strong>in</strong>t</strong>s. This is typical <strong>of</strong> most Gibbs models.<br />

Copyright c○CSIRO 2008<br />

(36)<br />

(37)


20.4 Higher-order <strong>in</strong>teractions 135<br />

20.3.3 Other pairwise <strong>in</strong>teraction models<br />

Other pairwise <strong>in</strong>teractions that are considered <strong>in</strong> spatstat <strong>in</strong>clude the Strauss-hard core <strong>in</strong>teraction<br />

(with hard core distance h > 0 <strong>and</strong> <strong>in</strong>teraction distance r > h)<br />

⎧<br />

⎨ 0 if ||u − v|| ≤ h<br />

c(u,v) = γ if h < ||u − v|| ≤ r ,<br />

⎩<br />

1 if ||u − v|| > r<br />

the s<strong>of</strong>t-core <strong>in</strong>teraction (with scale σ > 0 <strong>and</strong> <strong>in</strong>dex 0 < κ < 1)<br />

2/κ σ<br />

c(u,v) =<br />

,<br />

||u − v||<br />

the Diggle-Gates-Stibbard <strong>in</strong>teraction (with <strong>in</strong>teraction range ρ)<br />

2 π||u−v||<br />

s<strong>in</strong><br />

c(u,v) = 2ρ if ||u − v|| ≤ ρ<br />

1 if ||u − v|| > ρ ,<br />

the Diggle-Gratton <strong>in</strong>teraction (with hard core distance δ, <strong>in</strong>teraction distance ρ <strong>and</strong> <strong>in</strong>dex κ)<br />

⎧<br />

⎪⎨ 0<br />

<br />

if ||u − v|| ≤ δ<br />

κ<br />

||u−v||−δ<br />

c(u,v) =<br />

⎪⎩<br />

ρ−δ if δ < ||u − v|| ≤ ρ ,<br />

1 if ||u − v|| > ρ<br />

<strong>and</strong> the general piecewise constant <strong>in</strong>teraction <strong>in</strong> which c(||u − v||) is a step function <strong>of</strong> ||u − v||.<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

20.4 Higher-order <strong>in</strong>teractions<br />

Piecewise constant <strong>in</strong>teraction<br />

There are some useful Gibbs <strong>po<strong>in</strong>t</strong> process models which exhibit <strong>in</strong>teractions <strong>of</strong> higher order,<br />

that is, <strong>in</strong> which the probability density has contributions from m-tuples <strong>of</strong> <strong>po<strong>in</strong>t</strong>s, where m > 2.<br />

One example is the area-<strong>in</strong>teraction or Widom-Rowl<strong>in</strong>son process [11] with probability density<br />

f(x) = αβ n(x) γ −A(x)<br />

(38)<br />

where α is the normalis<strong>in</strong>g constant, β > 0 is an <strong>in</strong>tensity parameter, <strong>and</strong> γ > 0 is an <strong>in</strong>teraction<br />

parameter. Here A(x) denotes the area <strong>of</strong> the region obta<strong>in</strong>ed by draw<strong>in</strong>g a disc <strong>of</strong> radius r<br />

centred at each <strong>po<strong>in</strong>t</strong> xi, <strong>and</strong> tak<strong>in</strong>g the union <strong>of</strong> these discs. The value γ = 1 aga<strong>in</strong> corresponds<br />

to a Poisson process, while γ < 1 produces a regular process <strong>and</strong> γ > 1 a clustered process.<br />

This process has <strong>in</strong>teractions <strong>of</strong> all orders. It can be used as a model for moderate regularity or<br />

cluster<strong>in</strong>g.<br />

Copyright c○CSIRO 2008


136 Gibbs models<br />

20.5 Conditional <strong>in</strong>tensity<br />

The ma<strong>in</strong> tool for analys<strong>in</strong>g a Gibbs <strong>po<strong>in</strong>t</strong> process is its conditional <strong>in</strong>tensity λ(u,X). Intuitively<br />

this determ<strong>in</strong>es the conditional probability <strong>of</strong> f<strong>in</strong>d<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> <strong>of</strong> the process at the location u given<br />

complete <strong>in</strong>formation about the rest <strong>of</strong> the process. For formal def<strong>in</strong>itions see [20]. Informally,<br />

the conditional probability <strong>of</strong> f<strong>in</strong>d<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> <strong>of</strong> the process <strong>in</strong>side an <strong>in</strong>f<strong>in</strong>itesimal neighbourhood<br />

<strong>of</strong> the location u, given the complete <strong>po<strong>in</strong>t</strong> pattern at all other locations, is λ(u,X)du.<br />

u<br />

For <strong>po<strong>in</strong>t</strong> processes <strong>in</strong> a bounded w<strong>in</strong>dow, the conditional <strong>in</strong>tensity at a location u given the<br />

configuration x is related to the probability density f by<br />

λ(u,x) =<br />

f(x ∪ {u})<br />

f(x)<br />

(for u ∈ x), the ratio <strong>of</strong> the probability densities for the configuration x with <strong>and</strong> without the<br />

<strong>po<strong>in</strong>t</strong> u added.<br />

The homogeneous Poisson process with <strong>in</strong>tensity λ has conditional <strong>in</strong>tensity<br />

λ(u,x) = λ<br />

while the <strong>in</strong>homogeneous Poisson process with <strong>in</strong>tensity function λ(u) has conditional <strong>in</strong>tensity<br />

λ(u,x) = λ(u)<br />

. The conditional <strong>in</strong>tensity for a Poisson process does not depend on the configuration x, because<br />

the <strong>po<strong>in</strong>t</strong>s <strong>of</strong> a Poisson process are <strong>in</strong>dependent.<br />

For the general pairwise <strong>in</strong>teraction process (34) the conditional <strong>in</strong>tensity is<br />

For the hard core process,<br />

(39)<br />

n(x) <br />

λ(u,x) = b(u) c(u,xi). (40)<br />

i=1<br />

<br />

β if ||u − xi|| > r for all i<br />

λ(u,x) =<br />

0 otherwise<br />

which has the nice <strong>in</strong>terpretation that a <strong>po<strong>in</strong>t</strong> u is either ‘permitted’ or ‘not permitted’ depend<strong>in</strong>g<br />

on whether it satisfies the hard core requirement.<br />

For the Strauss process<br />

λ(u,x) = βγ t(u,x)<br />

(42)<br />

where t(u,x) = s(x ∪ {u}) − s(x) is the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> x that lie with<strong>in</strong> a distance r <strong>of</strong> the<br />

location u. For γ < 1, this has the <strong>in</strong>terpretation that a r<strong>and</strong>om <strong>po<strong>in</strong>t</strong> is less likely to occur at<br />

the location u if there are many <strong>po<strong>in</strong>t</strong>s <strong>in</strong> the neighbourhood.<br />

Copyright c○CSIRO 2008<br />

(41)


20.6 Simulat<strong>in</strong>g Gibbs models 137<br />

Strauss<br />

+<br />

For the area-<strong>in</strong>teraction process,<br />

λ(u,x) = βγ −B(u,x)<br />

area−<strong>in</strong>teraction<br />

where B(u,x) = A(x ∪ {u}) − A(x) is the area <strong>of</strong> that part <strong>of</strong> the disc <strong>of</strong> radius r centred on u<br />

that is not covered by discs <strong>of</strong> radius r centred at the other <strong>po<strong>in</strong>t</strong>s xi ∈ x. If the <strong>po<strong>in</strong>t</strong>s represent<br />

trees or plants, we may imag<strong>in</strong>e that each tree takes nutrients <strong>and</strong> water from the soil <strong>in</strong>side a<br />

circle <strong>of</strong> radius r. Then we may <strong>in</strong>terpret B(u,x) as the area <strong>of</strong> the ‘unclaimed zone’ where a<br />

new plant at location u would be able to draw nutrients <strong>and</strong> water without competition from<br />

other plants. For γ < 1 we can <strong>in</strong>terpret (43) as say<strong>in</strong>g that a r<strong>and</strong>om <strong>po<strong>in</strong>t</strong> is less likely to<br />

occur when the unclaimed area is small.<br />

The conditional <strong>in</strong>tensity <strong>of</strong> a <strong>po<strong>in</strong>t</strong> process determ<strong>in</strong>es the probability density, through (39).<br />

Hence we can use the conditional <strong>in</strong>tensity to def<strong>in</strong>e a <strong>po<strong>in</strong>t</strong> process. The conditional <strong>in</strong>tensity<br />

is the preferred modell<strong>in</strong>g tool for Gibbs processes: it has a direct <strong>in</strong>terpretation, <strong>and</strong> it is easier<br />

to h<strong>and</strong>le than the probability density.<br />

20.6 Simulat<strong>in</strong>g Gibbs models<br />

Gibbs models can be simulated by Markov cha<strong>in</strong> Monte Carlo algorithms. Indeed, MCMC<br />

algorithms were <strong>in</strong>vented to simulate Gibbs processes [36, 41].<br />

In brief, these algorithms simulate a Markov cha<strong>in</strong> whose states are <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. The cha<strong>in</strong><br />

is designed so that its equilibrium distribution is the distribution <strong>of</strong> the <strong>po<strong>in</strong>t</strong> process we want<br />

to simulate. If the cha<strong>in</strong> were run for an <strong>in</strong>f<strong>in</strong>ite time, the state would converge <strong>in</strong> distribution<br />

to the desired <strong>po<strong>in</strong>t</strong> process. In practice the cha<strong>in</strong> is run for a long f<strong>in</strong>ite time. Further details<br />

are beyond the scope <strong>of</strong> this workshop; consult [37, 38] for more <strong>in</strong>formation.<br />

Currently spatstat <strong>of</strong>fers the function rmh which simulates Gibbs processes us<strong>in</strong>g the<br />

Metropolis-Hast<strong>in</strong>gs algorithm.<br />

> rmh(model, start, control)<br />

• model determ<strong>in</strong>es the <strong>po<strong>in</strong>t</strong> process model to be simulated (see help(rmhmodel)).<br />

• start determ<strong>in</strong>es the <strong>in</strong>itial state <strong>of</strong> the Markov cha<strong>in</strong> (see help(rmhstart)).<br />

• control specifies control parameters for runn<strong>in</strong>g the Markov cha<strong>in</strong>, such as the number<br />

<strong>of</strong> iteration steps (see help(rmhcontrol)).<br />

+<br />

(43)<br />

Copyright c○CSIRO 2008


138 Gibbs models<br />

In the simplest uses <strong>of</strong> rmh, the three arguments are lists <strong>of</strong> parameter values. To generate a<br />

simulated realisation <strong>of</strong> the Strauss process with parameters β = 2,γ = 0.7,r = 0.7 <strong>in</strong> a square<br />

<strong>of</strong> side 10,<br />

> mo X


21 Methods 9: fitt<strong>in</strong>g Gibbs models<br />

21.1 Maximum pseudolikelihood<br />

Maximum likelihood estimation is <strong>in</strong>tractable for most <strong>po<strong>in</strong>t</strong> process models. At the very least<br />

it requires Monte Carlo simulation to evaluate the likelihood (or the score <strong>and</strong> the Fisher <strong>in</strong>formation).<br />

A workable alternative, at least for <strong>in</strong>vestigative purposes, is to maximise the log pseudolikelihood<br />

log PL (θ;x) = <br />

<br />

log λ(xi;x) − λ(u,x)du. (44)<br />

i<br />

You may recognise this as be<strong>in</strong>g very similar to the likelihood (4) <strong>of</strong> the Poisson process. In<br />

general it is not a likelihood, but the analogue <strong>of</strong> the score equation<br />

∂<br />

log PL (θ) = 0<br />

∂θ<br />

is an unbiased estimat<strong>in</strong>g equation. Thus the maximum pseudolikelihood estimator is asymptotically<br />

unbiased, consistent <strong>and</strong> asymptotically normal under appropriate conditions.<br />

The ma<strong>in</strong> advantage <strong>of</strong> maximum pseudolikelihood is that, at least for popular Gibbs models,<br />

the conditional <strong>in</strong>tensity λ(u,x) is easily computable, so that the pseudolikelihood is easy to<br />

compute <strong>and</strong> to maximise. The ma<strong>in</strong> disadvantage is the bias <strong>and</strong> <strong>in</strong>efficiency <strong>of</strong> maximum<br />

pseudolikelihood <strong>in</strong> small samples.<br />

More computationally-<strong>in</strong>tensive estimation procedures typically use the maximum pseudolikelihood<br />

estimate as their <strong>in</strong>itial guess. We are implement<strong>in</strong>g such procedures <strong>in</strong> spatstat as<br />

well.<br />

21.2 Fitt<strong>in</strong>g Gibbs models <strong>in</strong> spatstat<br />

We have already met the function ppm for fitt<strong>in</strong>g Poisson <strong>po<strong>in</strong>t</strong> process models. In fact this<br />

function will fit a wide class <strong>of</strong> Gibbs models.<br />

ppm conta<strong>in</strong>s an implementation <strong>of</strong> the algorithm <strong>of</strong> Baddeley <strong>and</strong> Turner [3] for maximum<br />

pseudolikelihood (which extends the Berman-Turner device for Poisson processes to a general<br />

Gibbs process). The conditional <strong>in</strong>tensity <strong>of</strong> the model, λθ(u,x), must be logl<strong>in</strong>ear <strong>in</strong> the<br />

parameters θ:<br />

log λθ(u,x) = θ · S(u,x), (45)<br />

generalis<strong>in</strong>g (5), where S(u,x) is a real-valued or vector-valued function <strong>of</strong> location u <strong>and</strong> configuration<br />

x. Parameters θ appear<strong>in</strong>g <strong>in</strong> the logl<strong>in</strong>ear form (45) are called ‘regular’ parameters, <strong>and</strong><br />

all other parameters are ‘irregular’ parameters. For example, the Strauss process conditional<br />

<strong>in</strong>tensity (42) can be recast as<br />

W<br />

log λ(u,x) = log β + (log γ)t(u,x)<br />

so that θ = (log β,log γ) are regular parameters, but the <strong>in</strong>teraction distance r is an irregular<br />

parameter (technically called a ‘bloody nuisance parameter’).<br />

In spatstat we split the conditional <strong>in</strong>tensity <strong>in</strong>to first-order <strong>and</strong> higher-order terms:<br />

139<br />

log λθ(u,x) = η · S(u) + ϕ · V (u,x). (46)<br />

The ‘first order term’ S(u) describes <strong>spatial</strong> <strong>in</strong>homogeneity <strong>and</strong>/or covariate effects. The ‘higher<br />

order term’ V (u,x) describes <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction.<br />

The model with conditional <strong>in</strong>tensity (46) is fitted by call<strong>in</strong>g ppm <strong>in</strong> the form<br />

Copyright c○CSIRO 2008


140 Methods 9: fitt<strong>in</strong>g Gibbs models<br />

ppm(X, ~ terms, V)<br />

The first argument X is the <strong>po<strong>in</strong>t</strong> pattern dataset. The second argument ~terms is a model<br />

formula, specify<strong>in</strong>g the first order term S(u) <strong>in</strong> (46), <strong>in</strong> the manner described <strong>in</strong> Section 13.<br />

Thus the first order term S(u) <strong>in</strong> (46) may take very general forms.<br />

The third argument V is an object <strong>of</strong> the special class "<strong>in</strong>teract" which describes the<br />

<strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction term V (u,x) <strong>in</strong> (46). It may be compared to the ‘family’ argument<br />

which determ<strong>in</strong>es the distribution <strong>of</strong> the responses <strong>in</strong> a l<strong>in</strong>ear model or generalised l<strong>in</strong>ear model.<br />

Only a limited number <strong>of</strong> canned <strong>in</strong>teractions are available <strong>in</strong> spatstat, because they must be<br />

constructed carefully to ensure that the <strong>po<strong>in</strong>t</strong> process exists.<br />

To fit the Strauss process to the cells data us<strong>in</strong>g ppm,<br />

> data(cells)<br />

> ppm(cells, ~1, Strauss(r = 0.1))<br />

Stationary Strauss process<br />

First order term:<br />

beta<br />

762.6005<br />

Interaction: Strauss process<br />

<strong>in</strong>teraction distance: 0.1<br />

Fitted <strong>in</strong>teraction parameter gamma: 0.008<br />

Relevant coefficients:<br />

Interaction<br />

-4.825006<br />

Here Strauss is a special function that creates an ‘<strong>in</strong>teraction’ object (class "<strong>in</strong>teract")<br />

describ<strong>in</strong>g the <strong>in</strong>teraction structure <strong>of</strong> the Strauss process. Notice that we had to specify the<br />

value <strong>of</strong> the irregular parameter r (more about that later).<br />

To fit the <strong>in</strong>homogeneous Strauss process with conditional <strong>in</strong>tensity<br />

λ(u,x) = b(u)γ t(u,x)<br />

where, say, b(u) is logl<strong>in</strong>ear <strong>in</strong> the Cartesian coord<strong>in</strong>ates,<br />

we simply type<br />

> ppm(cells, ~x + y, Strauss(r = 0.1))<br />

Nonstationary Strauss process<br />

Trend formula: ~x + y<br />

Fitted coefficients for trend formula:<br />

(Intercept) x y<br />

6.2922384 0.5269869 0.1576416<br />

Copyright c○CSIRO 2008<br />

log b((x,y)) = β0 + β1x + β2y


21.3 Inter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teractions 141<br />

Interaction: Strauss process<br />

<strong>in</strong>teraction distance: 0.1<br />

Fitted <strong>in</strong>teraction parameter gamma: 0.0082<br />

Relevant coefficients:<br />

Interaction<br />

-4.805565<br />

To fit an <strong>in</strong>homogeneous Strauss process with log-quadratic first order term,<br />

> ppm(cells, ~polynom(x, y, 2), Strauss(r = 0.1))<br />

Nonstationary Strauss process<br />

Trend formula: ~polynom(x, y, 2)<br />

Fitted coefficients for trend formula:<br />

(Intercept) polynom(x, y, 2)[x] polynom(x, y, 2)[y]<br />

5.9747220 -0.9375707 3.4732733<br />

polynom(x, y, 2)[x^2] polynom(x, y, 2)[x.y] polynom(x, y, 2)[y^2]<br />

1.4970947 -0.1838987 -3.3696109<br />

Interaction: Strauss process<br />

<strong>in</strong>teraction distance: 0.1<br />

Fitted <strong>in</strong>teraction parameter gamma: 0.0081<br />

Relevant coefficients:<br />

Interaction<br />

-4.812711<br />

21.3 Inter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teractions<br />

Instead <strong>of</strong> Strauss we may use any <strong>of</strong> the follow<strong>in</strong>g functions to create an <strong>in</strong>teraction:<br />

Poisson() the Poisson <strong>po<strong>in</strong>t</strong> process (the default)<br />

Strauss() the Strauss process<br />

StraussHard() the Strauss/hard core <strong>po<strong>in</strong>t</strong> process<br />

S<strong>of</strong>tcore() pairwise <strong>in</strong>teraction, s<strong>of</strong>t core potential<br />

PairPiece() pairwise <strong>in</strong>teraction, piecewise constant<br />

DiggleGratton() Diggle-Gratton potential<br />

LennardJones() Lennard-Jones potential<br />

AreaInter() area-<strong>in</strong>teraction process<br />

Geyer() Geyer’s saturation process<br />

BadGey() hybrid Geyer saturation process<br />

SatPiece() multiscale saturation process<br />

OrdThresh() Ord process, threshold potential<br />

Pairwise() pairwise <strong>in</strong>teraction, user-supplied potential<br />

Saturated() general saturated model, user-supplied potential<br />

Ord() Ord model, user-supplied potential<br />

(There are two additional ones for multitype <strong>po<strong>in</strong>t</strong> processes, described <strong>in</strong> section 27.3.2.)<br />

Copyright c○CSIRO 2008


142 Methods 9: fitt<strong>in</strong>g Gibbs models<br />

The area-<strong>in</strong>teraction model <strong>and</strong> the Geyer saturation model are quite h<strong>and</strong>y, as they can be<br />

used to model both cluster<strong>in</strong>g <strong>and</strong> regularity.<br />

> data(redwood)<br />

> ppm(redwood, ~1, Geyer(r = 0.07, sat = 2))<br />

Stationary Geyer saturation process<br />

First order term:<br />

beta<br />

12.39488<br />

Interaction: Geyer saturation process<br />

<strong>in</strong>teraction distance: 0.07<br />

saturation parameter: 2<br />

Fitted <strong>in</strong>teraction parameter gamma: 2.9004<br />

Relevant coefficients:<br />

Interaction<br />

1.064845<br />

> ppm(redwood, ~1, AreaInter(r = 0.03))<br />

Stationary Area-<strong>in</strong>teraction process<br />

First order term:<br />

beta<br />

36.6887<br />

Interaction: Area-<strong>in</strong>teraction process<br />

disc radius: 0.03<br />

Fitted <strong>in</strong>teraction parameter eta: 15.6968<br />

Relevant coefficients:<br />

Interaction<br />

2.753459<br />

The pr<strong>in</strong>tout for the area-<strong>in</strong>teraction model uses the “scale-free” parameter eta def<strong>in</strong>ed by<br />

η = γ πr2<br />

where γ <strong>and</strong> r are the parameters appear<strong>in</strong>g <strong>in</strong> the def<strong>in</strong>ition (43). Values <strong>of</strong> η greater than 1<br />

suggest cluster<strong>in</strong>g.<br />

For more detailed explanation <strong>of</strong> modell<strong>in</strong>g, see [5].<br />

21.4 Fitted <strong>po<strong>in</strong>t</strong> process models<br />

The result <strong>of</strong> the ppm call is an object <strong>of</strong> class "ppm" (‘<strong>po<strong>in</strong>t</strong> process model’). This is very closely<br />

analogous to a fitted l<strong>in</strong>ear model (lm) or fitted generalised l<strong>in</strong>ear model (glm).<br />

St<strong>and</strong>ard R operations that are def<strong>in</strong>ed for fitted <strong>po<strong>in</strong>t</strong> process models (i.e. that have methods<br />

for the class "ppm") <strong>in</strong>clude:<br />

Copyright c○CSIRO 2008


21.4 Fitted <strong>po<strong>in</strong>t</strong> process models 143<br />

pr<strong>in</strong>t pr<strong>in</strong>t basic <strong>in</strong>formation<br />

summary pr<strong>in</strong>t detailed summary <strong>in</strong>formation<br />

plot plot the fitted (conditional) <strong>in</strong>tensity<br />

predict fitted (conditional) <strong>in</strong>tensity<br />

fitted fitted (conditional) <strong>in</strong>tensity at data <strong>po<strong>in</strong>t</strong>s<br />

update re-fit the model<br />

coef extract the fitted coefficient vector θ<br />

vcov variance-covariance matrix <strong>of</strong> θ<br />

anova analysis <strong>of</strong> deviance<br />

logLik evaluate log-pseudolikelihood<br />

model.matrix extract design matrix<br />

formula extract trend formula <strong>of</strong> model<br />

terms extract terms <strong>in</strong> model formula<br />

(the methods for anova <strong>and</strong> vcov are only available for Poisson models). The follow<strong>in</strong>g<br />

functions are also available:<br />

step stepwise model selection<br />

drop1 one step backward <strong>in</strong> model selection<br />

model.images compute images <strong>of</strong> canonical covariates <strong>in</strong> model<br />

effectfun fitted <strong>in</strong>tensity as function <strong>of</strong> one covariate<br />

Plott<strong>in</strong>g a fitted model generates a series <strong>of</strong> image <strong>and</strong> contour plots <strong>of</strong><br />

• the fitted first order term exp(ˆη · S(u))<br />

• the fitted conditional <strong>in</strong>tensity λˆ θ (u,x) evaluated for the data pattern x<br />

For Poisson models, the two plots are equivalent, <strong>and</strong> give the fitted <strong>in</strong>tensity function.<br />

> fit par(mfrow = c(1, 2))<br />

> plot(fit, how = "image", ngrid = 256)<br />

Fitted trend<br />

400 600 800 1000 1400<br />

Fitted cif<br />

For non-Poisson models, it is also possible to extract <strong>and</strong> plot the <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction<br />

function, us<strong>in</strong>g fit<strong>in</strong>.<br />

> model f plot(f)<br />

0 500 1000<br />

Copyright c○CSIRO 2008


144 Methods 9: fitt<strong>in</strong>g Gibbs models<br />

Pairwise <strong>in</strong>teraction<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

0 20 40 60 80 100 120<br />

Distance<br />

21.5 Simulation from fitted models<br />

A fitted Gibbs model can also be simulated automatically us<strong>in</strong>g rmh.<br />

> fit Xsim plot(Xsim, ma<strong>in</strong> = "Simulation from fitted Strauss model")<br />

Simulation from fitted Strauss model<br />

The envelope comm<strong>and</strong> will also generate simulation envelopes for a fitted model.<br />

> plot(envelope(fit, nsim = 39))<br />

Copyright c○CSIRO 2008


21.6 Deal<strong>in</strong>g with nuisance parameters 145<br />

K(r)<br />

0 500 1000 1500<br />

envelope(fit, nsim = 39)<br />

0 5 10 15 20<br />

r (one unit = 0.1 metres)<br />

21.6 Deal<strong>in</strong>g with nuisance parameters<br />

Irregular parameters, such as the <strong>in</strong>teraction radius r <strong>in</strong> the Strauss process, cannot be estimated<br />

directly us<strong>in</strong>g ppm. Indeed the statistical theory for estimat<strong>in</strong>g such parameters is unclear.<br />

For some special cases, a maximum likelihood estimator <strong>of</strong> the nuisance parameter is available.<br />

For example, for the ‘hard core process’ (Strauss process with <strong>in</strong>teraction parameter γ = 0)<br />

with <strong>in</strong>teraction radius r, the maximum likelihood estimator is the m<strong>in</strong>imum nearest-neighbour<br />

distance. Thus the follow<strong>in</strong>g is a reasonable approach to the cells dataset:<br />

> rhat rhat ppm(cells, ~1, Strauss(r = rhat))<br />

Stationary Strauss process<br />

First order term:<br />

beta<br />

301.0949<br />

Interaction: Strauss process<br />

<strong>in</strong>teraction distance: 0.0836293018068393<br />

Fitted <strong>in</strong>teraction parameter gamma: 0<br />

Relevant coefficients:<br />

Interaction<br />

-20.77031<br />

The analogue <strong>of</strong> pr<strong>of</strong>ile likelihood, pr<strong>of</strong>ile pseudolikelihood, provides a general solution which<br />

may or may not perform well. If θ = (φ,η) where φ denotes the nuisance parameters <strong>and</strong> η the<br />

regular parameters, def<strong>in</strong>e the pr<strong>of</strong>ile log pseudolikelihood by<br />

PLP(φ,x) = max log PL ((φ,η);x) .<br />

η<br />

The right h<strong>and</strong> side can be computed, for each fixed value <strong>of</strong> φ, by the algorithm ppm. Then we<br />

just have to maximise PLP(φ) over φ. This is done by the comm<strong>and</strong> pr<strong>of</strong>ilepl:<br />

Copyright c○CSIRO 2008


146 Methods 9: fitt<strong>in</strong>g Gibbs models<br />

> data(simdat)<br />

> df pfit pfit<br />

Pr<strong>of</strong>ile log pseudolikelihood values<br />

for model: ppm(simdat, ~1, <strong>in</strong>teraction = Strauss)<br />

fitted with rbord= 2<br />

Interaction: Strauss<br />

with irregular parameter ’r’ <strong>in</strong> [0.05, 2]<br />

Optimum value <strong>of</strong> irregular parameter: r = 0.275<br />

The result is an object <strong>of</strong> classpr<strong>of</strong>ilepl conta<strong>in</strong><strong>in</strong>g the pr<strong>of</strong>ile log pseudolikelihood function,<br />

the optimised value <strong>of</strong> the irregular parameter r, <strong>and</strong> the f<strong>in</strong>al fitted model. To plot the pr<strong>of</strong>ile<br />

log pseudolikelihood,<br />

> plot(pfit)<br />

log PL<br />

−17.5 −16.5 −15.5 −14.5<br />

ppm(simdat, ~1, <strong>in</strong>teraction = Strauss)<br />

0.0 0.5 1.0 1.5 2.0<br />

To extract the f<strong>in</strong>al fitted model,<br />

> pfit$fit<br />

Stationary Strauss process<br />

First order term:<br />

beta<br />

2.583110<br />

Interaction: Strauss process<br />

<strong>in</strong>teraction distance: 0.275<br />

Fitted <strong>in</strong>teraction parameter gamma: 0.5631<br />

Relevant coefficients:<br />

Interaction<br />

-0.5743608<br />

There is a summary method for these objects as well.<br />

Copyright c○CSIRO 2008<br />

r


21.7 Improvements over maximum pseudolikelihood 147<br />

21.7 Improvements over maximum pseudolikelihood<br />

Maximum pseudolikelihood is quick <strong>and</strong> dirty. There are statistically more efficient alternatives,<br />

but they are computationally <strong>in</strong>tensive.<br />

Currently we have implemented the easiest <strong>of</strong> these alternatives, the Huang-Ogata [30] onestep<br />

approximation to maximum likelihood. Start<strong>in</strong>g from the maximum pseudolikelihood estimate<br />

ˆ θPL, we simulate M <strong>in</strong>dependent realisations <strong>of</strong> the model with parameters ˆ θPL, evaluate<br />

the canonical sufficient statistics, <strong>and</strong> use them to form estimates <strong>of</strong> the score <strong>and</strong> Fisher <strong>in</strong>formation<br />

at θ = ˆ θPL. Then we take one Newton-Raphson step, updat<strong>in</strong>g the value <strong>of</strong> θ. The<br />

rationale is that the log-likelihood is approximately quadratic <strong>in</strong> a neighbourhood <strong>of</strong> the maximum<br />

pseudolikelihood estimator, so that one Newton-Raphson step is almost enough.<br />

To use the Huang-Ogata method <strong>in</strong>stead <strong>of</strong> maximum pseudolikelihood, add the argument<br />

method="ho".<br />

> fit fit<br />

Stationary Strauss process<br />

First order term:<br />

beta<br />

2.515173<br />

Interaction: Strauss process<br />

<strong>in</strong>teraction distance: 0.275<br />

Fitted <strong>in</strong>teraction parameter gamma: 0.676<br />

Relevant coefficients:<br />

Interaction<br />

-0.3916353<br />

> vcov(fit)<br />

[,1] [,2]<br />

[1,] 0.01058897 -0.01236823<br />

[2,] -0.01236823 0.03235970<br />

For models fitted by Huang-Ogata, the variance-covariance matrix returned by vcov is computed<br />

from the simulations.<br />

Copyright c○CSIRO 2008


148 Methods 10: validation <strong>of</strong> fitted Gibbs models<br />

22 Methods 10: validation <strong>of</strong> fitted Gibbs models<br />

Goodness-<strong>of</strong>-fit test<strong>in</strong>g <strong>and</strong> model validation for Poisson models were described <strong>in</strong> Section 14.<br />

Check<strong>in</strong>g a fitted Gibbs <strong>po<strong>in</strong>t</strong> process model is more difficult. There is little theory available to<br />

support goodness-<strong>of</strong>-fit tests <strong>and</strong> the like.<br />

As an example, consider the follow<strong>in</strong>g data:<br />

> data(residualspaper)<br />

> X plot(X)<br />

X<br />

We fit a Strauss process model with a log-quadratic <strong>in</strong>tensity term:<br />

> fit plot(envelope(fit, Lest, nsim = 19, global = TRUE))<br />

Copyright c○CSIRO 2008


22.1 Goodness-<strong>of</strong>-fit test<strong>in</strong>g for Gibbs processes 149<br />

L(r)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

envelope(fit, Lest, nsim=19, global=TRUE)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

Let’s subtract the theoretical Poisson value L(r) = r to get a more readable plot:<br />

> plot(envelope(fit, Lest, nsim = 19, global = TRUE), . - r ~ r)<br />

cb<strong>in</strong>d(obs, mmean, hi, lo) − r<br />

−0.02 −0.01 0.00 0.01 0.02<br />

envelope(fit, Lest, nsim=19, global=TRUE)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r<br />

This is fairly consistent with a Strauss process.<br />

Copyright c○CSIRO 2008


150 Methods 10: validation <strong>of</strong> fitted Gibbs models<br />

22.2 Residuals for Gibbs processes<br />

22.2.1 Def<strong>in</strong>ition<br />

Residuals for a general Gibbs model were def<strong>in</strong>ed only recently [6, 1]. The total residual <strong>in</strong> a<br />

region B ⊂ R 2 is def<strong>in</strong>ed as<br />

<br />

R(B) = n(x ∩ B) − λ(u,x)du (47)<br />

B<br />

where aga<strong>in</strong> n(x ∩ B) is the observed number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> the region B, <strong>and</strong> λ(u,x) is the<br />

conditional <strong>in</strong>tensity <strong>of</strong> the fitted model, evaluated for the data <strong>po<strong>in</strong>t</strong> pattern x. If the fitted<br />

model is correct, the residuals have mean zero.<br />

This def<strong>in</strong>ition is similar to the def<strong>in</strong>ition <strong>of</strong> residuals for Poisson processes (Section 14.2)<br />

except that the <strong>in</strong>tensity λ(u) <strong>of</strong> the fitted Poisson process has been replaced by the conditional<br />

<strong>in</strong>tensity λ(u,x) <strong>of</strong> the fitted Gibbs process evaluated for the data <strong>po<strong>in</strong>t</strong> pattern x.<br />

22.2.2 Residual plots<br />

Residuals for Gibbs processes can be plotted us<strong>in</strong>g the same techniques as <strong>in</strong> Section 14.2. Here<br />

is the four-panel plot:<br />

> diagnose.ppm(fit, type = "Pearson")<br />

Copyright c○CSIRO 2008


22.2 Residuals for Gibbs processes 151<br />

cumulative sum <strong>of</strong> Pearson residuals<br />

−2 −1.5 −1 −0.5 0 0.5<br />

0 0.2 0.4 0.6 0.8 1<br />

x coord<strong>in</strong>ate<br />

−4<br />

−2<br />

4<br />

5<br />

1<br />

2<br />

0<br />

3<br />

−2<br />

cumulative sum <strong>of</strong> Pearson residuals<br />

0.5 0 −0.5 −1 −1.5<br />

−1<br />

At the time <strong>of</strong> writ<strong>in</strong>g, spatstat does not yet display 2σ significance b<strong>and</strong>s for the lurk<strong>in</strong>g<br />

variable plots when the fitted model is not Poisson. The <strong>in</strong>terpretation <strong>of</strong> the lurk<strong>in</strong>g variable<br />

plots is a little more difficult without the significance b<strong>and</strong>s. One tends to place a little more<br />

emphasis on the smoothed residual field. The Pearson residuals should be approximately st<strong>and</strong>ardised,<br />

so that values which are much greater than 2 (<strong>in</strong> absolute value) suggest a lack <strong>of</strong><br />

fit.<br />

The four-panel plot above suggests that the model is a reasonable fit.<br />

22.2.3 Q–Q plots<br />

As we noted <strong>in</strong> Section 14.2.6, the four-panel residual plot <strong>and</strong> the lurk<strong>in</strong>g variable plot are<br />

useful for detect<strong>in</strong>g misspecification <strong>of</strong> the trend <strong>in</strong> a fitted model. They are not very useful for<br />

check<strong>in</strong>g misspecification <strong>of</strong> the <strong>in</strong>teraction <strong>in</strong> a fitted model.<br />

An extreme example is provided by the cells dataset. The residual plots for a uniform<br />

Poisson process fitted to the cells data suggest that this is a good model:<br />

> data(cells)<br />

> fitPois diagnose.ppm(fitPois)<br />

−5<br />

−4<br />

−3<br />

−3<br />

−5<br />

−4<br />

−2<br />

−1<br />

0<br />

2<br />

4<br />

6<br />

5<br />

7<br />

1<br />

3<br />

0<br />

−1<br />

0 0.2 0.4 0.6 0.8 1<br />

y coord<strong>in</strong>ate<br />

Copyright c○CSIRO 2008


152 Methods 10: validation <strong>of</strong> fitted Gibbs models<br />

cumulative sum <strong>of</strong> raw residuals<br />

−8 −4 0 2 4 6 8<br />

0 0.2 0.4 0.6 0.8 1<br />

x coord<strong>in</strong>ate<br />

cumulative sum <strong>of</strong> raw residuals<br />

8 6 4 2 0 −4 −8<br />

However, the K-function shows that the cells dataset is clearly not a Poisson pattern, but<br />

has strong <strong>in</strong>hibition:<br />

> par(mfrow = c(1, 2))<br />

> plot(cells)<br />

> plot(Kest(cells))<br />

> par(mfrow = c(1, 1))<br />

Copyright c○CSIRO 2008<br />

−10<br />

−5<br />

−5<br />

−10<br />

−15<br />

0<br />

10<br />

−5<br />

5<br />

5<br />

−20<br />

−10<br />

0<br />

0<br />

10<br />

−10<br />

0 0.2 0.4 0.6 0.8 1<br />

y coord<strong>in</strong>ate


22.2 Residuals for Gibbs processes 153<br />

cells<br />

K(r)<br />

0.00 0.05 0.10 0.15 0.20<br />

Kest(cells)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

Interaction between <strong>po<strong>in</strong>t</strong>s <strong>in</strong> a <strong>po<strong>in</strong>t</strong> process corresponds roughly to the distribution <strong>of</strong> the<br />

responses <strong>in</strong> logl<strong>in</strong>ear regression. To validate the <strong>in</strong>teraction terms <strong>in</strong> a <strong>po<strong>in</strong>t</strong> process model, we<br />

should plot the distribution <strong>of</strong> the residuals. The appropriate tool is a Q–Q plot.<br />

> qqplot.ppm(fitPois, nsim = 39)<br />

data quantile<br />

−30 −20 −10 0 10 20 30 40<br />

−20 0 20 40<br />

Mean quantile <strong>of</strong> simulations<br />

Residuals: raw<br />

r<br />

Copyright c○CSIRO 2008


154 Methods 10: validation <strong>of</strong> fitted Gibbs models<br />

This shows a Q–Q plot <strong>of</strong> the smoothed residuals for a uniform Poisson model fitted to the<br />

cells data, with <strong>po<strong>in</strong>t</strong>wise 5% critical envelopes from simulations <strong>of</strong> the fitted model. This<br />

<strong>in</strong>dicates that the uniform Poisson model is grossly <strong>in</strong>appropriate for the cells data.<br />

Return<strong>in</strong>g to the model we fitted at the start <strong>of</strong> this chapter:<br />

> qqplot.ppm(fit, nsim = 39)<br />

data quantile<br />

−100 −50 0 50 100<br />

qqplot.ppm(fit, nsim=39)<br />

−100 −50 0 50 100 150<br />

Mean quantile <strong>of</strong> simulations<br />

Residuals: raw<br />

This shows a Q–Q plot <strong>of</strong> the smoothed residuals, with <strong>po<strong>in</strong>t</strong>wise 5% critical envelopes from<br />

simulations <strong>of</strong> the fitted model. This suggests that the Strauss model is reasonable.<br />

These validation techniques generalise <strong>and</strong> unify many exist<strong>in</strong>g exploratory methods. For<br />

particular models <strong>of</strong> <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teraction, the Q–Q plot is closely related to the summary<br />

functions F, G <strong>and</strong> K. See [6].<br />

Copyright c○CSIRO 2008


22.2 Residuals for Gibbs processes 155<br />

PART V. MARKED POINT PATTERNS<br />

Part V <strong>of</strong> the workshop deals with marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>.<br />

Copyright c○CSIRO 2008


156 Marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

23 Marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

23.1 Marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Each <strong>po<strong>in</strong>t</strong> <strong>in</strong> a <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> pattern may carry additional <strong>in</strong>formation called a ‘mark’. For<br />

example, <strong>po<strong>in</strong>t</strong>s which are classified <strong>in</strong>to two or more different types (on/<strong>of</strong>f, case/control, species,<br />

colour, etc) may be regarded as marked <strong>po<strong>in</strong>t</strong>s, with a mark which identifies which type they<br />

are. Data record<strong>in</strong>g the locations <strong>and</strong> heights <strong>of</strong> trees <strong>in</strong> a forest can be regarded as a marked<br />

<strong>po<strong>in</strong>t</strong> pattern where the mark attached to a tree’s location is the tree height.<br />

In our current implementation, the mark attached to each <strong>po<strong>in</strong>t</strong> must be a s<strong>in</strong>gle value (which<br />

may be numeric, character, complex, logical, or factor). Many <strong>of</strong> the functions <strong>in</strong> spatstat<br />

h<strong>and</strong>le marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> <strong>in</strong> which the mark attached to each <strong>po<strong>in</strong>t</strong> is either<br />

• a cont<strong>in</strong>uous variate or “real number”. An example is the Longleaf P<strong>in</strong>es dataset<br />

(longleaf) <strong>in</strong> which each tree is marked with its diameter at breast height. The marks<br />

component must be a numeric vector such that marks[i] is the mark value associated<br />

with the ith <strong>po<strong>in</strong>t</strong>. We say the <strong>po<strong>in</strong>t</strong> pattern has cont<strong>in</strong>uous marks.<br />

• a categorical variate. An example is the Amacr<strong>in</strong>e Cells dataset (amacr<strong>in</strong>e) <strong>in</strong> which<br />

each cell is identified as either “on” or “<strong>of</strong>f”. Such <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> may be regarded as<br />

consist<strong>in</strong>g <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> different “types”. The marks component must be a factor such<br />

that marks[i] is the label or type <strong>of</strong> the ith <strong>po<strong>in</strong>t</strong>. We call this a multitype <strong>po<strong>in</strong>t</strong> pattern<br />

<strong>and</strong> the levels <strong>of</strong> the factor are the possible types.<br />

longleaf<br />

amacr<strong>in</strong>e<br />

Note that, <strong>in</strong> some other packages, a <strong>po<strong>in</strong>t</strong> pattern dataset consist<strong>in</strong>g <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> two different<br />

types (A <strong>and</strong> B say) is represented by two datasets, one represent<strong>in</strong>g the <strong>po<strong>in</strong>t</strong>s <strong>of</strong> type A <strong>and</strong><br />

another conta<strong>in</strong><strong>in</strong>g the <strong>po<strong>in</strong>t</strong>s <strong>of</strong> type B. In spatstat we take a different approach, <strong>in</strong> which<br />

all the <strong>po<strong>in</strong>t</strong>s are collected together <strong>in</strong> one <strong>po<strong>in</strong>t</strong> pattern, <strong>and</strong> the <strong>po<strong>in</strong>t</strong>s are then labelled by<br />

the type to which they belong. An advantage <strong>of</strong> this approach is that it is easy to deal with<br />

multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> with more than 2 types. For example the classic Lans<strong>in</strong>g Woods dataset<br />

represents the positions <strong>of</strong> trees <strong>of</strong> 6 different species. This is available <strong>in</strong> spatstat as a s<strong>in</strong>gle<br />

dataset, a marked <strong>po<strong>in</strong>t</strong> pattern, with the marks hav<strong>in</strong>g 6 levels.<br />

23.2 Formulation<br />

A mark variable may be <strong>in</strong>terpreted as an additional coord<strong>in</strong>ate for the <strong>po<strong>in</strong>t</strong>: for example<br />

a <strong>po<strong>in</strong>t</strong> process <strong>of</strong> earthquake epicentre locations (longitude, latitude), with marks giv<strong>in</strong>g the<br />

Copyright c○CSIRO 2008


23.3 Methodological issues 157<br />

occurrence time <strong>of</strong> each earthquake, can alternatively be viewed as a <strong>po<strong>in</strong>t</strong> process <strong>in</strong> space-time<br />

with coord<strong>in</strong>ates (longitude, latitude, time).<br />

A marked <strong>po<strong>in</strong>t</strong> process <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> space S with marks belong<strong>in</strong>g to a set M is mathematically<br />

def<strong>in</strong>ed as a <strong>po<strong>in</strong>t</strong> process <strong>in</strong> the cartesian product S × M. The space M <strong>of</strong> possible marks<br />

may be ‘anyth<strong>in</strong>g’. In current applications, typically the mark is either a categorical variable<br />

(so that the <strong>po<strong>in</strong>t</strong>s are grouped <strong>in</strong>to ‘types’) or a real number. Multivariate marks consist<strong>in</strong>g <strong>of</strong><br />

several such variables are also common.<br />

A marked <strong>po<strong>in</strong>t</strong> pattern is an unordered set<br />

y = {(x1,m1),... ,(xn,mn)}, xi ∈ W, mi ∈ M<br />

where xi are the locations <strong>and</strong> mi are the correspond<strong>in</strong>g marks.<br />

23.3 Methodological issues<br />

23.3.1 Should the data be treated as a marked <strong>po<strong>in</strong>t</strong> process?<br />

In a marked <strong>po<strong>in</strong>t</strong> process the <strong>po<strong>in</strong>t</strong>s are r<strong>and</strong>om. Treat<strong>in</strong>g the data as a <strong>po<strong>in</strong>t</strong> process is<br />

<strong>in</strong>appropriate if the locations are fixed, or if the locations are not part <strong>of</strong> the ‘response’.<br />

Example 16 Today’s maximum temperatures at 25 Australian cities are displayed on a map.<br />

This is not a <strong>po<strong>in</strong>t</strong> process <strong>in</strong> any useful sense. The cities are fixed locations. The temperatures<br />

are observations <strong>of</strong> a <strong>spatial</strong> variable at a fixed set <strong>of</strong> locations. See the R packages sp,<br />

spdep, spgwr for suitable methods.<br />

Example 17 A m<strong>in</strong>eral exploration dataset records the map coord<strong>in</strong>ates where 15 core samples<br />

were drilled, <strong>and</strong> for each core sample, the assayed concentration <strong>of</strong> iron <strong>in</strong> the sample.<br />

This should not be treated as a <strong>po<strong>in</strong>t</strong> process. The core sample locations were chosen by a<br />

geologist, <strong>and</strong> are part <strong>of</strong> the experimental design. The ma<strong>in</strong> <strong>in</strong>terest is <strong>in</strong> the iron concentration<br />

at these locations. This should probably be analysed as a geostatistical dataset. See the R<br />

packages geoR, geoRglm for suitable methods.<br />

23.3.2 Jo<strong>in</strong>t vs. conditional analysis<br />

There are more choices for analysis (<strong>and</strong> more traps) when marks are present. Schematically, if<br />

we write X for the <strong>po<strong>in</strong>t</strong>s <strong>and</strong> M for the marks, then a statistical model for the marked <strong>po<strong>in</strong>t</strong><br />

pattern could be formulated <strong>in</strong> several ways:<br />

• [X] [M|X] — ‘conditional on locations’ — <strong>po<strong>in</strong>t</strong>s X are first generated accord<strong>in</strong>g to a<br />

<strong>spatial</strong> <strong>po<strong>in</strong>t</strong> process, then marks M are ‘assigned’ to the <strong>po<strong>in</strong>t</strong>s by a r<strong>and</strong>om mechanism<br />

[M|X];<br />

• [M] [X|M] — ‘conditional on marks’ or ‘split by marks’ — marks M are first generated<br />

accord<strong>in</strong>g to some r<strong>and</strong>om mechanism [M], then they are placed at certa<strong>in</strong> locations X by<br />

<strong>po<strong>in</strong>t</strong> process(es) [X|M];<br />

• [X,M] — ‘jo<strong>in</strong>t’ — marked <strong>po<strong>in</strong>t</strong>s are generated accord<strong>in</strong>g to a marked <strong>po<strong>in</strong>t</strong> process.<br />

These approaches typically lead to different stochastic models <strong>and</strong> have different <strong>in</strong>ferential<br />

<strong>in</strong>terpretations. Correspond<strong>in</strong>gly, there are different null hypotheses that can be tested:<br />

Copyright c○CSIRO 2008


158 Marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

• r<strong>and</strong>om labell<strong>in</strong>g: given the locations X, the marks are conditionally <strong>in</strong>dependent <strong>and</strong><br />

identically distributed;<br />

• <strong>in</strong>dependence <strong>of</strong> components: the sub-processes Xm <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> each mark m, are <strong>in</strong>dependent<br />

<strong>po<strong>in</strong>t</strong> processes;<br />

• complete <strong>spatial</strong> r<strong>and</strong>omness <strong>and</strong> <strong>in</strong>dependence (CSRI): the locations X are a uniform<br />

Poisson <strong>po<strong>in</strong>t</strong> process, <strong>and</strong> the marks are <strong>in</strong>dependent <strong>and</strong> identically distributed. (This<br />

implies both r<strong>and</strong>om labell<strong>in</strong>g <strong>and</strong> <strong>in</strong>dependence <strong>of</strong> components).<br />

These null hypotheses are not equivalent.<br />

The properties <strong>of</strong> r<strong>and</strong>om labell<strong>in</strong>g <strong>and</strong> <strong>in</strong>dependence <strong>of</strong> components are not equivalent. For<br />

example, take a <strong>po<strong>in</strong>t</strong> process X where nearest neighbour distances are always larger than a<br />

threshold r, <strong>and</strong> attach r<strong>and</strong>om marks to the <strong>po<strong>in</strong>t</strong>s. The result<strong>in</strong>g marked <strong>po<strong>in</strong>t</strong> process cannot<br />

be generated us<strong>in</strong>g the <strong>in</strong>dependence construction, because if <strong>po<strong>in</strong>t</strong>s with different marks are<br />

<strong>in</strong>dependent, they can come arbitrarily close to one another.<br />

Example 18 (Ant nests data) Two species <strong>of</strong> ants build nests <strong>in</strong> a desert. We want to <strong>in</strong>vestigate<br />

ecological <strong>in</strong>teraction between the species, <strong>and</strong> between different nests <strong>of</strong> the same species.<br />

The locations <strong>of</strong> all nests are mapped, <strong>and</strong> marked by the species.<br />

These data can be analysed as a marked <strong>po<strong>in</strong>t</strong> process consist<strong>in</strong>g <strong>of</strong> two different types <strong>of</strong><br />

<strong>po<strong>in</strong>t</strong>s. The ‘mark’ attached to each <strong>po<strong>in</strong>t</strong> is its species (a categorical variable). The most<br />

natural k<strong>in</strong>d <strong>of</strong> modell<strong>in</strong>g <strong>and</strong> analysis is either jo<strong>in</strong>t [X,M] or split by species [M] [X|M]. We<br />

could also treat one <strong>of</strong> the species as a covariate <strong>and</strong> analyse the other species conditional on it.<br />

Example 19 Trees <strong>in</strong> an orchard are exam<strong>in</strong>ed <strong>and</strong> their disease status (<strong>in</strong>fected/not <strong>in</strong>fected)<br />

is recorded. We are <strong>in</strong>terested <strong>in</strong> the <strong>spatial</strong> characteristics <strong>of</strong> the disease, such as contagion<br />

between neighbour<strong>in</strong>g trees.<br />

These data probably should not be treated as a <strong>po<strong>in</strong>t</strong> process. The response is ‘disease<br />

status’. We can th<strong>in</strong>k <strong>of</strong> disease status as a label applied to the trees after their locations have<br />

been determ<strong>in</strong>ed. S<strong>in</strong>ce we are <strong>in</strong>terested <strong>in</strong> the <strong>spatial</strong> correlation <strong>of</strong> disease status, the tree<br />

locations are effectively fixed covariate values. It would probably be best to treat these data<br />

as a discrete r<strong>and</strong>om field (<strong>of</strong> disease status values) observed at a f<strong>in</strong>ite known set <strong>of</strong> sites (the<br />

trees).<br />

23.3.3 Grey areas<br />

There are some ‘grey areas’ which permit several alternative choices <strong>of</strong> analysis. It could be<br />

appropriate either to analyse the locations <strong>and</strong> marks jo<strong>in</strong>tly (denoted [X,M]), or to analyse<br />

the marks conditional on the locations ([M|X]) or to analyse the locations given the marks<br />

([X|M]).<br />

One grey area occurs when the locations are r<strong>and</strong>om, but may be ancillary for the parameters<br />

<strong>of</strong> <strong>in</strong>terest.<br />

Example 20 Case-control study <strong>of</strong> cancer [22, 26]. The domicile locations <strong>of</strong> all new cases<br />

<strong>of</strong> a rare cancer are mapped. To allow for <strong>spatial</strong> variation <strong>in</strong> the density <strong>of</strong> the susceptible<br />

population, domicile locations are recorded for a r<strong>and</strong>om sample <strong>of</strong> (matched) controls.<br />

Copyright c○CSIRO 2008


23.3 Methodological issues 159<br />

This can be analysed either as a marked <strong>po<strong>in</strong>t</strong> pattern (where the mark is the case/control<br />

label) or, by condition<strong>in</strong>g on locations, as a r<strong>and</strong>om field <strong>of</strong> case/control values attached to the<br />

known domicile locations.<br />

Chorley−Ribble Data<br />

Copyright c○CSIRO 2008


160 H<strong>and</strong>l<strong>in</strong>g marked <strong>po<strong>in</strong>t</strong> pattern data<br />

24 H<strong>and</strong>l<strong>in</strong>g marked <strong>po<strong>in</strong>t</strong> pattern data<br />

This section expla<strong>in</strong>s how to create a marked <strong>po<strong>in</strong>t</strong> pattern dataset <strong>in</strong> spatstat, <strong>and</strong> how to<br />

manipulate it.<br />

24.1 Creat<strong>in</strong>g datasets<br />

In spatstat version 1, each <strong>po<strong>in</strong>t</strong> <strong>in</strong> a <strong>po<strong>in</strong>t</strong> pattern can be marked with a s<strong>in</strong>gle value (i.e.<br />

one mark value per <strong>po<strong>in</strong>t</strong>). The marks are stored <strong>in</strong> a vector, <strong>of</strong> the same length as the number<br />

<strong>of</strong> <strong>po<strong>in</strong>t</strong>s. The marks can be <strong>of</strong> any atomic type: numeric, <strong>in</strong>teger, character, factor, logical or<br />

complex.<br />

A marked <strong>po<strong>in</strong>t</strong> pattern dataset can be created us<strong>in</strong>g any <strong>of</strong> the follow<strong>in</strong>g tools:<br />

ppp create <strong>po<strong>in</strong>t</strong> pattern dataset<br />

as.ppp convert other data to <strong>po<strong>in</strong>t</strong> pattern<br />

superimpose comb<strong>in</strong>e several <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> <strong>in</strong>to a marked <strong>po<strong>in</strong>t</strong> pattern<br />

marks extract marks from a <strong>po<strong>in</strong>t</strong> pattern<br />

marks ppp(x, y, ..., marks = m)<br />

where x, y <strong>and</strong> m are vectors <strong>of</strong> equal length conta<strong>in</strong><strong>in</strong>g the (x,y) coord<strong>in</strong>ates <strong>and</strong> the correspond<strong>in</strong>g<br />

mark values, <strong>and</strong> ... are arguments that determ<strong>in</strong>e the w<strong>in</strong>dow for the <strong>po<strong>in</strong>t</strong> pattern.<br />

Tip: If the marks are <strong>in</strong>tended to be a categorical variable (represent<strong>in</strong>g the types<br />

<strong>in</strong> a multitype <strong>po<strong>in</strong>t</strong> pattern),<br />

• ensure that m is stored as a factor <strong>in</strong> R.<br />

• when the <strong>po<strong>in</strong>t</strong> pattern X has been created, check that it is multitype us<strong>in</strong>g<br />

is.multitype(X).<br />

• check that the factor levels are as you <strong>in</strong>tended, us<strong>in</strong>glevels(m) orlevels(marks(X))<br />

where X is the marked <strong>po<strong>in</strong>t</strong> pattern. If the factor levels are character str<strong>in</strong>gs,<br />

they will be sorted <strong>in</strong>to alphabetical order by default.<br />

• be careful when perform<strong>in</strong>g equality/<strong>in</strong>equality comparisons <strong>in</strong>volv<strong>in</strong>g a factor.<br />

Particular danger occurs when the factor levels are str<strong>in</strong>gs that represent<br />

<strong>in</strong>tegers.<br />

The comm<strong>and</strong> as.ppp will convert data <strong>in</strong> another format (for example, a 2-column or 3column<br />

matrix or data frame) to a <strong>po<strong>in</strong>t</strong> pattern object <strong>of</strong> class "ppp". The third column <strong>of</strong> a<br />

matrix or data frame will be <strong>in</strong>terpreted as conta<strong>in</strong><strong>in</strong>g the marks.<br />

> mydata as.ppp(mydata, square(1))<br />

Copyright c○CSIRO 2008


24.2 Inspect<strong>in</strong>g a marked <strong>po<strong>in</strong>t</strong> pattern 161<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 10 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = a b c<br />

w<strong>in</strong>dow: rectangle = [0, 1] x [0, 1] units<br />

If <strong>po<strong>in</strong>t</strong> pattern data are stored <strong>in</strong> a text file, the comm<strong>and</strong> scanpp will read the data <strong>and</strong><br />

create a <strong>po<strong>in</strong>t</strong> pattern object <strong>of</strong> class "ppp". The argument multitype=TRUE will ensure that<br />

the mark values are <strong>in</strong>terpreted as a factor.<br />

> X amacr<strong>in</strong>e<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 294 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = <strong>of</strong>f on<br />

w<strong>in</strong>dow: rectangle = [0, 1.6012] x [0, 1] units (one unit = 662 microns)<br />

> summary(amacr<strong>in</strong>e)<br />

Copyright c○CSIRO 2008


162 H<strong>and</strong>l<strong>in</strong>g marked <strong>po<strong>in</strong>t</strong> pattern data<br />

Marked planar <strong>po<strong>in</strong>t</strong> pattern: 294 <strong>po<strong>in</strong>t</strong>s<br />

Average <strong>in</strong>tensity 184 <strong>po<strong>in</strong>t</strong>s per square unit (one unit = 662 microns)<br />

Multitype:<br />

frequency proportion <strong>in</strong>tensity<br />

<strong>of</strong>f 142 0.483 88.7<br />

on 152 0.517 94.9<br />

W<strong>in</strong>dow: rectangle = [0, 1.6012] x [0, 1] units<br />

W<strong>in</strong>dow area = 1.60121 square units<br />

Unit <strong>of</strong> length: 662 microns<br />

> plot(amacr<strong>in</strong>e)<br />

<strong>of</strong>f on<br />

1 2<br />

amacr<strong>in</strong>e<br />

You can also convert a marked <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong>to a data frame for closer <strong>in</strong>spection <strong>of</strong> the<br />

coord<strong>in</strong>ates <strong>and</strong> mark values:<br />

> as.data.frame(amacr<strong>in</strong>e)<br />

x y marks<br />

1 0.0224 0.0243 on<br />

2 0.0243 0.1028 on<br />

3 0.1626 0.1477 on<br />

........<br />

The marks can be extracted us<strong>in</strong>g the function marks:<br />

> data(longleaf)<br />

> m


24.3 Manipulat<strong>in</strong>g data 163<br />

24.3 Manipulat<strong>in</strong>g data<br />

24.3.1 Manipulat<strong>in</strong>g marks<br />

The follow<strong>in</strong>g tools can manipulate the marks <strong>in</strong> a <strong>po<strong>in</strong>t</strong> pattern:<br />

marks extract marks<br />

marks data(lans<strong>in</strong>g)<br />

> d a marks(lans<strong>in</strong>g) data(amacr<strong>in</strong>e)<br />

> Y plot(split(amacr<strong>in</strong>e))<br />

split(amacr<strong>in</strong>e)<br />

<strong>of</strong>f on<br />

24.3.3 Cutt<strong>in</strong>g the numerical scale <strong>in</strong>to b<strong>and</strong>s<br />

For a <strong>po<strong>in</strong>t</strong> pattern with numeric marks, the marks can be converted to a factor, us<strong>in</strong>g a method<br />

for the generic function cut. The user specifies a series <strong>of</strong> cut-<strong>po<strong>in</strong>t</strong>s on the numerical scale; all<br />

mark values between two cut-<strong>po<strong>in</strong>t</strong>s are given the same label.<br />

Copyright c○CSIRO 2008


164 H<strong>and</strong>l<strong>in</strong>g marked <strong>po<strong>in</strong>t</strong> pattern data<br />

For example, the Longleaf P<strong>in</strong>es data are the locations <strong>of</strong> trees marked with their diameter<br />

at breast height, dbh, <strong>in</strong> centimetres. By convention we def<strong>in</strong>e “adult” trees to be those with<br />

dbh greater than 30 centimetres. To obta<strong>in</strong> the bivariate <strong>po<strong>in</strong>t</strong> pattern <strong>of</strong> adult <strong>and</strong> juvenile<br />

trees,<br />

> data(longleaf)<br />

> longleaf<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 584 <strong>po<strong>in</strong>t</strong>s<br />

marks are numeric, <strong>of</strong> type ’double’<br />

w<strong>in</strong>dow: rectangle = [0, 200] x [0, 200] metres<br />

> X X<br />

marked planar <strong>po<strong>in</strong>t</strong> pattern: 584 <strong>po<strong>in</strong>t</strong>s<br />

multitype, with levels = juvenile adult<br />

w<strong>in</strong>dow: rectangle = [0, 200] x [0, 200] metres<br />

> par(mfrow = c(1, 2))<br />

> plot(longleaf)<br />

0 20 40 60 80<br />

0.000000 1.722522 3.445045 5.167567 6.890090<br />

> plot(X, ma<strong>in</strong> = "cut(longleaf)")<br />

juvenile adult<br />

1 2<br />

> par(mfrow = c(1, 1))<br />

Copyright c○CSIRO 2008<br />

longleaf cut(longleaf)


25 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

This section covers some tools for exploratory data analysis <strong>of</strong> marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. Most <strong>of</strong><br />

the tools have been developed for the special case <strong>of</strong> multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> (i.e. where the<br />

marks are categorical).<br />

25.1 Intensity<br />

The Lans<strong>in</strong>g Woods data give the locations <strong>of</strong> 6 species <strong>of</strong> trees <strong>in</strong> a forest <strong>in</strong> Michigan. Elementary<br />

estimates <strong>of</strong> the frequency distribution <strong>of</strong> species, <strong>and</strong> the <strong>in</strong>tensity <strong>of</strong> each species, are<br />

available from summary.ppp.<br />

> data(lans<strong>in</strong>g)<br />

> summary(lans<strong>in</strong>g)<br />

Marked planar <strong>po<strong>in</strong>t</strong> pattern: 2251 <strong>po<strong>in</strong>t</strong>s<br />

Average <strong>in</strong>tensity 2250 <strong>po<strong>in</strong>t</strong>s per square unit (one unit = 924 feet)<br />

*Pattern conta<strong>in</strong>s duplicated <strong>po<strong>in</strong>t</strong>s*<br />

Multitype:<br />

frequency proportion <strong>in</strong>tensity<br />

blackoak 135 0.0600 135<br />

hickory 703 0.3120 703<br />

maple 514 0.2280 514<br />

misc 105 0.0466 105<br />

redoak 346 0.1540 346<br />

whiteoak 448 0.1990 448<br />

W<strong>in</strong>dow: rectangle = [0, 1] x [0, 1] units<br />

W<strong>in</strong>dow area = 1 square unit<br />

Unit <strong>of</strong> length: 924 feet<br />

It’s sensible to exam<strong>in</strong>e the sub-<strong>patterns</strong> <strong>of</strong> different types separately, us<strong>in</strong>g split.ppp.<br />

> plot(split(lans<strong>in</strong>g))<br />

165<br />

Copyright c○CSIRO 2008


166 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

split(lans<strong>in</strong>g)<br />

blackoak hickory maple<br />

misc redoak whiteoak<br />

It would be useful to compute <strong>and</strong> plot a separate estimate <strong>of</strong> <strong>in</strong>tensity for each type <strong>of</strong> tree.<br />

This is possible us<strong>in</strong>g the functions density.splitppp <strong>and</strong> plot.list<strong>of</strong>. They are <strong>in</strong>voked<br />

simply by typ<strong>in</strong>g<br />

> plot(density(split(lans<strong>in</strong>g)), ribbon = FALSE)<br />

density(split(lans<strong>in</strong>g))<br />

blackoak hickory maple<br />

misc redoak whiteoak<br />

The relative proportions <strong>of</strong> <strong>in</strong>tensity can then be computed us<strong>in</strong>g eval.im:<br />

> Y attach(Y)<br />

Copyright c○CSIRO 2008


0 20 40 60 80<br />

25.2 Numeric marks: distribution <strong>and</strong> trend 167<br />

> pBlackoak plot(pBlackoak)<br />

> detach(Y)<br />

pBlackoak<br />

0.05 0.1 0.15<br />

Parametric estimates <strong>of</strong> <strong>in</strong>tensity can be obta<strong>in</strong>ed us<strong>in</strong>g ppm, fitt<strong>in</strong>g a Poisson model with<br />

an <strong>in</strong>tensity function that may depend on location <strong>and</strong>/or on the marks. See below.<br />

25.2 Numeric marks: distribution <strong>and</strong> trend<br />

For a <strong>po<strong>in</strong>t</strong> pattern with marks that are numeric (real numbers or <strong>in</strong>tegers) or logical values,<br />

the mark values can be extracted us<strong>in</strong>g the marks function <strong>and</strong> <strong>in</strong>spected us<strong>in</strong>g the histogram<br />

or kernel density estimate:<br />

> data(longleaf)<br />

> hist(marks(longleaf))<br />

Histogram <strong>of</strong> marks(longleaf)<br />

Copyright c○CSIRO 2008


168 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

To assess <strong>spatial</strong> trend <strong>in</strong> the marks, one way is to form a kernel regression smoother. The<br />

smoothed mark value at location u ∈ R2 is<br />

<br />

i m(u) =<br />

miκ(u − xi)<br />

κ(u − xi)<br />

<br />

i<br />

where k is the smooth<strong>in</strong>g kernel, <strong>and</strong> mi is the mark value at data <strong>po<strong>in</strong>t</strong> xi. This is computed<br />

by smooth.ppp:<br />

> plot(smooth.ppp(longleaf))<br />

smooth.ppp(longleaf)<br />

15 20 25 30 35 40<br />

You can also use cut.ppp followed by split.ppp to look for <strong>spatial</strong> <strong>in</strong>homogeneity <strong>of</strong> the<br />

marks:<br />

> data(spruces)<br />

> plot(split(cut(spruces, breaks = 3)))<br />

split(cut(spruces, breaks = 3))<br />

(0.16,0.23] (0.23,0.3] (0.3,0.37]<br />

25.3 Simple summaries <strong>of</strong> neighbour<strong>in</strong>g marks<br />

We are <strong>of</strong>ten <strong>in</strong>terested <strong>in</strong> the marks that are attached to the close neighbours <strong>of</strong> a typical <strong>po<strong>in</strong>t</strong>.<br />

For a multitype <strong>po<strong>in</strong>t</strong> pattern, the function marktable compiles a cont<strong>in</strong>gency table <strong>of</strong> the<br />

marks <strong>of</strong> all <strong>po<strong>in</strong>t</strong>s with<strong>in</strong> a given radius <strong>of</strong> each data <strong>po<strong>in</strong>t</strong>:<br />

Copyright c○CSIRO 2008


25.4 Summary functions 169<br />

> data(amacr<strong>in</strong>e)<br />

> M M[1:10, ]<br />

mark<br />

<strong>po<strong>in</strong>t</strong> <strong>of</strong>f on<br />

1 1 1<br />

2 2 2<br />

3 4 3<br />

4 3 1<br />

5 4 1<br />

6 2 3<br />

7 3 2<br />

8 1 1<br />

9 3 1<br />

10 3 2<br />

More general summaries <strong>of</strong> the marks <strong>of</strong> neighbours can be obta<strong>in</strong>ed us<strong>in</strong>g the function<br />

markstat. For example, to compute the average diameter <strong>of</strong> the 5 closest neighbours <strong>of</strong> each<br />

tree <strong>in</strong> the Longleaf P<strong>in</strong>es dataset,<br />

> md md[1:10]<br />

[1] 43.40 43.40 48.58 21.70 48.38 53.32 40.28 29.82 24.92 21.70<br />

25.4 Summary functions<br />

The summary functions F, G, J <strong>and</strong> K (<strong>and</strong> other functions derived from K, such as L <strong>and</strong> the<br />

pair correlation function) have been extended to multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>.<br />

25.4.1 A pair <strong>of</strong> types<br />

Assume the multitype <strong>po<strong>in</strong>t</strong> process X is stationary. Let Xj denote the sub-pattern <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong><br />

type j, with <strong>in</strong>tensity λj. Then for any pair <strong>of</strong> types i <strong>and</strong> j,<br />

• Fj(r) is the empty space function for Xj.<br />

• Gij(r) is the distribution function <strong>of</strong> the distance from a <strong>po<strong>in</strong>t</strong> <strong>of</strong> type i to the nearest<br />

<strong>po<strong>in</strong>t</strong> <strong>of</strong> type j<br />

• Kij(r) is 1/λj times the expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> type j with<strong>in</strong> a distance r <strong>of</strong> a<br />

typical <strong>po<strong>in</strong>t</strong> <strong>of</strong> type i.<br />

• Lij(r) is the correspond<strong>in</strong>g L-function<br />

<br />

Kij(r)<br />

Lij(r) = .<br />

π<br />

• gij(r) is the correspond<strong>in</strong>g analogue <strong>of</strong> the pair correlation function<br />

where K ′ ij (r) is the derivative <strong>of</strong> Kij.<br />

gij(r) = K′ ij (r)<br />

2πr<br />

Copyright c○CSIRO 2008


170 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

• Jij is def<strong>in</strong>ed as<br />

Jij(r) =<br />

1 − Gij(r)<br />

1 − Fj(r) .<br />

The functions Gij,Kij,Lij,gij,Jij are called “cross-type” or “i-to-j” summary functions. They<br />

are computed <strong>in</strong> spatstat by Gcross, Kcross, Lcross, pcfcross <strong>and</strong> Jcross respectively.<br />

> data(amacr<strong>in</strong>e)<br />

> amacr<strong>in</strong>e<br />

> plot(Gcross(amacr<strong>in</strong>e, "on", "<strong>of</strong>f"))<br />

Gcross["on", "<strong>of</strong>f"](r)<br />

0.0 0.2 0.4 0.6 0.8<br />

Gcross(amacr<strong>in</strong>e, "on", "<strong>of</strong>f")<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

The <strong>in</strong>terpretation <strong>of</strong> the cross-type summary functions is similar, but not identical, to that<br />

<strong>of</strong> the orig<strong>in</strong>al functions F, G, K etc:<br />

• if Xj is a uniform Poisson process (CSR), then Fj(r) = 1 − exp(−λjπr 2 ).<br />

• if Xj is a uniform Poisson process (CSR) <strong>and</strong> is <strong>in</strong>dependent <strong>of</strong> Xi, then Gij(r) = 1 −<br />

exp(−λjπr 2 ).<br />

• if Xi <strong>and</strong> Xj are <strong>in</strong>dependent, then Kij(r) = πr 2 <strong>and</strong> so Lij(r) = r <strong>and</strong> gij(r) = 1.<br />

• if Xi <strong>and</strong> Xj are <strong>in</strong>dependent, then Jij(r) = 1.<br />

Here ‘<strong>in</strong>dependent’ means that the two <strong>po<strong>in</strong>t</strong> processes are probabilistically <strong>in</strong>dependent.<br />

25.4.2 All pairs <strong>of</strong> types<br />

The comm<strong>and</strong> alltypes enables the user to compute the cross-type summary functions between<br />

all pairs <strong>of</strong> types simultaneously. For example, to compute Gij(r) for all i <strong>and</strong> j <strong>in</strong> the amacr<strong>in</strong>e<br />

cells data, we would use alltypes(amacr<strong>in</strong>e, "G"). The result is automatically displayed as<br />

an array <strong>of</strong> plot panels.<br />

> plot(alltypes(amacr<strong>in</strong>e, "G"))<br />

Copyright c○CSIRO 2008


25.4 Summary functions 171<br />

<strong>of</strong>f<br />

on<br />

km , rs , theo<br />

km , rs , theo<br />

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7<br />

0.0 0.2 0.4 0.6 0.8<br />

array <strong>of</strong> G functions for amacr<strong>in</strong>e.<br />

<strong>of</strong>f on<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

km , rs , theo<br />

km , rs , theo<br />

0.0 0.2 0.4 0.6 0.8<br />

0.0 0.2 0.4 0.6<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

The result <strong>of</strong> alltypes is a ‘function array’ (object <strong>of</strong> class "fasp") which can be <strong>in</strong>dexed<br />

by row <strong>and</strong> column subscripts. If the <strong>po<strong>in</strong>t</strong> pattern has a large number <strong>of</strong> possible types, you<br />

can compute the array <strong>of</strong> all possible pairwise G functions, then use the subscript operator to<br />

<strong>in</strong>spect a subset <strong>of</strong> the array.<br />

> data(lans<strong>in</strong>g)<br />

> a plot(a[2:3, 2:3])<br />

Copyright c○CSIRO 2008


172 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

hickory<br />

maple<br />

km , rs , theo<br />

km , rs , theo<br />

0.0 0.2 0.4 0.6 0.8<br />

0.0 0.2 0.4 0.6 0.8<br />

array <strong>of</strong> G functions for lans<strong>in</strong>g.<br />

0.000 0.005 0.010 0.015 0.020 0.025 0.030<br />

r (one unit = 924 feet)<br />

0.000 0.005 0.010 0.015 0.020 0.025 0.030<br />

r (one unit = 924 feet)<br />

25.4.3 One type to any type<br />

hickory maple<br />

Also def<strong>in</strong>ed are the “i-to-any” summaries<br />

km , rs , theo<br />

km , rs , theo<br />

0.0 0.2 0.4 0.6 0.8<br />

0.0 0.2 0.4 0.6 0.8<br />

0.000 0.005 0.010 0.015 0.020 0.025 0.030<br />

r (one unit = 924 feet)<br />

0.000 0.005 0.010 0.015 0.020 0.025 0.030<br />

r (one unit = 924 feet)<br />

• Gi•(r), the distribution function <strong>of</strong> the distance from a <strong>po<strong>in</strong>t</strong> <strong>of</strong> type i to the nearest other<br />

<strong>po<strong>in</strong>t</strong> <strong>of</strong> any type;<br />

• Ki•(r) is 1/λ times the expected number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> any type with<strong>in</strong> a distance r <strong>of</strong> a<br />

typical <strong>po<strong>in</strong>t</strong> <strong>of</strong> type i. Here λ = <br />

j λj is the <strong>in</strong>tensity <strong>of</strong> the entire process X.<br />

• Li•(r) is the correspond<strong>in</strong>g L-function<br />

• Ji• def<strong>in</strong>ed by<br />

<br />

Ki•(r)<br />

Li•(r) = .<br />

π<br />

Ji•(r) =<br />

1 − Gi•<br />

1 − F(r)<br />

These are comput<strong>in</strong>g by Gdot, Kdot, Ldot <strong>and</strong> Jdot respectively, or us<strong>in</strong>g alltypes.<br />

> plot(Gdot(amacr<strong>in</strong>e, "on"))<br />

Copyright c○CSIRO 2008


25.4 Summary functions 173<br />

Gdot["on"](r)<br />

0.0 0.2 0.4 0.6 0.8<br />

Gdot(amacr<strong>in</strong>e, "on")<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

> plot(alltypes(amacr<strong>in</strong>e, "Gdot"))<br />

array <strong>of</strong> Gdot functions for amacr<strong>in</strong>e.<br />

<strong>of</strong>f<br />

on<br />

km , rs , theo<br />

km , rs , theo<br />

0.0 0.2 0.4 0.6 0.8<br />

0.0 0.2 0.4 0.6 0.8<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

0.00 0.01 0.02 0.03 0.04 0.05 0.06<br />

r (one unit = 662 microns)<br />

25.4.4 Plott<strong>in</strong>g <strong>and</strong> manipulat<strong>in</strong>g function arrays<br />

A function array (object <strong>of</strong> class "fasp") can be pr<strong>in</strong>ted <strong>and</strong> plotted us<strong>in</strong>g methods for this<br />

class. It can also be manipulated <strong>in</strong> various ways.<br />

The plot method is similar to plot.fv <strong>and</strong> allows the function values to be transformed:<br />

> aG fisher plot(aG, fisher(.) ~ fisher(theo))<br />

Copyright c○CSIRO 2008


174 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

<strong>of</strong>f<br />

on<br />

fisher(cb<strong>in</strong>d(km, rs, theo))<br />

fisher(cb<strong>in</strong>d(km, rs, theo))<br />

0.0 0.5 1.0 1.5<br />

0.0 0.5 1.0 1.5<br />

array <strong>of</strong> G functions for amacr<strong>in</strong>e.<br />

<strong>of</strong>f on<br />

0.0 0.5 1.0 1.5<br />

fisher(theo)<br />

0.0 0.5 1.0 1.5<br />

fisher(theo)<br />

fisher(cb<strong>in</strong>d(km, rs, theo))<br />

fisher(cb<strong>in</strong>d(km, rs, theo))<br />

0.0 0.5 1.0 1.5<br />

0.0 0.5 1.0 1.5<br />

0.0 0.5 1.0 1.5<br />

fisher(theo)<br />

0.0 0.5 1.0 1.5<br />

fisher(theo)<br />

As mentioned above, the function array can be <strong>in</strong>dexed by array subscripts.<br />

> data(lans<strong>in</strong>g)<br />

> a b aGfish


25.6 R<strong>and</strong>omisation tests 175<br />

> markcorr(X, f)<br />

where X is a <strong>po<strong>in</strong>t</strong> pattern <strong>and</strong> f is an R language function. For example, for the amacr<strong>in</strong>e<br />

data, the natural function f is f(m1,m2) = 1 {m1 = m2} which we encode as<br />

> eqfun M plot(M)<br />

m(r)<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

M<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r (one unit = 662 microns)<br />

25.6 R<strong>and</strong>omisation tests<br />

Simulation envelopes <strong>of</strong> summary functions can be used to test various null hypotheses for<br />

marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>.<br />

25.6.1 Poisson null<br />

The null hypothesis <strong>of</strong> a homogeneous Poisson marked <strong>po<strong>in</strong>t</strong> process can be tested by direct<br />

simulation, us<strong>in</strong>g envelope as before. For example, us<strong>in</strong>g the cross-type K function as the test<br />

statistic,<br />

> data(amacr<strong>in</strong>e)<br />

> E plot(E, ma<strong>in</strong> = "test <strong>of</strong> marked Poisson model")<br />

Copyright c○CSIRO 2008


176 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Kcross["on", "<strong>of</strong>f"](r)<br />

0.00 0.05 0.10 0.15 0.20<br />

test <strong>of</strong> marked Poisson model<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r (one unit = 662 microns)<br />

Notice that the arguments i <strong>and</strong> j here do not match any <strong>of</strong> the formal arguments <strong>of</strong><br />

envelope, so they are passed toKcross. This has the effect <strong>of</strong> call<strong>in</strong>gKcross(X, i="on", j="<strong>of</strong>f")<br />

for each <strong>of</strong> the simulated <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> X. Each simulated pattern is generated by the homogeneous<br />

Poisson <strong>po<strong>in</strong>t</strong> process with <strong>in</strong>tensities estimated from the dataset amacr<strong>in</strong>e.<br />

25.6.2 Independence <strong>of</strong> components<br />

It’s also possible to test other null hypotheses by a r<strong>and</strong>omisation test. We discussed two popular<br />

null hypotheses:<br />

• r<strong>and</strong>om labell<strong>in</strong>g: given the locations X, the marks are conditionally <strong>in</strong>dependent <strong>and</strong><br />

identically distributed;<br />

• <strong>in</strong>dependence <strong>of</strong> components: the sub-processes Xm <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> each mark m, are <strong>in</strong>dependent<br />

<strong>po<strong>in</strong>t</strong> processes.<br />

In a r<strong>and</strong>omisation test <strong>of</strong> the <strong>in</strong>dependence-<strong>of</strong>-components hypothesis, the simulated <strong>patterns</strong><br />

X are generated from the dataset by splitt<strong>in</strong>g the data <strong>in</strong>to sub-<strong>patterns</strong> <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> one<br />

type, <strong>and</strong> r<strong>and</strong>omly shift<strong>in</strong>g these sub-<strong>patterns</strong>, <strong>in</strong>dependently <strong>of</strong> each other. The shift<strong>in</strong>g is<br />

performed by rshift:<br />

> E plot(E, ma<strong>in</strong> = "test <strong>of</strong> <strong>in</strong>dependent components")<br />

Copyright c○CSIRO 2008


25.6 R<strong>and</strong>omisation tests 177<br />

Kcross["on", "<strong>of</strong>f"](r)<br />

0.00 0.05 0.10 0.15 0.20<br />

test <strong>of</strong> <strong>in</strong>dependent components<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r (one unit = 662 microns)<br />

The <strong>in</strong>dependence-<strong>of</strong>-components hypothesis seems to be accepted <strong>in</strong> this example.<br />

Under the <strong>in</strong>dependence hypothesis,<br />

Kij(r) = πr 2<br />

Gij(r) = Fj(r)<br />

Jij(r) ≡ 1.<br />

while the “i-to-any” functions have complicated values. Thus, we would normally use Kij or Jij<br />

to construct a test statistic for <strong>in</strong>dependence <strong>of</strong> components.<br />

25.6.3 R<strong>and</strong>om labell<strong>in</strong>g<br />

In a r<strong>and</strong>omisation test <strong>of</strong> the r<strong>and</strong>om labell<strong>in</strong>g null hypothesis, the simulated <strong>patterns</strong> X are<br />

generated from the dataset by hold<strong>in</strong>g the <strong>po<strong>in</strong>t</strong> locations fixed, <strong>and</strong> r<strong>and</strong>omly resampl<strong>in</strong>g the<br />

marks, either with replacement (<strong>in</strong>dependent r<strong>and</strong>om sampl<strong>in</strong>g) or without replacement (r<strong>and</strong>omly<br />

permut<strong>in</strong>g the marks). The resampl<strong>in</strong>g operation is performed by rlabel.<br />

Under r<strong>and</strong>om labell<strong>in</strong>g,<br />

Ji•(r) = J(r)<br />

Ki•(r) = K(r)<br />

Gi•(r) = G(r)<br />

(where G,K,J are the summary functions for the <strong>po<strong>in</strong>t</strong> process without marks) while the other,<br />

cross-type functions have complicated values. Thus, we would normally use someth<strong>in</strong>g like<br />

Ki•(r) − K(r) to construct a test statistic for r<strong>and</strong>om labell<strong>in</strong>g.<br />

To do this, cook up a little function to evaluate Ji•(r) − J(r):<br />

> Jdif


178 Methods 11: exploratory tools for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Jdot["on"](r) − J(r)<br />

−0.4 −0.2 0.0 0.2 0.4 0.6<br />

test <strong>of</strong> r<strong>and</strong>om labell<strong>in</strong>g<br />

0.00 0.01 0.02 0.03 0.04 0.05<br />

r (one unit = 662 microns)<br />

The r<strong>and</strong>om labell<strong>in</strong>g hypothesis also seems to be accepted.<br />

25.6.4 Arrays <strong>of</strong> envelopes<br />

To compute a simulation envelope for the function Kij for each pair <strong>of</strong> types i <strong>and</strong> j, use<br />

alltypes with the argument envelope=TRUE.<br />

> aE plot(aE, sqrt(./pi) - r ~ r, ylab = "L(r)-r")<br />

<strong>of</strong>f<br />

on<br />

L(r)−r<br />

L(r)−r<br />

−0.03 −0.02 −0.01 0.00 0.01<br />

−0.010 −0.005 0.000 0.005<br />

array <strong>of</strong> envelopes <strong>of</strong> Kcross functions for amacr<strong>in</strong>e.<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r (one unit = 662 microns)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r (one unit = 662 microns)<br />

Copyright c○CSIRO 2008<br />

<strong>of</strong>f on<br />

L(r)−r<br />

L(r)−r<br />

−0.010 −0.005 0.000 0.005<br />

−0.03 −0.02 −0.01 0.00 0.01<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r (one unit = 662 microns)<br />

0.00 0.05 0.10 0.15 0.20 0.25<br />

r (one unit = 662 microns)


26 Methods 12: multitype Poisson models<br />

This section covers multitype Poisson process models: basic properties, simulation, <strong>and</strong> fitt<strong>in</strong>g<br />

models to data.<br />

26.1 Theory<br />

26.1.1 Complete <strong>spatial</strong> r<strong>and</strong>omness <strong>and</strong> <strong>in</strong>dependence<br />

A uniform Poisson marked <strong>po<strong>in</strong>t</strong> process <strong>in</strong> R 2 with marks <strong>in</strong> M can be def<strong>in</strong>ed <strong>in</strong> the follow<strong>in</strong>g<br />

equivalent ways.<br />

179<br />

• r<strong>and</strong>omly marked Poisson process (Poisson [X], iid [M|X]): a Poisson <strong>po<strong>in</strong>t</strong> process <strong>of</strong><br />

locations X with <strong>in</strong>tensity β is first generated. Then each <strong>po<strong>in</strong>t</strong> xi is labelled with a<br />

r<strong>and</strong>om mark mi, <strong>in</strong>dependently <strong>of</strong> other <strong>po<strong>in</strong>t</strong>s, with distribution P {Mi = m} = pm for<br />

m ∈ M.<br />

• superposition <strong>of</strong> <strong>in</strong>dependent Poisson processes (iid [M], Poisson [X|M]): for each possible<br />

mark m ∈ M, a Poisson process Xm is generated, with <strong>in</strong>tensity βm. The <strong>po<strong>in</strong>t</strong>s <strong>of</strong> Xm<br />

are tagged with the mark m. Then the processes Xm with different marks m ∈ M are<br />

superimposed, to yield a marked <strong>po<strong>in</strong>t</strong> process.<br />

• Poisson marked <strong>po<strong>in</strong>t</strong> process (jo<strong>in</strong>tly Poisson [X,M]): a Poisson process on R 2 × M is<br />

generated, with <strong>in</strong>tensity function λ(u,m) = βm at location u <strong>and</strong> mark m.<br />

These constructions are equivalent when βm = pmβ. See the lovely book by K<strong>in</strong>gman [32].<br />

S<strong>in</strong>ce the established term CSR (‘complete <strong>spatial</strong> r<strong>and</strong>omness’) is used to refer to the uniform<br />

Poisson <strong>po<strong>in</strong>t</strong> process, I propose that the uniform marked Poisson <strong>po<strong>in</strong>t</strong> process should be called<br />

‘complete <strong>spatial</strong> r<strong>and</strong>omness <strong>and</strong> <strong>in</strong>dependence’ (CSRI).<br />

26.1.2 Inhomogeneous Poisson marked <strong>po<strong>in</strong>t</strong> processes<br />

A <strong>in</strong>homogeneous Poisson marked <strong>po<strong>in</strong>t</strong> process Y with ‘jo<strong>in</strong>t’ <strong>in</strong>tensity λ(u,m) for locations u<br />

<strong>and</strong> mark values m is simply def<strong>in</strong>ed as an <strong>in</strong>homogeneous Poisson <strong>po<strong>in</strong>t</strong> process on R 2 × M<br />

with <strong>in</strong>tensity function λ(u,m).<br />

Let’s restrict attention to the case <strong>of</strong> categorical marks, where M is f<strong>in</strong>ite. Then the process<br />

Y has the follow<strong>in</strong>g properties:<br />

• The locations X, obta<strong>in</strong>ed by remov<strong>in</strong>g the marks, constitute an <strong>in</strong>homogeneous Poisson<br />

process <strong>in</strong> R 2 with <strong>in</strong>tensity function<br />

β(u) = <br />

λ(u,m).<br />

m<br />

• Conditional on the locations X, the marks attached to the <strong>po<strong>in</strong>t</strong>s are <strong>in</strong>dependent. For a<br />

<strong>po<strong>in</strong>t</strong> xi the conditional distribution <strong>of</strong> the mark mi is P{Mi = m} = λ(xi,m)/β(xi).<br />

• The sub-process Xm <strong>of</strong> <strong>po<strong>in</strong>t</strong>s with mark m, is an <strong>in</strong>homogeneous Poisson <strong>po<strong>in</strong>t</strong> process<br />

with <strong>in</strong>tensity βm(u) = λ(u,m).<br />

• The sub-processes Xm <strong>of</strong> <strong>po<strong>in</strong>t</strong>s with different marks m are <strong>in</strong>dependent processes.<br />

Copyright c○CSIRO 2008


180 Methods 12: multitype Poisson models<br />

26.2 Simulation<br />

Realisations <strong>of</strong> Poisson marked <strong>po<strong>in</strong>t</strong> processes can be generated us<strong>in</strong>g rmpoispp. The first<br />

argument <strong>of</strong> this comm<strong>and</strong> specifies the <strong>in</strong>tensity or <strong>in</strong>tensity function λ(u,m). It can be a<br />

constant, a vector <strong>of</strong> constants, or an R function.<br />

> par(mfrow = c(1, 2))<br />

> Xunif plot(Xunif, ma<strong>in</strong> = "CSRI, <strong>in</strong>tensity A=100, B=100")<br />

> Xunif plot(Xunif, ma<strong>in</strong> = "CSRI, <strong>in</strong>tensity A=100, B=20")<br />

> par(mfrow = c(1, 1))<br />

CSRI, <strong>in</strong>tensity A=100, B=100 CSRI, <strong>in</strong>tensity A=100, B=20<br />

> X1 lamb X2 par(mfrow = c(1, 2))<br />

> plot(X1, ma<strong>in</strong> = "")<br />

> plot(X2, ma<strong>in</strong> = "")<br />

> par(mfrow = c(1, 1))<br />

Copyright c○CSIRO 2008


26.3 Fitt<strong>in</strong>g Poisson models 181<br />

26.3 Fitt<strong>in</strong>g Poisson models<br />

Poisson marked <strong>po<strong>in</strong>t</strong> process models may be fitted to <strong>po<strong>in</strong>t</strong> pattern data us<strong>in</strong>g ppm. Currently<br />

the methods are only available for multitype <strong>po<strong>in</strong>t</strong> processes (categorical marks).<br />

26.3.1 Probability densities<br />

Let W ⊂ R 2 be the study region, <strong>and</strong> M the (f<strong>in</strong>ite) set <strong>of</strong> possible marks. Then a marked <strong>po<strong>in</strong>t</strong><br />

pattern is a set<br />

y = {(x1,m1),... ,(xn,mn)}, xi ∈ W, mi ∈ M, n ≥ 0<br />

<strong>of</strong> pairs (xi,mi) <strong>of</strong> locations xi with marks mi. It can be viewed as a <strong>po<strong>in</strong>t</strong> pattern <strong>in</strong> the<br />

Cartesian product W × M.<br />

The probability density <strong>of</strong> a marked <strong>po<strong>in</strong>t</strong> process is a function f(y) def<strong>in</strong>ed for all marked<br />

<strong>po<strong>in</strong>t</strong> <strong>patterns</strong> y <strong>in</strong>clud<strong>in</strong>g the empty pattern ∅.<br />

The process with probability density f(y) ≡ 1 is the uniform Poisson marked <strong>po<strong>in</strong>t</strong> process<br />

with <strong>in</strong>tensity 1 for each mark. That is, for this model, the sub-process <strong>of</strong> <strong>po<strong>in</strong>t</strong>s with mark<br />

mi = m is a uniform Poisson process with <strong>in</strong>tensity 1. If the marks are removed, we obta<strong>in</strong> a<br />

Poisson <strong>po<strong>in</strong>t</strong> process with <strong>in</strong>tensity equal to |M|, the number <strong>of</strong> possible types.<br />

The uniform Poisson marked <strong>po<strong>in</strong>t</strong> process with <strong>in</strong>tensity λ(u,m) = βm has probability<br />

density<br />

<br />

<br />

f(y) = exp (1 − βm)|W |<br />

m∈M<br />

n(y)<br />

<br />

i=1<br />

<br />

<br />

<br />

= exp (1 − βm)|W |<br />

m∈M<br />

m∈M<br />

βmi<br />

β nm(y)<br />

m<br />

where nm(y) is the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> y hav<strong>in</strong>g mark value m.<br />

The <strong>in</strong>homogeneous Poisson marked <strong>po<strong>in</strong>t</strong> process with <strong>in</strong>tensity function λ(u,m), at location<br />

u ∈ W <strong>and</strong> mark m ∈ M, has probability density<br />

Copyright c○CSIRO 2008


182 Methods 12: multitype Poisson models<br />

f(y) = exp<br />

26.3.2 Maximum likelihood<br />

<br />

m∈M<br />

<br />

W<br />

(1 − λ(u,m)du<br />

n(y)<br />

<br />

λ(xi,mi). (48)<br />

For the multitype Poisson process with <strong>in</strong>tensity function λ(u,m) at location u ∈ W <strong>and</strong> mark<br />

m ∈ M, the loglikelihood is, up to a constant,<br />

log L =<br />

n<br />

log λ(xi,mi) − <br />

<br />

i=1<br />

m∈M<br />

W<br />

i=1<br />

λ(u,m)du. (49)<br />

where mi is the mark attached to data <strong>po<strong>in</strong>t</strong> xi. This is formally equivalent to the loglikelihood<br />

<strong>of</strong> a Poisson logl<strong>in</strong>ear regression, so the Berman-Turner algorithm can aga<strong>in</strong> be used to maximise<br />

the loglikelihood.<br />

26.3.3 Model-fitt<strong>in</strong>g <strong>in</strong> spatstat<br />

Poisson marked <strong>po<strong>in</strong>t</strong> process models are fitted to data us<strong>in</strong>g ppm.<br />

The trend formula <strong>in</strong> the call to ppm may <strong>in</strong>volve the reserved name marks as a variable.<br />

This refers to the marks <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s. S<strong>in</strong>ce the marks are categorical, marks is treated as a<br />

factor variable for modell<strong>in</strong>g purposes.<br />

To fit the homogeneous multitype Poisson process (CSRI), equation (50), we call<br />

> ppm(X, ~marks)<br />

The formula ~marks <strong>in</strong>dicates that the trend depends only on the marks, <strong>and</strong> not on <strong>spatial</strong><br />

location; s<strong>in</strong>ce marks is a factor, the trend has a separate constant value for each level <strong>of</strong> marks.<br />

This is the model (50).<br />

Note that if we had typed<br />

> ppm(X, ~1)<br />

this would have fitted the special case <strong>of</strong> CSRI where the <strong>in</strong>tensities βm are equal, βm ≡ α say,<br />

for all possible marks. That model is only appropriate if we believe that all mark values are<br />

equally likely.<br />

For the Lans<strong>in</strong>g Woods data, the m<strong>in</strong>imal model that makes sense is (50), so we call<br />

> ppm(lans<strong>in</strong>g, ~marks)<br />

Stationary multitype Poisson process<br />

Possible marks:<br />

blackoak hickory maple misc redoak whiteoak<br />

Trend formula: ~marks<br />

Intensities:<br />

beta_blackoak beta_hickory beta_maple beta_misc beta_redoak<br />

135 703 514 105 346<br />

beta_whiteoak<br />

448<br />

Copyright c○CSIRO 2008


26.3 Fitt<strong>in</strong>g Poisson models 183<br />

S<strong>in</strong>ce lans<strong>in</strong>g is a multitype <strong>po<strong>in</strong>t</strong> pattern (its marks are categorical), the variable marks <strong>in</strong><br />

the formula is a factor. The model has one parameter/coefficient for each level <strong>of</strong> the factor, i.e.<br />

one coefficient for each type <strong>of</strong> <strong>po<strong>in</strong>t</strong>. In other words, this is the homogeneous Poisson marked<br />

<strong>po<strong>in</strong>t</strong> process with <strong>in</strong>tensity βm for <strong>po<strong>in</strong>t</strong>s <strong>of</strong> mark m.<br />

You’ll notice that the parameter estimates βm co<strong>in</strong>cide with those obta<strong>in</strong>ed fromsummary.ppp<br />

above. That is a consequence <strong>of</strong> the fact that the maximum likelihood estimates (obta<strong>in</strong>ed by<br />

ppm) are also the method-<strong>of</strong>-moments estimates (obta<strong>in</strong>ed by summary.ppp).<br />

A more complicated example is<br />

> ppm(lans<strong>in</strong>g, ~marks + x)<br />

Nonstationary multitype Poisson process<br />

Possible marks:<br />

blackoak hickory maple misc redoak whiteoak<br />

Trend formula: ~marks + x<br />

Fitted coefficients for trend formula:<br />

(Intercept) markshickory marksmaple marksmisc marksredoak<br />

4.94294727 1.65008211 1.33694849 -0.25131442 0.94116400<br />

markswhiteoak x<br />

1.19951845 -0.07581624<br />

This is the marked Poisson process whose <strong>in</strong>tensity function λ((x,y,m)) at location (x,y)<br />

<strong>and</strong> mark m satisfies<br />

log λ((x,y,m)) = αm + βx<br />

where α1,... ,α6 <strong>and</strong> β are parameters. The <strong>in</strong>tensity is logl<strong>in</strong>ear <strong>in</strong> x, with a different <strong>in</strong>tercept<br />

for each mark, but the same slope (“parallel logl<strong>in</strong>ear regression”). In the pr<strong>in</strong>tout above, the<br />

fitted slope parameter β is ˆ β =-0.07581624. As discussed <strong>in</strong> Section 13.3 on page 82, the fitted<br />

coefficients αm for the categorical mark are <strong>in</strong>terpreted <strong>in</strong> the light <strong>of</strong> the ‘contrasts’ <strong>in</strong> force.<br />

The default is the treatment contrasts, <strong>and</strong> the first level <strong>of</strong> the mark is blackoak, so <strong>in</strong> this<br />

case the fitted coefficient for m=blackoak is 4.942947, while the fitted coefficient for m=hickory<br />

is 4.942947 + 1.650082 = 6.593029 <strong>and</strong> so on.<br />

> ppm(lans<strong>in</strong>g, ~marks * x)<br />

Nonstationary multitype Poisson process<br />

Possible marks:<br />

blackoak hickory maple misc redoak whiteoak<br />

Trend formula: ~marks * x<br />

Fitted coefficients for trend formula:<br />

(Intercept) markshickory marksmaple marksmisc marksredoak<br />

5.2378062 1.4424915 0.6795604 -0.8482907 0.6916392<br />

markswhiteoak x markshickory:x marksmaple:x marksmisc:x<br />

1.0901772 -0.7063987 0.4511157 1.3243326 1.2138278<br />

marksredoak:x markswhiteoak:x<br />

0.5380413 0.2421379<br />

Copyright c○CSIRO 2008


184 Methods 12: multitype Poisson models<br />

The symbol * here is an ‘<strong>in</strong>teraction’ <strong>in</strong> the usual sense for l<strong>in</strong>ear models. The fitted model<br />

is the marked Poisson process with<br />

log λ((x,y,m)) = αm + βmx<br />

where α1,... ,α6 <strong>and</strong> β1,...,β6 are parameters. The <strong>in</strong>tensity is logl<strong>in</strong>ear <strong>in</strong> x with a different<br />

slope <strong>and</strong> <strong>in</strong>tercept for each mark.<br />

The result <strong>of</strong> ppm is aga<strong>in</strong> an object <strong>of</strong> class "ppm" represent<strong>in</strong>g a fitted <strong>po<strong>in</strong>t</strong> process model.<br />

To plot the fitted <strong>in</strong>tensity <strong>and</strong> conditional <strong>in</strong>tensity <strong>of</strong> the fitted model, use plot.ppm. For a<br />

multitype <strong>po<strong>in</strong>t</strong> process you will get a separate plot for each possible mark value.<br />

More complicated examples are:<br />

> ppm(lans<strong>in</strong>g, ~marks * polynom(x, y, 2))<br />

> ppm(lans<strong>in</strong>g, ~marks * harmonic(x, y, 2))<br />

Copyright c○CSIRO 2008


27 Methods 13: Gibbs models for multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

Gibbs <strong>po<strong>in</strong>t</strong> process models (section 20) are also available for marked <strong>po<strong>in</strong>t</strong> processes, <strong>and</strong> can<br />

be fitted to data us<strong>in</strong>g ppm. Currently the methods are only implemented for multitype <strong>po<strong>in</strong>t</strong><br />

processes (categorical marks), so we restrict attention to this case.<br />

27.1 Gibbs models<br />

Much <strong>of</strong> the theory <strong>of</strong> Gibbs models described <strong>in</strong> Section 20 carries over immediately to multitype<br />

<strong>po<strong>in</strong>t</strong> processes.<br />

27.1.1 Conditional <strong>in</strong>tensity<br />

The conditional <strong>in</strong>tensity λ(u,X) <strong>of</strong> an (unmarked) <strong>po<strong>in</strong>t</strong> process X at a location u was def<strong>in</strong>ed<br />

<strong>in</strong> section 20.5. Roughly speak<strong>in</strong>g λ(u,x)du is the conditional probability <strong>of</strong> f<strong>in</strong>d<strong>in</strong>g a <strong>po<strong>in</strong>t</strong><br />

near u, given that the rest <strong>of</strong> the <strong>po<strong>in</strong>t</strong> process X co<strong>in</strong>cides with x.<br />

For a marked <strong>po<strong>in</strong>t</strong> process Y the conditional <strong>in</strong>tensity is a function λ((u,m),Y) giv<strong>in</strong>g a<br />

value at a location u for each possible mark m. For a f<strong>in</strong>ite set <strong>of</strong> marks M, we can <strong>in</strong>terpret<br />

λ((u,m),y)du as the conditional probability f<strong>in</strong>d<strong>in</strong>g a <strong>po<strong>in</strong>t</strong> with mark m near u, given the rest<br />

<strong>of</strong> the marked <strong>po<strong>in</strong>t</strong> process.<br />

The conditional <strong>in</strong>tensity is related to the probability density f(y) by<br />

λ((u,m),y) =<br />

f(y ∪ {u})<br />

f(y)<br />

for (u,m) ∈ y.<br />

For Poisson processes, the conditional <strong>in</strong>tensity λ((u,m),y) co<strong>in</strong>cides with the <strong>in</strong>tensity<br />

function λ(u,m) <strong>and</strong> does not depend on the configuration y. For example, the homogeneous<br />

Poisson multitype <strong>po<strong>in</strong>t</strong> process or “CSRI” (Section 26.1.1) has conditional <strong>in</strong>tensity<br />

λ((u,m),y) = βm<br />

where βm ≥ 0 are constants which can be <strong>in</strong>terpreted <strong>in</strong> several equivalent ways (section 20.5).<br />

The sub-process consist<strong>in</strong>g <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> type m only is Poisson with <strong>in</strong>tensity βm. The process<br />

obta<strong>in</strong>ed by ignor<strong>in</strong>g the types, <strong>and</strong> comb<strong>in</strong><strong>in</strong>g all the <strong>po<strong>in</strong>t</strong>s, is Poisson with <strong>in</strong>tensity β =<br />

<br />

m βm. The marks attached to the <strong>po<strong>in</strong>t</strong>s are i.i.d. with distribution pm = βm/β.<br />

27.1.2 Pairwise <strong>in</strong>teractions<br />

A multitype pairwise <strong>in</strong>teraction process is a Gibbs process with probability density <strong>of</strong> the form<br />

⎡<br />

n(y) <br />

f(y) = α⎣<br />

bmi (xi)<br />

⎤ ⎡<br />

⎦ ⎣ <br />

cmi,mj (xi,xj)<br />

⎤<br />

⎦ (51)<br />

i=1<br />

where bm(u),m ∈ M are functions determ<strong>in</strong><strong>in</strong>g the ‘first order trend’ for <strong>po<strong>in</strong>t</strong>s <strong>of</strong> each type,<br />

<strong>and</strong> cm,m ′(u,v),m,m′ ∈ M are functions determ<strong>in</strong><strong>in</strong>g the <strong>in</strong>teraction between a pair <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong><br />

given types m <strong>and</strong> m ′ . The <strong>in</strong>teraction functions must be symmetric, cm,m ′(u,v) = cm,m ′(v,u)<br />

<strong>and</strong> cm,m ′ ≡ cm ′ ,m. The conditional <strong>in</strong>tensity is<br />

i


186 Methods 13: Gibbs models for multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

27.1.3 Pairwise <strong>in</strong>teractions not depend<strong>in</strong>g on marks<br />

The simplest examples <strong>of</strong> multitype pairwise <strong>in</strong>teraction processes are those <strong>in</strong> which the <strong>in</strong>teraction<br />

term cm,m ′(u,v) does not depend on the marks m,m′ . For example, we can take any <strong>of</strong><br />

the <strong>in</strong>teraction functions c(u,v) described <strong>in</strong> section 20.3 <strong>and</strong> use it to construct a marked <strong>po<strong>in</strong>t</strong><br />

process.<br />

Such processes can be constructed equivalently as follows [8]:<br />

• an unmarked Gibbs process is generated with first order term b(u) = <br />

m∈M bm(u) <strong>and</strong><br />

pairwise <strong>in</strong>teraction c(u,v).<br />

• each <strong>po<strong>in</strong>t</strong> xi <strong>of</strong> this unmarked process is labelled with a mark mi with probability distribution<br />

P{mi = m} = bi(xi)/b(xi) <strong>in</strong>dependent <strong>of</strong> other <strong>po<strong>in</strong>t</strong>s.<br />

If additionally the <strong>in</strong>tensity functions are constant, bm(u) ≡ βm, then such a <strong>po<strong>in</strong>t</strong> process<br />

has the r<strong>and</strong>om labell<strong>in</strong>g property.<br />

27.1.4 Mark-dependent pairwise <strong>in</strong>teractions<br />

Various complex k<strong>in</strong>ds <strong>of</strong> behaviour can be created by postulat<strong>in</strong>g a pairwise <strong>in</strong>teraction that<br />

does depend on the marks.<br />

A simple example is the multitype hard core process <strong>in</strong> which βm(u) ≡ β <strong>and</strong><br />

cm,m ′(u,v) =<br />

1 if ||u − v|| > rm,m ′<br />

0 if ||u − v|| ≤ rm,m ′<br />

where rm,m ′ = rm ′ ,m > 0 is the hard core distance for type m with type m ′ . In this process, two<br />

<strong>po<strong>in</strong>t</strong>s <strong>of</strong> type m <strong>and</strong> m ′ respectively can never come closer than the distance rm,m ′.<br />

By sett<strong>in</strong>g rm,m ′ = 0 for a particular pair <strong>of</strong> marks m,m′ we effectively remove the <strong>in</strong>teraction<br />

term between <strong>po<strong>in</strong>t</strong>s <strong>of</strong> these types. If there are only two types, say M = {1,2},<br />

then sett<strong>in</strong>g r1,2 = 0 implies that the sub-processes X1 <strong>and</strong> X2, consist<strong>in</strong>g <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> types<br />

1 <strong>and</strong> 2 respectively, are <strong>in</strong>dependent <strong>po<strong>in</strong>t</strong> processes. In other words the process satisfies the<br />

<strong>in</strong>dependence-<strong>of</strong>-components property.<br />

The multitype Strauss process has pairwise <strong>in</strong>teraction term<br />

cm,m ′(u,v) =<br />

1 if ||u − v|| > rm,m ′<br />

γm,m ′ if ||u − v|| ≤ rm,m ′<br />

where rm,m ′ > 0 are <strong>in</strong>teraction radii as above, <strong>and</strong> γm,m ′ ≥ 0 are <strong>in</strong>teraction parameters.<br />

In contrast to the unmarked Strauss process, which is well-def<strong>in</strong>ed only when its <strong>in</strong>teraction<br />

parameter γ is between 0 <strong>and</strong> 1, the multitype Strauss process allows some <strong>of</strong> the <strong>in</strong>teraction<br />

parameters γm,m ′ to exceed 1 for m = m′ , provided one <strong>of</strong> the relevant types has a hard core<br />

(γm,m = 0 or γm ′ ,m ′ = 0).<br />

If there are only two types, say M = {1,2}, then sett<strong>in</strong>g γ1,2 = 1 implies that the subprocesses<br />

X1 <strong>and</strong> X2, consist<strong>in</strong>g <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>of</strong> types 1 <strong>and</strong> 2 respectively, are <strong>in</strong>dependent Strauss<br />

processes.<br />

The multitype Strauss-hard core process has pairwise <strong>in</strong>teraction term<br />

cm,m ′(u,v) =<br />

⎧<br />

⎨<br />

⎩<br />

0 if ||u − v|| < hm,m ′<br />

γm,m ′ if hm,m ′ ≤ ||u − v|| ≤ rm,m ′<br />

1 if ||u − v|| > rm,m ′<br />

where rm,m ′ > 0 are <strong>in</strong>teraction distances <strong>and</strong> γm,m ′ ≥ 0 are <strong>in</strong>teraction parameters as above,<br />

<strong>and</strong> hm,m ′ are hard core distances satisfy<strong>in</strong>g hm,m ′ = hm ′ ,m <strong>and</strong> 0 < hm,m ′ < rm,m ′.<br />

Copyright c○CSIRO 2008<br />

(53)<br />

(54)<br />

(55)


27.2 Pseudolikelihood for multitype Gibbs processes 187<br />

27.2 Pseudolikelihood for multitype Gibbs processes<br />

Models can be fitted by maximum pseudolikelihood. For a multitype Gibbs <strong>po<strong>in</strong>t</strong> process with<br />

conditional <strong>in</strong>tensity λ((u,m);y), the log pseudolikelihood is<br />

n(y) <br />

log PL = log λ((xi,mi);y) − <br />

<br />

i=1<br />

m∈M<br />

W<br />

λ((u,m);y)du. (56)<br />

The pseudolikelihood can be maximised us<strong>in</strong>g an extension <strong>of</strong> the Berman-Turner device [3].<br />

27.3 Fitt<strong>in</strong>g Gibbs models to multitype data<br />

Marked <strong>po<strong>in</strong>t</strong> process models may be fitted to <strong>po<strong>in</strong>t</strong> pattern data us<strong>in</strong>g ppm. Currently the<br />

methods are only available for multitype <strong>po<strong>in</strong>t</strong> processes (categorical marks).<br />

27.3.1 Interactions not depend<strong>in</strong>g on marks<br />

The model-fitt<strong>in</strong>g function ppm expects an argument <strong>in</strong>teraction that specifies the <strong>in</strong>ter<strong>po<strong>in</strong>t</strong><br />

<strong>in</strong>teraction structure <strong>of</strong> the <strong>po<strong>in</strong>t</strong> process. The default is ‘no <strong>in</strong>teraction’, correspond<strong>in</strong>g to a<br />

Poisson process.<br />

On page 141 there is a list <strong>of</strong> <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teractions for modell<strong>in</strong>g unmarked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>.<br />

These <strong>in</strong>teractions can also be used, without modification, to fit models to multitype <strong>po<strong>in</strong>t</strong><br />

<strong>patterns</strong>.<br />

For example<br />

> ppm(lans<strong>in</strong>g, ~marks, Strauss(0.07))<br />

fits a multitype version <strong>of</strong> the Strauss process (section 20.3.2) <strong>in</strong> which the conditional <strong>in</strong>tensity<br />

is<br />

λ((u,m),y) = βmγ t(u,y) . (57)<br />

Here βm are constants which account for the unequal abundance <strong>of</strong> the different species <strong>of</strong> tree.<br />

The other quantities are the same as <strong>in</strong> (42). The <strong>in</strong>teraction between two trees is assumed to be<br />

the same for all species, <strong>and</strong> is controlled by the <strong>in</strong>teraction parameter γ <strong>and</strong> <strong>in</strong>teraction radius<br />

r = 0.07. For example, this <strong>in</strong>cludes the case γ = 0 where no two trees (whatever species they<br />

belong to) come closer than 0.07 units apart, a ‘multitype hard core process’.<br />

27.3.2 Interactions depend<strong>in</strong>g on marks<br />

There are two additional <strong>in</strong>ter<strong>po<strong>in</strong>t</strong> <strong>in</strong>teractions def<strong>in</strong>ed <strong>in</strong> spatstat for multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>:<br />

MultiStrauss the multitype Strauss process<br />

MultiStraussHard multitype hybrid hard core / Strauss process<br />

In these models, the <strong>in</strong>teraction between two <strong>po<strong>in</strong>t</strong>s depends on the types <strong>of</strong> the <strong>po<strong>in</strong>t</strong>s as<br />

well as their separation.<br />

In the multitype Strauss process (54), for each pair <strong>of</strong> types i <strong>and</strong> j there is an <strong>in</strong>teraction<br />

radius rij <strong>and</strong> <strong>in</strong>teraction parameter γij. In simple terms, each pair <strong>of</strong> <strong>po<strong>in</strong>t</strong>s, with marks i <strong>and</strong> j<br />

say, contributes an <strong>in</strong>teraction term γi,j if the distance between them is less than the <strong>in</strong>teraction<br />

distance ri,j. These parameters must satisfy rij = rji <strong>and</strong> γij = γji. The conditional <strong>in</strong>tensity is<br />

<br />

λ((u,i),y) = βi<br />

j<br />

γ ti,j(u,y)<br />

i,j<br />

(58)<br />

Copyright c○CSIRO 2008


188 Methods 13: Gibbs models for multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

where ti,j(u,y) is the number <strong>of</strong> <strong>po<strong>in</strong>t</strong>s <strong>in</strong> y, with mark equal to j, ly<strong>in</strong>g with<strong>in</strong> a distance ri,j<br />

<strong>of</strong> the location u.<br />

To fit the stationary multitype Strauss process to the dataset betacells, we must specify<br />

the matrix <strong>of</strong> <strong>in</strong>teraction radii rij:<br />

> data(betacells)<br />

> r ppm(betacells, ~1, MultiStrauss(c("<strong>of</strong>f", "on"), r), rbord = 60)<br />

Stationary Multitype Strauss process<br />

Possible marks:<br />

<strong>of</strong>f on<br />

First order terms:<br />

beta_<strong>of</strong>f beta_on<br />

0.0001373652 0.0001373652<br />

Interaction: Pairwise <strong>in</strong>teraction family<br />

Interaction: Multitype Strauss process<br />

2 types <strong>of</strong> <strong>po<strong>in</strong>t</strong>s<br />

Possible types:<br />

[1] "<strong>of</strong>f" "on"<br />

Interaction radii:<br />

<strong>of</strong>f on<br />

<strong>of</strong>f 30 60<br />

on 60 30<br />

Fitted <strong>in</strong>teraction parameters gamma_ij:<br />

<strong>of</strong>f on<br />

<strong>of</strong>f 0.0000 0.8303<br />

on 0.8303 0.0000<br />

Relevant coefficients:<br />

mark<strong>of</strong>fx<strong>of</strong>f mark<strong>of</strong>fxon markonxon<br />

-17.2378706 -0.1860184 -17.2138383<br />

To fit a nonstationary multitype Strauss process with log-cubic polynomial trend:<br />

> ppm(betacells, ~polynom(x, y, 3), MultiStrauss(c("<strong>of</strong>f", "on"),<br />

+ r), rbord = 60)<br />

For more detailed explanation <strong>and</strong> examples <strong>of</strong> modell<strong>in</strong>g <strong>and</strong> the <strong>in</strong>terpretation <strong>of</strong> model<br />

formulae for <strong>po<strong>in</strong>t</strong> processes, see [5].<br />

Copyright c○CSIRO 2008


27.3 Fitt<strong>in</strong>g Gibbs models to multitype data 189<br />

27.3.3 Plott<strong>in</strong>g the fitted <strong>in</strong>teraction<br />

The fitted pairwise <strong>in</strong>teraction <strong>in</strong> a <strong>po<strong>in</strong>t</strong> process model can be plotted us<strong>in</strong>g fit<strong>in</strong>. The value<br />

returned by fit<strong>in</strong> is a function array (class "fasp").<br />

> model plot(fit<strong>in</strong>(model))<br />

<strong>of</strong>f<br />

on<br />

Pairwise <strong>in</strong>teraction<br />

Pairwise <strong>in</strong>teraction<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

0 20 40 60<br />

Distance<br />

0 20 40 60<br />

Fitted pairwise <strong>in</strong>teractions<br />

<strong>of</strong>f on<br />

Distance<br />

Pairwise <strong>in</strong>teraction<br />

Pairwise <strong>in</strong>teraction<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

0.0 0.2 0.4 0.6 0.8 1.0<br />

0 20 40 60<br />

Distance<br />

0 20 40 60<br />

Distance<br />

Copyright c○CSIRO 2008


190 L<strong>in</strong>e segment data<br />

28 L<strong>in</strong>e segment data<br />

spatstat also has some facilities for h<strong>and</strong>l<strong>in</strong>g <strong>spatial</strong> <strong>patterns</strong> <strong>of</strong> l<strong>in</strong>e segments.<br />

For example, the copper dataset <strong>in</strong> spatstat conta<strong>in</strong>s a dataset copper$L<strong>in</strong>es that records<br />

the locations <strong>of</strong> geological faults <strong>in</strong> a survey region.<br />

> data(copper)<br />

> L L plot(L)<br />

L<br />

A <strong>spatial</strong> pattern <strong>of</strong> l<strong>in</strong>e segments is represented by an object <strong>of</strong> class "psp". It consists <strong>of</strong><br />

a list <strong>of</strong> l<strong>in</strong>e segments (given by the coord<strong>in</strong>ates <strong>of</strong> their two end<strong>po<strong>in</strong>t</strong>s), <strong>and</strong> a w<strong>in</strong>dow <strong>in</strong> which<br />

the l<strong>in</strong>e segments were observed. The l<strong>in</strong>e segments may also carry marks.<br />

Objects <strong>of</strong> class "psp" can be created by the function psp or obta<strong>in</strong>ed by convert<strong>in</strong>g other<br />

data us<strong>in</strong>g the function as.psp.<br />

Capabilities available for this class <strong>in</strong>clude:<br />

[.psp subset operator (also performs clipp<strong>in</strong>g)<br />

marks.psp extract marks<br />

end<strong>po<strong>in</strong>t</strong>s.psp extract mid<strong>po<strong>in</strong>t</strong>s <strong>of</strong> l<strong>in</strong>e segments<br />

mid<strong>po<strong>in</strong>t</strong>s.psp compute mid<strong>po<strong>in</strong>t</strong>s <strong>of</strong> l<strong>in</strong>e segments<br />

lengths.psp compute lengths <strong>of</strong> l<strong>in</strong>e segments<br />

angles.psp compute angles <strong>of</strong> orientation for l<strong>in</strong>e segments<br />

rotate.psp rotate a l<strong>in</strong>e segment pattern<br />

shift.psp shift a l<strong>in</strong>e segment pattern<br />

aff<strong>in</strong>e.psp apply aff<strong>in</strong>e transformation<br />

pairdist.psp distances between l<strong>in</strong>e segments<br />

crossdist.psp distances between l<strong>in</strong>e segments<br />

nndist.psp closest distances between l<strong>in</strong>e segments<br />

density.psp kernel-smoothed <strong>in</strong>tensity image<br />

cross<strong>in</strong>g.psp f<strong>in</strong>d <strong>in</strong>tersection <strong>po<strong>in</strong>t</strong>s between l<strong>in</strong>e segments<br />

selfcross<strong>in</strong>g.psp f<strong>in</strong>d <strong>in</strong>tersection <strong>po<strong>in</strong>t</strong>s between l<strong>in</strong>e segments<br />

unitname.psp determ<strong>in</strong>e units <strong>of</strong> length<br />

rescale.psp change units <strong>of</strong> length<br />

rshift.psp apply r<strong>and</strong>om shift to each l<strong>in</strong>e segment<br />

Copyright c○CSIRO 2008


There are also the usual methods<br />

plot.psp plot a l<strong>in</strong>e segment pattern<br />

pr<strong>in</strong>t.psp pr<strong>in</strong>t <strong>in</strong>formation on a l<strong>in</strong>e segment pattern<br />

summary.psp compute summary <strong>of</strong> a l<strong>in</strong>e segment pattern<br />

> summary(L)<br />

146 l<strong>in</strong>e segments<br />

Lengths:<br />

M<strong>in</strong>. 1st Qu. Median Mean 3rd Qu. Max.<br />

0.09242 6.61400 12.18000 15.02000 19.95000 65.48000<br />

Total length: 2192.57251480451 km<br />

Length per unit area: 0.196937548404655<br />

Angles (radians):<br />

M<strong>in</strong>. 1st Qu. Median Mean 3rd Qu. Max.<br />

0.008107 0.549500 1.747000 1.378000 2.113000 2.912000<br />

W<strong>in</strong>dow: polygonal boundary<br />

s<strong>in</strong>gle connected closed polygon with 4 vertices<br />

enclos<strong>in</strong>g rectangle: [-158.23, -0.19] x [-0.335, 70.11] km<br />

W<strong>in</strong>dow area = 11133.3 square km<br />

Unit <strong>of</strong> length: 1 km<br />

> plot(distmap(L))<br />

> plot(L, add = TRUE)<br />

0 20 40 60<br />

distmap(L)<br />

−150 −100 −50 0<br />

0 2 4 6 8 10 12 14<br />

191<br />

Copyright c○CSIRO 2008


192 Further <strong>in</strong>formation on spatstat<br />

29 Further <strong>in</strong>formation on spatstat<br />

Help files<br />

For <strong>in</strong>formation on a particular comm<strong>and</strong> <strong>in</strong> spatstat, consult the onl<strong>in</strong>e help file by typ<strong>in</strong>g<br />

help(comm<strong>and</strong>). The help files are detailed <strong>and</strong> extensive. The complete manual is over 500<br />

pages.<br />

For examples <strong>of</strong> the use <strong>of</strong> a particular comm<strong>and</strong>, read the examples section <strong>in</strong> the help file,<br />

or type example(comm<strong>and</strong>) to see the examples executed.<br />

Quick reference<br />

Typehelp(spatstat) for a quick-reference overview <strong>of</strong> all the functions available <strong>in</strong> the package.<br />

For a demonstration <strong>of</strong> many <strong>of</strong> the capabilities <strong>of</strong> spatstat, type demo(spatstat).<br />

For a visual display <strong>of</strong> all the datasets supplied <strong>in</strong> spatstat, type demo(data).<br />

Website<br />

The websitewww.spatstat.orgconta<strong>in</strong>s <strong>in</strong>formation on recent updates to the package, frequentlyasked<br />

questions, bug fixes, literature <strong>and</strong> other developments.<br />

Modell<strong>in</strong>g<br />

For examples on fitt<strong>in</strong>g <strong>po<strong>in</strong>t</strong> process models, see [5].<br />

Citation<br />

If you use spatstat <strong>in</strong> a research publication, it would be much appreciated if you could cite<br />

the paper [4], or mention spatstat <strong>in</strong> the acknowledgements.<br />

In do<strong>in</strong>g so, you will help us to justify the expenditure <strong>of</strong> time <strong>and</strong> effort on ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g <strong>and</strong><br />

develop<strong>in</strong>g the package.<br />

Citation details are also available <strong>in</strong> the package by typ<strong>in</strong>gcitation(package="spatstat").<br />

Queries <strong>and</strong> requests<br />

If you have difficulty <strong>in</strong> gett<strong>in</strong>g the package to do what you want, or if you have a suggestion for<br />

additional features that could be added, please contact the package authors,adrian@maths.uwa.edu.au<br />

<strong>and</strong>r.turner@auckl<strong>and</strong>.ac.nz, or email the R special <strong>in</strong>terest group <strong>in</strong> <strong>spatial</strong> <strong>and</strong> geographical<br />

statistics, r-sig-geo@stat.math.ethz.ch.<br />

Copyright c○CSIRO 2008


REFERENCES 193<br />

References<br />

[1] A. Baddeley, J. Møller, <strong>and</strong> A.G. Pakes. Properties <strong>of</strong> residuals for <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> processes.<br />

Annals <strong>of</strong> the Institute <strong>of</strong> <strong>Statistical</strong> Mathematics, 60:627–649, 2008.<br />

[2] A. Baddeley, J. Møller, <strong>and</strong> R. Waagepetersen. Non- <strong>and</strong> semiparametric estimation <strong>of</strong> <strong>in</strong>teraction<br />

<strong>in</strong> <strong>in</strong>homogeneous <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. Statistica Neerl<strong>and</strong>ica, 54(3):329–350, November<br />

2000.<br />

[3] A. Baddeley <strong>and</strong> R. Turner. Practical maximum pseudolikelihood for <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

(with discussion). Australian <strong>and</strong> New Zeal<strong>and</strong> Journal <strong>of</strong> Statistics, 42(3):283–322, 2000.<br />

[4] A. Baddeley <strong>and</strong> R. Turner. Spatstat: an R package for analyz<strong>in</strong>g <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>.<br />

Journal <strong>of</strong> <strong>Statistical</strong> S<strong>of</strong>tware, 12(6):1–42, 2005. URL: www.jstats<strong>of</strong>t.org, ISSN: 1548-<br />

7660.<br />

[5] A. Baddeley <strong>and</strong> R. Turner. Modell<strong>in</strong>g <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong> <strong>in</strong> R. In A. Baddeley, P. Gregori,<br />

J. Mateu, R. Stoica, <strong>and</strong> D. Stoyan, editors, Case Studies <strong>in</strong> Spatial Po<strong>in</strong>t Pattern<br />

Modell<strong>in</strong>g, number 185 <strong>in</strong> Lecture Notes <strong>in</strong> Statistics, pages 23–74. Spr<strong>in</strong>ger-Verlag, New<br />

York, 2006. ISBN: 0-387-28311-0.<br />

[6] A. Baddeley, R. Turner, J. Møller, <strong>and</strong> M. Hazelton. Residual analysis for <strong>spatial</strong> <strong>po<strong>in</strong>t</strong><br />

processes (with discussion). Journal <strong>of</strong> the Royal <strong>Statistical</strong> Society, series B, 67(5):617–666,<br />

2005.<br />

[7] A.J. Baddeley. Spatial sampl<strong>in</strong>g <strong>and</strong> censor<strong>in</strong>g. In O.E. Barndorff-Nielsen, W.S. Kendall,<br />

<strong>and</strong> M.N.M. van Lieshout, editors, Stochastic Geometry: Likelihood <strong>and</strong> Computation, chapter<br />

2, pages 37–78. Chapman <strong>and</strong> Hall, London, 1998.<br />

[8] A.J. Baddeley <strong>and</strong> J. Møller. Nearest-neighbour Markov <strong>po<strong>in</strong>t</strong> processes <strong>and</strong> r<strong>and</strong>om sets.<br />

International <strong>Statistical</strong> Review, 57:89–121, 1989.<br />

[9] A.J. Baddeley, R.A. Moyeed, C.V. Howard, <strong>and</strong> A. Boyde. Analysis <strong>of</strong> a three-dimensional<br />

<strong>po<strong>in</strong>t</strong> pattern with replication. Applied Statistics, 42(4):641–668, 1993.<br />

[10] A.J. Baddeley <strong>and</strong> B.W. Silverman. A cautionary example on the use <strong>of</strong> second-order<br />

methods for analyz<strong>in</strong>g <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. Biometrics, 40:1089–1094, 1984.<br />

[11] A.J. Baddeley <strong>and</strong> M.N.M. van Lieshout. Area-<strong>in</strong>teraction <strong>po<strong>in</strong>t</strong> processes. Annals <strong>of</strong> the<br />

Institute <strong>of</strong> <strong>Statistical</strong> Mathematics, 47:601–619, 1995.<br />

[12] G. Barnard. Contribution to discussion <strong>of</strong> “The spectral analysis <strong>of</strong> <strong>po<strong>in</strong>t</strong> processes” by<br />

M.S. Bartlett. Journal <strong>of</strong> the Royal <strong>Statistical</strong> Society, series B, 25:294, 1963.<br />

[13] M. Bell <strong>and</strong> G. Grunwald. Mixed models for the analysis <strong>of</strong> replicated <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>.<br />

Biostatistics, 5:633–648, 2004.<br />

[14] M. Berman <strong>and</strong> T.R. Turner. Approximat<strong>in</strong>g <strong>po<strong>in</strong>t</strong> process likelihoods with GLIM. Applied<br />

Statistics, 41:31–38, 1992.<br />

[15] J. Besag <strong>and</strong> P.J. Diggle. Simple Monte Carlo tests for <strong>spatial</strong> pattern. Applied Statistics,<br />

26:327–333, 1977.<br />

[16] J.E. Besag <strong>and</strong> P. Clifford. Generalized Monte Carlo significance tests. Biometrika, 76:633–<br />

642, 1989.<br />

Copyright c○CSIRO 2008


194 REFERENCES<br />

[17] R. Biv<strong>and</strong>, E.J. Pebesma, <strong>and</strong> V. Gómez-Rubio. Applied <strong>spatial</strong> data analysis with R.<br />

Spr<strong>in</strong>ger, 2008.<br />

[18] D.R. Brill<strong>in</strong>ger. Comparative aspects <strong>of</strong> the study <strong>of</strong> ord<strong>in</strong>ary time series <strong>and</strong> <strong>of</strong> <strong>po<strong>in</strong>t</strong><br />

processes. In P.R. Krishnaiah, editor, Developments <strong>in</strong> Statistics, pages 33–133. Academic<br />

Press, 1978.<br />

[19] N.A.C. Cressie. Statistics for Spatial Data. John Wiley <strong>and</strong> Sons, New York, 1991.<br />

[20] D.J. Daley <strong>and</strong> D. Vere-Jones. An Introduction to the Theory <strong>of</strong> Po<strong>in</strong>t Processes. Spr<strong>in</strong>ger<br />

Verlag, New York, 1988.<br />

[21] P.J. Diggle. <strong>Statistical</strong> analysis <strong>of</strong> <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. Academic Press, London, 1983.<br />

[22] P.J. Diggle. A <strong>po<strong>in</strong>t</strong> process modell<strong>in</strong>g approach to raised <strong>in</strong>cidence <strong>of</strong> a rare phenomenon<br />

<strong>in</strong> the vic<strong>in</strong>ity <strong>of</strong> a prespecified <strong>po<strong>in</strong>t</strong>. Journal <strong>of</strong> the Royal <strong>Statistical</strong> Society, series A,<br />

153:349–362, 1990.<br />

[23] P.J. Diggle. <strong>Statistical</strong> Analysis <strong>of</strong> Spatial Po<strong>in</strong>t Patterns. Arnold, second edition, 2003.<br />

[24] P.J. Diggle, N. Lange, <strong>and</strong> F. M. Benes. Analysis <strong>of</strong> variance for replicated <strong>spatial</strong> <strong>po<strong>in</strong>t</strong><br />

<strong>patterns</strong> <strong>in</strong> cl<strong>in</strong>ical neuroanatomy. Journal <strong>of</strong> the American <strong>Statistical</strong> Association, 86:618–<br />

625, 1991.<br />

[25] P.J. Diggle, J. Mateu, <strong>and</strong> H.E. Clough. A comparison between parametric <strong>and</strong> nonparametric<br />

approaches to the analysis <strong>of</strong> replicated <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. Advances <strong>in</strong><br />

Applied Probability (SGSA), 32:331–343, 2000.<br />

[26] P.J. Diggle <strong>and</strong> B. Rowl<strong>in</strong>gson. A conditional approach to <strong>po<strong>in</strong>t</strong> process modell<strong>in</strong>g <strong>of</strong><br />

elevated risk. Journal <strong>of</strong> the Royal <strong>Statistical</strong> Society, series A (Statistics <strong>in</strong> Society),<br />

157(3):433–440, 1994.<br />

[27] M. Dwass. Modified r<strong>and</strong>omization tests for nonparametric hypotheses. Annals <strong>of</strong> Mathematical<br />

Statistics, 28:181–187, 1957.<br />

[28] A.C.A. Hope. A simplified Monte Carlo significance test procedure. Journal <strong>of</strong> the Royal<br />

<strong>Statistical</strong> Society, series B, 30:582–598, 1968.<br />

[29] C.V. Howard, S. Reid, A.J. Baddeley, <strong>and</strong> A. Boyde. Unbiased estimation <strong>of</strong> particle density<br />

<strong>in</strong> the t<strong>and</strong>em-scann<strong>in</strong>g reflected light microscope. Journal <strong>of</strong> Microscopy, 138:203–212,<br />

1985.<br />

[30] F. Huang <strong>and</strong> Y. Ogata. Improvements <strong>of</strong> the maximum pseudo-likelihood estimators<br />

<strong>in</strong> various <strong>spatial</strong> statistical models. Journal <strong>of</strong> Computational <strong>and</strong> Graphical Statistics,<br />

8(3):510–530, 1999.<br />

[31] J. Illian, A. Pentt<strong>in</strong>en, H. Stoyan, <strong>and</strong> D. Stoyan. <strong>Statistical</strong> analysis <strong>and</strong> modell<strong>in</strong>g <strong>of</strong><br />

<strong>spatial</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. Wiley, 2008.<br />

[32] J.F.C. K<strong>in</strong>gman. Poisson Processes. Oxford University Press, 1993.<br />

[33] G.M. Laslett. Censor<strong>in</strong>g <strong>and</strong> edge effects <strong>in</strong> areal <strong>and</strong> l<strong>in</strong>e transect sampl<strong>in</strong>g <strong>of</strong> rock jo<strong>in</strong>t<br />

traces. Mathematical Geology, 14:125–140, 1982.<br />

Copyright c○CSIRO 2008


REFERENCES 195<br />

[34] P.A.W. Lewis. Recent results <strong>in</strong> the statistical analysis <strong>of</strong> univariate <strong>po<strong>in</strong>t</strong> processes. In<br />

P.A.W. Lewis, editor, Stochastic <strong>po<strong>in</strong>t</strong> processes, pages 1–54. Wiley, New York, 1972.<br />

[35] J.K. L<strong>in</strong>dsey. The analysis <strong>of</strong> stochastic processes us<strong>in</strong>g GLIM. Spr<strong>in</strong>ger, Berl<strong>in</strong>, 1992.<br />

[36] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, <strong>and</strong> E. Teller. Equation <strong>of</strong><br />

state calculations by fast comput<strong>in</strong>g mach<strong>in</strong>es. Journal <strong>of</strong> Chemical Physics, 21:1087–1092,<br />

1953.<br />

[37] J. Møller <strong>and</strong> R.P. Waagepetersen. <strong>Statistical</strong> Inference <strong>and</strong> Simulation for Spatial Po<strong>in</strong>t<br />

Processes. Chapman <strong>and</strong> Hall/CRC, Boca Raton, 2003.<br />

[38] J. Møller <strong>and</strong> R.P. Waagepetersen. Modern statistics for <strong>spatial</strong> <strong>po<strong>in</strong>t</strong> processes. Research<br />

Report R-2006-12, <strong>Department</strong> <strong>of</strong> Mathematical Sciences, Aalborg University, April 2006.<br />

Submitted for publication.<br />

[39] Y. Ogata. <strong>Statistical</strong> models for earthquake occurrences <strong>and</strong> residual analysis for <strong>po<strong>in</strong>t</strong><br />

processes. Journal <strong>of</strong> the American <strong>Statistical</strong> Association, 83:9–27, 1988.<br />

[40] B.D. Ripley. Modell<strong>in</strong>g <strong>spatial</strong> <strong>patterns</strong> (with discussion). Journal <strong>of</strong> the Royal <strong>Statistical</strong><br />

Society, series B, 39:172–212, 1977.<br />

[41] B.D. Ripley. Simulat<strong>in</strong>g <strong>spatial</strong> <strong>patterns</strong>: dependent samples from a multivariate density.<br />

Applied Statistics, 28:109–112, 1979.<br />

[42] B.D. Ripley. Spatial Statistics. John Wiley <strong>and</strong> Sons, New York, 1981.<br />

[43] B.D. Ripley. <strong>Statistical</strong> Inference for Spatial Processes. Cambridge University Press, 1988.<br />

[44] A. Särkkä. Pseudo-likelihood approach for pair potential estimation <strong>of</strong> Gibbs processes.<br />

Number 22 <strong>in</strong> Jyväskylä Studies <strong>in</strong> Computer Science, Economics <strong>and</strong> Statistics. University<br />

<strong>of</strong> Jyväskylä, 1993.<br />

[45] D. Stoyan <strong>and</strong> P. Grabarnik. Second-order characteristics for stochastic structures connected<br />

with Gibbs <strong>po<strong>in</strong>t</strong> processes. Mathematische Nachrichten, 151:95–100, 1991.<br />

[46] D. Stoyan <strong>and</strong> H. Stoyan. Fractals, R<strong>and</strong>om Shapes <strong>and</strong> Po<strong>in</strong>t Fields. John Wiley <strong>and</strong><br />

Sons, Chichester, 1995.<br />

[47] M.N.M. van Lieshout. Markov Po<strong>in</strong>t Processes <strong>and</strong> their Applications. Imperial College<br />

Press, 2000.<br />

[48] M.N.M. van Lieshout <strong>and</strong> A.J. Baddeley. A nonparametric measure <strong>of</strong> <strong>spatial</strong> <strong>in</strong>teraction<br />

<strong>in</strong> <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>. Statistica Neerl<strong>and</strong>ica, 50:344–361, 1996.<br />

[49] R. Waagepetersen. An estimat<strong>in</strong>g function approach to <strong>in</strong>ference for <strong>in</strong>homogeneous<br />

Neyman-Scott processes. Submitted for publication, 2006.<br />

Copyright c○CSIRO 2008


Index<br />

analysis <strong>of</strong> deviance, 88<br />

area-<strong>in</strong>teraction process, 135<br />

b<strong>in</strong>ary mask, 28, 41<br />

circular w<strong>in</strong>dows, 39<br />

classes, 27<br />

<strong>in</strong> R, 27<br />

<strong>in</strong> spatstat, 27<br />

clickppp, 25<br />

cluster models<br />

fitt<strong>in</strong>g, 125, 130<br />

<strong>in</strong>homogeneous, 130<br />

fitt<strong>in</strong>g, 130<br />

complete <strong>spatial</strong> r<strong>and</strong>omness, 72<br />

<strong>and</strong> <strong>in</strong>dependence, 157, 179<br />

def<strong>in</strong>ition, 72<br />

Kolmogorov-Smirnov test, 75<br />

quadrat count<strong>in</strong>g test, 74<br />

conditional <strong>in</strong>tensity, 136<br />

for marked <strong>po<strong>in</strong>t</strong> processes, 185<br />

contrasts, 82, 183<br />

covariate effects, 9<br />

covariates, 7, 16, 82<br />

<strong>in</strong> ppm, 82<br />

Cox process, 99<br />

CSRI, 157, 179<br />

conditional <strong>in</strong>tensity, 185<br />

fitt<strong>in</strong>g to data, 182<br />

simulat<strong>in</strong>g, 180<br />

data entry, 33<br />

at the term<strong>in</strong>al, 34<br />

basic, 33, 34<br />

check<strong>in</strong>g, 36<br />

from file, 34<br />

GIS formats, 42<br />

marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, 160<br />

marks, 35<br />

<strong>po<strong>in</strong>t</strong>-<strong>and</strong>-click, 25<br />

datasets<br />

<strong>in</strong>spect<strong>in</strong>g, 20<br />

provided <strong>in</strong> spatstat, 25<br />

dispatch<strong>in</strong>g, 27<br />

distance methods, 102<br />

distances<br />

empty space, 102, 103<br />

196<br />

nearest neighbour, 102, 109<br />

pairwise, 102, 111<br />

distmap, 102<br />

edge effects, 104<br />

empty space distances, 102, 103<br />

empty space function, 104<br />

envelopes, 119<br />

<strong>and</strong> Monte Carlo tests, 119<br />

for any fitted model, 123<br />

for any simulation procedure, 124<br />

<strong>in</strong> spatstat, 120<br />

<strong>of</strong> summary functions, 119<br />

exploratory data analysis, 21<br />

for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, 165<br />

fitted model, 142<br />

goodness-<strong>of</strong>-fit, 90, 148<br />

<strong>in</strong>terpretation <strong>of</strong> coefficients, 82<br />

methods for, 84<br />

residuals, 91, 150<br />

simulation <strong>of</strong>, 89<br />

fitt<strong>in</strong>g models<br />

by Huang-Ogata method, 147<br />

kppm, 125, 130<br />

maximum pseudolikelihood, 139<br />

to marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, 182, 187<br />

via summary statistics, 125<br />

fv, 32<br />

geometrical transformations, 50<br />

Gibbs models, 132<br />

area-<strong>in</strong>teraction, 135<br />

Diggle-Gates-Stibbard, 135<br />

Diggle-Gratton, 135<br />

fitt<strong>in</strong>g, 139<br />

by Huang-Ogata method, 147<br />

maximum pseudolikelihood, 139<br />

ppm, 139<br />

fitt<strong>in</strong>g to marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, 187<br />

goodness-<strong>of</strong>-fit, 148<br />

hard core process, 133<br />

<strong>in</strong> spatstat, 141<br />

<strong>in</strong>f<strong>in</strong>ite order <strong>in</strong>teraction, 135<br />

multitype, 185<br />

maximum pseudolikelihood, 187<br />

multitype pairwise <strong>in</strong>teraction, 185


INDEX 197<br />

pairwise <strong>in</strong>teraction, 135<br />

residuals, 150<br />

simulation, 137<br />

simulation <strong>of</strong> fitted model, 144<br />

s<strong>of</strong>t core, 135<br />

Strauss process, 134<br />

Strauss-hard core, 135<br />

GIS formats, 42<br />

goodness-<strong>of</strong>-fit, 90<br />

for fitted Gibbs model, 148<br />

for Poisson models, 90<br />

hard core process, 133<br />

multitype, 186<br />

Huang-Ogata method, 147<br />

im, 27, 54<br />

images, 54<br />

comput<strong>in</strong>g with, 59<br />

creat<strong>in</strong>g, 54<br />

from raw data, 54<br />

exploratory <strong>in</strong>spection <strong>of</strong>, 58<br />

extract<strong>in</strong>g subset, 58<br />

plott<strong>in</strong>g, 56<br />

returned by a function, 55<br />

<strong>in</strong>dependence <strong>of</strong> components, 157, 176<br />

<strong>in</strong>tensity<br />

function, 67<br />

kernel estimator, 67<br />

homogeneous, 66<br />

<strong>in</strong>homogeneous, 67<br />

<strong>in</strong>vestigation <strong>of</strong>, 66<br />

measure, 67<br />

<strong>of</strong> marked <strong>po<strong>in</strong>t</strong> process, 165<br />

<strong>in</strong>teraction, 8, 11<br />

distance methods, 102<br />

<strong>in</strong> spatstat, 141<br />

multitype, 185, 187<br />

<strong>in</strong> spatstat, 187<br />

plott<strong>in</strong>g a fitted <strong>in</strong>teraction, 189<br />

Q–Q plot, 153<br />

simple models, 98<br />

summary functions, 102<br />

K function, 22, 111<br />

for multitype <strong>po<strong>in</strong>t</strong> pattern, 169<br />

<strong>in</strong>homogeneous, 128<br />

kernel estimator <strong>of</strong> <strong>in</strong>tensity, 67, 68<br />

kernel smooth<strong>in</strong>g <strong>of</strong> marks, 167<br />

Kolmogorov-Smirnov test<br />

<strong>of</strong> CSR, 75<br />

<strong>of</strong> <strong>in</strong>homogeneous Poisson, 91<br />

kppm, 125, 130<br />

l<strong>in</strong>e segments, 190<br />

lurk<strong>in</strong>g variable plot, 93<br />

maptools package, 42<br />

mark correlation function, 174<br />

marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

cutt<strong>in</strong>g marks <strong>in</strong>to b<strong>and</strong>s, 163<br />

data entry, 160<br />

exploratory data analysis, 165<br />

explor<strong>in</strong>g marks, 167<br />

<strong>in</strong>spect<strong>in</strong>g, 161<br />

jo<strong>in</strong>t <strong>and</strong> conditional analysis, 157<br />

manipulat<strong>in</strong>g, 163<br />

methodological issues, 157<br />

model-fitt<strong>in</strong>g, 182, 187<br />

probabilistic formulation, 156<br />

r<strong>and</strong>omisation tests, 157<br />

separat<strong>in</strong>g <strong>in</strong>to types, 163<br />

summary functions, 169<br />

marked <strong>po<strong>in</strong>t</strong> process<br />

<strong>in</strong>tensity, 165<br />

marks, 6, 15, 156<br />

categorical, 35<br />

data entry, 33, 35<br />

exploratory data analysis, 167<br />

manipulat<strong>in</strong>g, 163<br />

operations on, 49<br />

smooth<strong>in</strong>g, 167<br />

<strong>spatial</strong> trend <strong>in</strong>, 167<br />

versus covariates, 15<br />

markstat, 169<br />

marktable, 168<br />

Matern cluster process, 98<br />

maximum likelihood, 79<br />

maximum pseudolikelihood, 139, 187<br />

for multitype Gibbs models, 187<br />

improvements over, 147<br />

methods, 27<br />

default method, 29<br />

dispatch, 27<br />

m<strong>in</strong>imum contrast, 125<br />

model validation, 90, 148<br />

Monte Carlo test, 119<br />

<strong>po<strong>in</strong>t</strong>wise, 120<br />

simultaneous, 121<br />

Copyright c○CSIRO 2008


198 INDEX<br />

multitype hard core process, 186<br />

multitype <strong>po<strong>in</strong>t</strong> pattern, 10, 11, 22, 35<br />

multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong><br />

separat<strong>in</strong>g <strong>in</strong>to types, 163<br />

summary functions, 169<br />

multitype Strauss process, 186<br />

nearest neighbour distances, 102, 109<br />

nndist, 102<br />

nuisance parameters, 145<br />

ow<strong>in</strong>, 27, 39<br />

pairdist, 102<br />

pairwise distances, 102, 111<br />

pairwise <strong>in</strong>teraction process, 133<br />

<strong>po<strong>in</strong>t</strong> pattern, 6<br />

marked, 156<br />

marks, 6, 15<br />

multitype, 10, 11<br />

needs w<strong>in</strong>dow, 47<br />

<strong>po<strong>in</strong>t</strong> process model for, 13<br />

st<strong>and</strong>ard model, 14<br />

<strong>po<strong>in</strong>t</strong> process, 13<br />

<strong>po<strong>in</strong>t</strong> process models<br />

area-<strong>in</strong>teraction, 135<br />

Diggle-Gates-Stibbard, 135<br />

Diggle-Gratton, 135<br />

Gibbs, 132<br />

hard core, 133<br />

<strong>in</strong>f<strong>in</strong>ite order <strong>in</strong>teraction, 135<br />

pairwise <strong>in</strong>teraction, 133, 135<br />

s<strong>of</strong>t core, 135<br />

Strauss, 134<br />

Strauss-hard core, 135<br />

Poisson cluster processes, 98<br />

Poisson models<br />

fitt<strong>in</strong>g, 80<br />

goodness-<strong>of</strong>-fit, 90<br />

homogeneous, 72<br />

<strong>in</strong>homogeneous, 79<br />

log-likelihood, 80<br />

marked, 179<br />

maximum likelihood, 79<br />

residuals, 91<br />

Poisson <strong>po<strong>in</strong>t</strong> process<br />

homogeneous<br />

def<strong>in</strong>ition, 72<br />

simulation, 72<br />

<strong>in</strong>homogeneous<br />

Copyright c○CSIRO 2008<br />

def<strong>in</strong>ition, 79<br />

fitt<strong>in</strong>g, 80<br />

likelihood, 80<br />

motivation, 79<br />

simulation, 79<br />

Poisson-derived models, 98<br />

polygonal w<strong>in</strong>dows, 28, 40<br />

ppm, 84, 142<br />

marked Gibbs <strong>po<strong>in</strong>t</strong> process models, 187<br />

marked Poisson <strong>po<strong>in</strong>t</strong> process models,<br />

182<br />

methods for, 84<br />

ppp, 27<br />

comb<strong>in</strong><strong>in</strong>g several, 53<br />

extract<strong>in</strong>g subset, 48<br />

format, 46<br />

geometrical transformations, 50<br />

<strong>in</strong> arbitrary w<strong>in</strong>dow, 44<br />

manipulat<strong>in</strong>g, 46<br />

needs w<strong>in</strong>dow, 47<br />

operations on, 47<br />

r<strong>and</strong>om perturbations, 50<br />

ways to make, 37<br />

probability density, 132<br />

pr<strong>of</strong>ile pseudolikelihood, 145<br />

pseudolikelihood, 139<br />

pr<strong>of</strong>ile pseudolikelihood, 145<br />

quadrat count<strong>in</strong>g, 21, 67<br />

quadrat count<strong>in</strong>g test<br />

<strong>of</strong> CSR, 74<br />

quadrat test<br />

<strong>of</strong> <strong>in</strong>homogeneous Poisson, 90<br />

R, 17<br />

contributed packages, 18<br />

for <strong>spatial</strong> data formats, 42<br />

for <strong>spatial</strong> statistics, 18<br />

where to get, 17<br />

r<strong>and</strong>om labell<strong>in</strong>g, 157, 177<br />

r<strong>and</strong>om perturbations, 50<br />

r<strong>and</strong>om th<strong>in</strong>n<strong>in</strong>g, 79<br />

r<strong>and</strong>omisation tests, 157, 175<br />

for marked <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, 175<br />

rectangular w<strong>in</strong>dows, 28, 39<br />

residuals, 91, 150<br />

for fitted Gibbs model, 150<br />

for Poisson models, 91<br />

lurk<strong>in</strong>g variable plot, 93


INDEX 199<br />

Q–Q plot, 151<br />

smoothed residual field, 93<br />

return value, 30<br />

rpoispp, 72, 79<br />

runif<strong>po<strong>in</strong>t</strong>, 73<br />

sequential models, 100<br />

shapefiles, 42<br />

shapefiles package, 42<br />

simulation<br />

<strong>of</strong> fitted Gibbs model, 144<br />

<strong>of</strong> fitted Poisson model, 89<br />

smoothed residual field, 93<br />

sp package, 42<br />

spatstat, 19, 192<br />

cit<strong>in</strong>g, 19<br />

gett<strong>in</strong>g started, 19<br />

<strong>in</strong>stall<strong>in</strong>g, 19<br />

split, 24<br />

st<strong>and</strong>ard model, 14<br />

Strauss process, 134<br />

fitt<strong>in</strong>g to data, 140<br />

multitype, 186<br />

summary functions, 102<br />

<strong>and</strong> Monte Carlo tests, 119<br />

critique, 117<br />

edge effects, 104<br />

envelopes, 119<br />

F, 104<br />

for multitype <strong>po<strong>in</strong>t</strong> <strong>patterns</strong>, 169<br />

G, 109<br />

<strong>in</strong>ference us<strong>in</strong>g, 119<br />

<strong>in</strong>homogeneous K, 128<br />

J, 114<br />

K, 111<br />

L, 112<br />

mark correlation, 174<br />

model-fitt<strong>in</strong>g with, 125<br />

pair correlation, 112<br />

tests<br />

χ 2 quadrat count<strong>in</strong>g, 74<br />

Kolmogorov-Smirnov, 75, 91<br />

Monte Carlo, 119<br />

th<strong>in</strong>n<strong>in</strong>g, 99<br />

Thomas process, 98<br />

tips, 27, 31, 36, 48, 103, 106, 121, 160<br />

treatment contrasts, 82<br />

unitname, 37<br />

units <strong>of</strong> length, 37<br />

validation, 90, 148<br />

w<strong>in</strong>dows, 39<br />

b<strong>in</strong>ary mask, 28, 41<br />

circular, 39<br />

GIS formats, 42<br />

needed <strong>in</strong> any <strong>po<strong>in</strong>t</strong> pattern, 47<br />

operations on, 44<br />

polygonal, 28, 40<br />

rectangular, 28, 39<br />

returned by functions, 43<br />

χ 2 quadrat count<strong>in</strong>g test, 74<br />

Copyright c○CSIRO 2008

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!