1 Analysis of Point (Event) Data II Describing the Spatial ... - Capita

Analysis of Point (Event) Data II

CE/ENVE 424

[Figure: cumulative frequency (0 to 1) versus distance (0 to 10), with curves labeled G, E, and F]

Describing the Spatial Pattern of Events 

We can describe the spatial pattern of an event (point) dataset using:

• Summary statistics

• Density-based analysis

    • Simple (quadrat count)

    • Kernels and kernel functions

• Distance-based analysis

    • Nearest neighbor

    • Distance functions

        • G (event-to-event)

        • F (point-to-event)

        • K (multiple radius distances)

Relating Intensity Patterns 

The intensity patterns derived using the methods previously discussed provide a meaningful analysis endpoint, particularly when comparing patterns among data sets.

However, we’re frequently interested in more explicit methods for describing a point spatial pattern.

The most common approach is to test against complete spatial randomness (CSR):

• Is the event pattern significantly more clustered than would be expected from a random distribution?

• Is the event pattern more uniformly spaced than would be expected from a random distribution?

Complete Spatial Randomness 

A process is considered random if its intensity (average number of events per unit area) is constant over the region of interest. Two conditions must hold:

1) the chance of an event occurring at a given (x, y) location is equal to the chance of it occurring at any other location (uniform probability distribution)

2) the existence of an event at one (x, y) location is independent of the existence of any other event

These two conditions constitute an independent random process (IRP), also called complete spatial randomness (CSR).

Complete Spatial Randomness 

CSR is the baseline (null) hypothesis against which we assess whether an observed pattern is evenly spaced, clustered, or random.

To test for CSR, we first define a model for it. We can use the model to simulate patterns for a given number of events over the region of interest, and then compare the spatial distribution of the simulated patterns with our observed pattern.
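The simulate-and-compare idea can be sketched as a small Monte Carlo test. This is a minimal illustration and not the lecture's own procedure: it assumes a rectangular region, uses the mean nearest neighbor distance as the comparison statistic, ignores edge effects, and the function names, the `observed` coordinates, and the choice of 199 simulations are all hypothetical.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def simulate_csr(n, width, height, rng):
    """Draw n independent, uniformly distributed points in a rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def clustering_p_value(observed, width, height, n_sims=199, seed=0):
    """One-sided Monte Carlo p-value: how often CSR is at least as tightly packed."""
    rng = random.Random(seed)
    obs_stat = mean_nn_distance(observed)
    sim_stats = [
        mean_nn_distance(simulate_csr(len(observed), width, height, rng))
        for _ in range(n_sims)
    ]
    as_small = sum(1 for s in sim_stats if s <= obs_stat)
    return obs_stat, (as_small + 1) / (n_sims + 1)


# Hypothetical observed pattern: 12 events packed into one corner of a 20 x 20 region.
observed = [(1 + 0.5 * (i % 4), 1 + 0.5 * (i // 4)) for i in range(12)]
obs_stat, p = clustering_p_value(observed, 20, 20)
print(f"observed mean NN distance = {obs_stat:.2f}, one-sided p = {p:.3f}")
```

A small p-value suggests the observed events sit closer together than CSR would typically produce. Any other summary statistic could be substituted for the mean nearest neighbor distance.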

The standard model used in testing spatial point patterns is the Poisson distribution.

The probability of observing k events in one unit area of our region is approximated by:

$$P(k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where λ = n/a is the intensity (n events over area a) and e ≈ 2.718.

For a Poisson distribution, mean = variance, so the ratio of variance to mean can be used to determine whether the pattern is random.

Quadrat Count

[Figure: histogram, frequency (0 to 6) versus bin (1 to 7)]

Recall from the last lecture that a quadrat count is conducted by superimposing a regular grid over the data, counting the number of events in each grid cell, and dividing each count by the cell area to get intensity.

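47 events in 40 grid cells

Mean cell count:

$$\mu = \frac{47}{40} = 1.175$$

Variance:

$$s^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(k_i - \mu\right)^{2} = \frac{85.775}{40} = 2.1444$$

Variance-to-mean ratio:

$$\frac{\text{variance}}{\text{mean}} = \frac{2.1444}{1.175} = 1.825$$

A ratio of s² to µ greater than 1 indicates clustering.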
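The variance-to-mean calculation can be sketched in a few lines. The per-cell counts behind the lecture's figures are not given, so the `counts` list below is hypothetical. The function names are illustrative, and the Poisson comparison assumes each grid cell is treated as one unit of area.

```python
import math


def variance_to_mean_ratio(counts):
    """Return (mean, variance, variance/mean) for a list of quadrat counts."""
    n = len(counts)
    mu = sum(counts) / n
    s2 = sum((k - mu) ** 2 for k in counts) / n  # divides by n, as in the lecture
    return mu, s2, s2 / mu


def poisson_pmf(k, lam):
    """Probability of k events in a cell when the expected count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical counts for 8 cells (not the lecture's data).
counts = [0, 0, 0, 0, 1, 1, 2, 4]
mu, s2, vmr = variance_to_mean_ratio(counts)
print(f"mean = {mu}, variance = {s2}, ratio = {vmr}")

# Expected number of cells holding k events if the pattern were CSR.
for k in range(5):
    print(k, counts.count(k), round(len(counts) * poisson_pmf(k, mu), 2))

# The lecture's summary figures give the same ratio: (85.775 / 40) / (47 / 40).
lecture_ratio = (85.775 / 40) / (47 / 40)
```

A ratio well above 1, as with these hypothetical counts, means a few cells hold many events while most hold none, which is the signature of clustering.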

Nearest Neighbor

[Figure: map of 12 numbered events in a 20 × 20 region]

event   nearest neighbor   dmin
1       3                  10
2       5                  2
3       5                  1
4       3                  1.5
5       3                  1
6       5                  1.5
7       8                  1
8       7                  1
9       7                  2
10      11                 1
11      10                 1
12      10                 1.5

Mean nearest neighbor distance:

$$\bar{d}_{min} = \frac{\sum_{i=1}^{n} d_{min}(s_i)}{n} = \frac{24.5}{12} = 2.04$$

The expected value of the mean nearest neighbor distance is:

$$E(d) = \frac{1}{2\sqrt{\lambda}}$$

Calculating the ratio of our observed mean nearest neighbor distance to the expected distance provides a measure of clustering:

$$R = \frac{\bar{d}_{min}}{1/(2\sqrt{\lambda})} = \frac{2.04}{0.5/\sqrt{12/(20 \times 20)}} = 0.71$$

A ratio less than 1 indicates clustering.

G Function

The G function provides a cumulative frequency distribution of the nearest neighbor event-to-event distances:

$$G(d) = \frac{\text{no.}\,[\,d_{min}(s_i) < d\,]}{n}$$

CSR expected G(d):

$$E[G(d)] = 1 - e^{-\lambda \pi d^{2}}$$

[Figure: cumulative frequency (0 to 1) versus distance (0 to 10)]

Distance   G(d) Count   G(d) Freq.   E(G) Freq.   E(G) Count
0          0            0.000        0.000        0.000
1          6            0.500        0.090        1.079
2          5            0.917        0.314        2.690
3          0            0.917        0.572        3.093
4          0            0.917        0.779        2.482
5          0            0.917        0.905        1.519
6          0            0.917        0.966        0.734
7          0            0.917        0.990        0.285
8          0            0.917        0.998        0.090
9          0            0.917        1.000        0.023
10         1            1.000        1.000        0.005
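The nearest neighbor ratio and the G function can both be reproduced from the dmin column of the table above. This is a sketch rather than the lecture's own code. It assumes the 20 × 20 region implied by the R calculation, and it counts distances equal to d as included (`<=`), which is an assumption chosen because it matches the tabulated counts.

```python
import math

# dmin values for events 1..12, copied from the nearest neighbor table.
d_min = [10, 2, 1, 1.5, 1, 1.5, 1, 1, 2, 1, 1, 1.5]
n = len(d_min)
area = 20 * 20          # assumed region size, from the 12/(20 x 20) term
lam = n / area          # intensity: events per unit area

mean_d = sum(d_min) / n                  # observed mean nearest neighbor distance
expected_d = 1 / (2 * math.sqrt(lam))    # expected value under CSR
R = mean_d / expected_d                  # below 1 suggests clustering


def g_observed(d):
    """Fraction of events whose nearest neighbor lies within distance d."""
    return sum(1 for x in d_min if x <= d) / n


def g_expected(d):
    """Expected G(d) under CSR."""
    return 1 - math.exp(-lam * math.pi * d ** 2)


print(f"mean dmin = {mean_d:.2f}, E(d) = {expected_d:.2f}, R = {R:.2f}")
for d in range(11):
    print(d, round(g_observed(d), 3), round(g_expected(d), 3))
```

Where the observed G(d) rises faster than the CSR curve at short distances, as it does here, events have closer neighbors than a random pattern would give.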

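F Function

The F function provides a cumulative frequency distribution of the nearest neighbor point-to-event distances:

$$F(d) = \frac{\text{no.}\,[\,d_{min}(p_i, S) < d\,]}{m}$$

where the p_i are m randomly selected locations and S is the set of events.

CSR expected F(d):

$$E[F(d)] = 1 - e^{-\lambda \pi d^{2}}$$

[Figure: map of the 12 numbered events]

[Figure: cumulative frequency (0 to 1) versus distance (0 to 10), with curves labeled G, E, and F]

K Function

The K function uses all distances between events and provides a measure of spatial dependence over a wider range of scales.

$$K(d) = \frac{\sum_{i=1}^{n} \text{no.}\,[\,S \in C(s_i, d)\,]}{n\lambda}, \qquad \lambda = \frac{n}{a}$$

where C(s_i, d) is a circle of radius d centered on event s_i.

$$E[K(d)] = \frac{\lambda \pi d^{2}}{\lambda} = \pi d^{2}$$

$$L(d) = \sqrt{\frac{K(d)}{\pi}} - d$$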
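The F, K, and L definitions above can be sketched as naive estimators. The lecture does not give the event coordinates, so the `events` list is hypothetical, as are the function names, the rectangular region, and the choice of m = 500 random locations. No edge correction is applied, and distances equal to d are counted as inside the circle.

```python
import math
import random


def f_function(events, d, width, height, m=500, seed=0):
    """Fraction of m random locations whose nearest event lies within d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        px, py = rng.uniform(0, width), rng.uniform(0, height)
        nearest = min(math.hypot(px - x, py - y) for x, y in events)
        if nearest <= d:
            hits += 1
    return hits / m


def k_function(events, d, area):
    """Count other events within d of each event, scaled by n * lambda."""
    n = len(events)
    lam = n / area
    count = 0
    for i, (xi, yi) in enumerate(events):
        for j, (xj, yj) in enumerate(events):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                count += 1
    return count / (n * lam)


def l_function(k, d):
    """Transform K(d) so that the CSR expectation is 0 at every distance."""
    return math.sqrt(k / math.pi) - d


# Hypothetical events: corners of a unit square inside a 10 x 10 region.
events = [(0, 0), (1, 0), (0, 1), (1, 1)]
for d in (1, 1.5):
    k = k_function(events, d, area=100)
    print(d, round(k, 3), round(l_function(k, d), 3), round(math.pi * d ** 2, 3))
print("F(2) =", f_function(events, 2, 10, 10))
```

The L transform is the part that can be checked directly against the lecture's numbers: `l_function(16.667, 1)` rounds to 1.303 and `l_function(30.556, 2)` rounds to 1.119, matching the tabulated L(d) values.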

L Function

[Figure: map of the 12 numbered events]

Distance   K(d)      L(d)      E(K(d))    E(L(d))
0          0.000     0.000     0.000      0
1          16.667    1.303     3.142      0
2          30.556    1.119     12.566     0
3          30.556    0.119     28.274     0
4          30.556    -0.881    50.265     0
5          30.556    -1.881    78.540     0
6          30.556    -2.881    113.097    0
7          30.556    -3.881    153.938    0
8          30.556    -4.881    201.062    0
9          30.556    -5.881    254.469    0
10         33.333    -6.743    314.159    0

An L(d) of 0 is expected under CSR.

An L(d) above the “zero line” indicates there are more events at that separation distance than expected.

An L(d) below the “zero line” indicates there are fewer events at that separation distance than expected.

[Figure: L(d) (-7 to 2) versus distance (0 to 10)]

Multivariate Analysis

Sometimes you have multiple data sets and wish to know whether the spatial patterns among them are similar.

Cross functions are variants of the previously discussed distance-based analysis methods.

For G(d), the distance of interest is between events in one dataset and events in another dataset.

For K(d), we count the number of events in one dataset based on distances from events in another dataset.
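The cross-function idea can be sketched by measuring from the events of one dataset to the events of another. The lecture does not specify estimators, so the normalization below is an assumption that follows the single-dataset K definition, with the intensity of the second dataset used for scaling. The two small datasets and the function names are hypothetical.

```python
import math


def cross_g(events_a, events_b, d):
    """Fraction of events in A whose nearest event in B lies within d."""
    nearest = [
        min(math.hypot(xa - xb, ya - yb) for xb, yb in events_b)
        for xa, ya in events_a
    ]
    return sum(1 for x in nearest if x <= d) / len(nearest)


def cross_k(events_a, events_b, d, area):
    """Events of B within d of each event of A, scaled by n_A * lambda_B."""
    lam_b = len(events_b) / area
    count = sum(
        1
        for xa, ya in events_a
        for xb, yb in events_b
        if math.hypot(xa - xb, ya - yb) <= d
    )
    return count / (len(events_a) * lam_b)


# Hypothetical datasets in a 10 x 10 region.
events_a = [(0, 0), (5, 5)]
events_b = [(0, 1), (5, 8), (9, 9)]

for d in (1, 3):
    print(d, cross_g(events_a, events_b, d), round(cross_k(events_a, events_b, d, 100), 3))
```

If the two datasets were spatially independent, the cross K would be expected to track πd², just as in the single-dataset case.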

Incorporating Temporal Dimension 

Intensity is defined as the number of events per unit area per unit time.

Are events clustered in space and time?

Distance is a statistical distance in units of physical distance × time distance.

One method for determining whether there is space-time dependence is to calculate:

1) K(d,t) for the combined space-time distance

2) K(d) and K(t) separately

3) the product K(d) * K(t)

4) the difference K(d,t) - K(d)*K(t)

Without space-time interaction, K(d,t) equals K(d)*K(t), so a difference near zero indicates no space-time dependence.
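The four steps can be sketched with naive pair counts. The lecture gives no estimator, so the normalizations here are assumptions that extend the spatial K definition: counts are scaled by area for space, by study duration for time, and by area × duration for the joint function. The `events` list of (x, y, time) tuples and the function name are hypothetical, and no edge correction is applied.

```python
import math


def space_time_k(events, d, t, area, duration):
    """Return K(d,t), K(d), K(t), and the difference K(d,t) - K(d)*K(t)."""
    n = len(events)
    near_space = near_time = near_both = 0
    for i, (xi, yi, ti) in enumerate(events):
        for j, (xj, yj, tj) in enumerate(events):
            if i == j:
                continue
            close_in_space = math.hypot(xi - xj, yi - yj) <= d
            close_in_time = abs(ti - tj) <= t
            if close_in_space:
                near_space += 1
            if close_in_time:
                near_time += 1
            if close_in_space and close_in_time:
                near_both += 1
    k_d = area * near_space / n ** 2
    k_t = duration * near_time / n ** 2
    k_dt = area * duration * near_both / n ** 2
    return k_dt, k_d, k_t, k_dt - k_d * k_t


# Hypothetical events: two pairs that are close in both space and time.
events = [(0, 0, 0), (1, 0, 1), (10, 10, 50), (11, 10, 51)]
k_dt, k_d, k_t, diff = space_time_k(events, d=2, t=2, area=400, duration=100)
print(k_dt, k_d, k_t, diff)
```

A clearly positive difference, as with these hypothetical events, means pairs that are close in space also tend to be close in time.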

Examples 

Phytopathology Vol. 92, No. 4, 2002 361-377 

Background

Florida has been substantially impacted by Asiatic citrus canker, a disease that can cause defoliation, dieback, and fruit drop.

Objectives

Does removing healthy citrus trees within a 38 m radius of infected trees curtail further spread of the disease?

Methods Used

K function for testing complete spatial randomness (CSR)

American Journal of Tropical Medicine and Hygiene Vol. 58, No. 3, 1998 287-298

Background

Dengue fever, a viral disease transmitted by mosquitoes, can spread rapidly and is without a vaccine. Cases tend to be clustered because female mosquitoes are assumed to rarely travel farther than 50-100 m in their lifetime.

Objectives

What is the spatial-temporal pattern of the disease?

Methods Used

K function for testing complete spatial randomness (CSR)

Projects and Articles 

Begin searching for articles to get ideas.

Also try to understand the availability of data.

This weekend, the class website will be updated with data sources, links to online journals, and references to other journals.

If you come across a good resource, please let me know and I’ll add it to the website.
