25.07.2013 Views

1 Analysis of Point (Event) Data II Describing the Spatial ... - Capita

1 Analysis of Point (Event) Data II Describing the Spatial ... - Capita

1 Analysis of Point (Event) Data II Describing the Spatial ... - Capita

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Analysis</strong> <strong>of</strong> <strong>Point</strong> (<strong>Event</strong>) <strong>Data</strong><br />

<strong>II</strong><br />

Frequency<br />

1<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

CE/ENVE 424<br />

G<br />

E<br />

F<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

0<br />

0 2 4 6 8 10<br />

Distance<br />

<strong>Describing</strong> <strong>the</strong> <strong>Spatial</strong> Pattern <strong>of</strong> <strong>Event</strong>s<br />

We can describe <strong>the</strong> spatial pattern <strong>of</strong> a<br />

events (point) dataset using:<br />

• Summary statistics<br />

• Density based analysis<br />

• Simple (quadrant count)<br />

• Kernels and kernel functions<br />

• Distance based analysis<br />

• Nearest neighbor<br />

• Distance functions<br />

• G (event-to-event)<br />

• F (point-to-event)<br />

• K (multiple radius distances)


Relating Intensity Patterns<br />

The intensity patterns derived using <strong>the</strong> methods previously discussed provide<br />

a meaningful analysis endpoint, particularly when comparing patterns among<br />

data sets<br />

However, we’re frequently interested in more explicit methods for describing <strong>the</strong><br />

a point spatial pattern<br />

The most common approach is to test against complete spatial<br />

randomness (CSR):<br />

• Is <strong>the</strong> event pattern significantly more clustered than what<br />

would be expected from a random distribution?<br />

• Is <strong>the</strong> event pattern more uniformly spaced than would be<br />

expected from a random distribution?<br />

Complete <strong>Spatial</strong> Randomness<br />

A process is considered random if its intensity (average<br />

number <strong>of</strong> events per unit area) is constant over <strong>the</strong> region<br />

<strong>of</strong> interest.<br />

1) <strong>the</strong> chance <strong>of</strong> a given x,y point existing is equal to <strong>the</strong><br />

chance any o<strong>the</strong>r point existing (uniform probability<br />

distribution)<br />

2) <strong>the</strong> existence <strong>of</strong> a x,y point is independent <strong>of</strong> <strong>the</strong><br />

existence <strong>of</strong> any o<strong>the</strong>r point<br />

These two conditions constitute an independent random<br />

process (IRP) or complete spatial randomness (CSR)


Complete <strong>Spatial</strong> Randomness<br />

CSR is a baseline hypo<strong>the</strong>sis (null hypo<strong>the</strong>sis) against which is assessed<br />

whe<strong>the</strong>r an observed pattern is evenly spaced, clustered, or random.<br />

In testing for CSR, we define a model for CSR. We could simulate a <strong>the</strong><br />

pattern for number <strong>of</strong> events over a region <strong>of</strong> interest using <strong>the</strong> model. We<br />

can <strong>the</strong>n compare <strong>the</strong> spatial distribution <strong>of</strong> <strong>the</strong> modeled patterns with our<br />

observed pattern.<br />

The standard model to use in testing spatial point patterns follow a Poisson<br />

distribution.<br />

The probability <strong>of</strong> observing k events in one unit area in our region is<br />

approximated by:<br />

6<br />

k<br />

λ<br />

λ<br />

Mean = Variance<br />

5<br />

−<br />

P(<br />

k)<br />

=<br />

Quadrant Count<br />

e<br />

k!<br />

n<br />

λ = e ≈ 2.<br />

718<br />

a<br />

So <strong>the</strong> ratio <strong>of</strong><br />

mean/variance can<br />

be used to<br />

determine if <strong>the</strong><br />

pattern is random<br />

Frequency<br />

4<br />

3<br />

2<br />

1<br />

0<br />

1 2 3 4 5 6 7<br />

Bin<br />

Recall from <strong>the</strong> last lecture, a quadrant count is conducted by superimposing a<br />

regular grid over data, counting <strong>the</strong> number <strong>of</strong> events in each grid cell and divide<br />

<strong>the</strong> count by its cell area to get intensity.<br />

47 grid cells<br />

Mean cell count<br />

47<br />

µ = = 1.<br />

175<br />

40<br />

variance<br />

mean<br />

=<br />

Variance:<br />

1 ∑=<br />

n<br />

2<br />

s = µ<br />

n i 1<br />

85.<br />

775<br />

= =<br />

40<br />

2.<br />

1444<br />

1.<br />

175<br />

=<br />

1.<br />

825<br />

( ) 2<br />

k −<br />

2.<br />

1444<br />

A s 2 to µ ratio greater than 1 indicates<br />

clustering


Nearest Neighbor<br />

7<br />

The expected value <strong>of</strong> mean nearest neighbor is: E(<br />

d )<br />

1<br />

9<br />

8<br />

20<br />

4<br />

3<br />

5<br />

2<br />

6<br />

11<br />

10<br />

12<br />

event<br />

nearest<br />

neighbor dmin<br />

1 3 10<br />

2 5 2<br />

3 5 1<br />

4 3 1.5<br />

5 3 1<br />

6 5 1.5<br />

7 8 1<br />

8 7 1<br />

9 7 2<br />

10 11 1<br />

11 10 1<br />

12 10 1.5<br />

G Function<br />

20<br />

The G Function provides a<br />

cumulative frequency<br />

distribution <strong>of</strong> <strong>the</strong> nearest<br />

neighbor event-event distances<br />

7<br />

1<br />

9<br />

8<br />

. [ d ( s ) d]<br />

no min i <<br />

G( d ) =<br />

n<br />

4<br />

3<br />

5<br />

2<br />

6<br />

11<br />

10<br />

12<br />

Frequency<br />

1<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

d<br />

n<br />

∑<br />

i=<br />

= 1<br />

min<br />

d<br />

24 . 5<br />

= =<br />

12<br />

min<br />

n<br />

( s )<br />

2.<br />

04<br />

i<br />

1<br />

=<br />

2 λ<br />

Calculating <strong>the</strong> ratio <strong>of</strong> our observed mean<br />

nearest neighbor distance and <strong>the</strong> expected<br />

distance provides a measure <strong>of</strong> clustering<br />

E<br />

0<br />

0 2 4 6 8 10<br />

Distance<br />

dmin<br />

R =<br />

1/<br />

2<br />

R =<br />

[ G(<br />

d ) ]<br />

0.<br />

5<br />

λ<br />

2.<br />

04<br />

= 0.<br />

71<br />

12 /( 20X<br />

20)<br />

A ratio less than 1<br />

indicates clustering<br />

CSR Expected G(d):<br />

= 1−<br />

e<br />

−λπd<br />

Distance G(d) Count G(d) Freq. E(G) Freq E(G) Count<br />

0 0 0.000 0.000 0.000<br />

1 6 0.500 0.090 1.079<br />

2 5 0.917 0.314 2.690<br />

3 0 0.917 0.572 3.093<br />

4 0 0.917 0.779 2.482<br />

5 0 0.917 0.905 1.519<br />

6 0 0.917 0.966 0.734<br />

7 0 0.917 0.990 0.285<br />

8 0 0.917 0.998 0.090<br />

9 0 0.917 1.000 0.023<br />

10 1 1.000 1.000 0.005<br />

2


F Function<br />

The F Function provides a<br />

cumulative frequency<br />

distribution <strong>of</strong> <strong>the</strong> nearest<br />

neighbor point-event distances<br />

7<br />

1<br />

9<br />

8<br />

. [ dmin<br />

( p , S ) d ]<br />

no<br />

i <<br />

F( d ) =<br />

m<br />

K Function<br />

n<br />

λ =<br />

a<br />

4<br />

3<br />

5<br />

2<br />

6<br />

11<br />

10<br />

12<br />

Frequency<br />

CSR Expected F(d):<br />

1<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

E<br />

[ F(<br />

d ) ]<br />

G E<br />

= 1−<br />

e<br />

F<br />

0<br />

0 2 4 6 8 10<br />

Distance<br />

−λπd<br />

The K function uses all distances between events and provides a measure<br />

<strong>of</strong> spatial dependence over a wider range <strong>of</strong> scales.<br />

K(<br />

d)<br />

n<br />

∑<br />

i=<br />

= 1<br />

no<br />

. [ S ∈C(<br />

s , d ) ]<br />

nλ<br />

i<br />

[ K(<br />

d ) ]<br />

E =<br />

λπ 2<br />

d<br />

λ<br />

2<br />

= πd<br />

( d )<br />

d<br />

K<br />

L( d ) = −<br />

π<br />

2


L Function<br />

7<br />

1<br />

9<br />

8<br />

4<br />

3<br />

5<br />

2<br />

6<br />

11<br />

10<br />

12<br />

Distance K(d) L(d) E(K(d)) E(L(d))<br />

0 0.000 0.000 0.000 0<br />

1 16.667 1.303 3.142 0<br />

2 30.556 1.119 12.566 0<br />

3 30.556 0.119 28.274 0<br />

4 30.556 -0.881 50.265 0<br />

5 30.556 -1.881 78.540 0<br />

6 30.556 -2.881 113.097 0<br />

7 30.556 -3.881 153.938 0<br />

8 30.556 -4.881 201.062 0<br />

9 30.556 -5.881 254.469 0<br />

10 33.333 -6.743 314.159 0<br />

Multi Variant <strong>Analysis</strong><br />

An L(d) <strong>of</strong> 0 is expected<br />

An L(d) above <strong>the</strong> “zero line” indicates <strong>the</strong>re are<br />

more events at that separation distance than<br />

expected<br />

An L(d) below <strong>the</strong> “zero line” indicates <strong>the</strong>re are<br />

fewer events at that separation distance than<br />

expected<br />

Sometimes you have multiple data sets and wish to know if <strong>the</strong><br />

spatial patterns among <strong>the</strong>m are similar.<br />

Cross functions are variants on <strong>the</strong> previously discussed<br />

distance-based analysis methods.<br />

2<br />

1<br />

0<br />

0 2 4 6 8 10<br />

-1<br />

For G(d), <strong>the</strong> distance <strong>of</strong> interest is between events in one dataset<br />

and events in ano<strong>the</strong>r dataset<br />

For K(d), counts <strong>the</strong> number <strong>of</strong> events one dataset based on<br />

distances from events in ano<strong>the</strong>r dataset<br />

L(d)<br />

-2<br />

-3<br />

-4<br />

-5<br />

-6<br />

-7<br />

Distance


Incorporating Temporal Dimension<br />

Intensity is defined as <strong>the</strong> number <strong>of</strong> events per unit area in unit time<br />

Are events clustered in space and time?<br />

Distance is a statistical distance in units <strong>of</strong> physical distance X time<br />

distance.<br />

One method for determining whe<strong>the</strong>r <strong>the</strong>re is space-time dependence<br />

is to calculate<br />

1) a K(d,t) for space*time distance<br />

2) a K(d) and K(t) separately<br />

3) K(d) * K(t)<br />

4) K(d,t) – K(d)*K(t)<br />

Examples<br />

Phytopathology Vol. 92, No. 4, 2002 361-377<br />

Background<br />

Florida has been substantially impacted by Asiatic citrus canker, a disease<br />

that can cause defoliation, dieback and fruit drop<br />

Objectives<br />

Does removing healthy citrus trees within a 38 m radius <strong>of</strong> infected trees<br />

curtail fur<strong>the</strong>r spread <strong>of</strong> <strong>the</strong> disease<br />

Methods Used<br />

K function for testing complete spatial randomness (CSR)


American Journal <strong>of</strong> Tropical Medicine and Hygiene Vol. 58, No 3, 1998 287-298<br />

Background<br />

Dengue fever, a viral disease transmitted by mosquitoes, can spread<br />

rapidly and is without a vaccine. Cases tend to be clustered because it is assumed<br />

that female mosquitoes rarely travel fur<strong>the</strong>r than 50-100 m in <strong>the</strong>ir lifetime.<br />

Objectives<br />

What is <strong>the</strong> spatial-temporal pattern <strong>of</strong> <strong>the</strong> diesease<br />

Methods Used<br />

K function for testing complete spatial randomness (CSR)


Projects and Articles<br />

Begin searching for articles to get ideas<br />

Also try to understand <strong>the</strong> availability <strong>of</strong> data<br />

This weekend, <strong>the</strong> class website will be updated with data sources, links<br />

to online journals, references to o<strong>the</strong>r journals<br />

If you come across a good resource, please let me know and I’ll add it to<br />

<strong>the</strong> website

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!