Testing for spatial correlation between two linked point processes

Danilo Lopes and Renato Assunção
LESTE - Laboratório de Estatística Espacial
Departamento de Estatística - UFMG - Brazil
danilollopes@yahoo.com.br

Point patterns
A RANDOM number of point locations in a map.
Many examples:
• Addresses of diseased individuals
• Places of forest fires in the last month in a region
• Locations of homicides in a city
The important aspect: the locations of the points are random.
Danilo Lopes, LESTE - UFMG

Example: Point patterns

Bivariate point patterns
We can have two point patterns such as:
• disease cases and control individuals
• two species of plants in a field
• two types of crimes, depending on the lapse of time to police response: more or less than 10 minutes

LINKED bivariate point patterns
A single event originates two points.
The points are linked by their common origin.
Example:
• Cars are stolen and eventually retrieved within a large city.
• Each robbery has two point locations associated: the place where the car was stolen and the place where it was eventually found.
We want to model the association between the two locations.
Lingo: ORIGIN and DESTINATION

Example: linked bivariate point patterns

From origin to destination: 50 cars

All stolen cars

Marginal patterns
If interest is in the origin pattern or the destination pattern alone, then carry out a separate analysis.
However, our focus is on the association between the two patterns.

A stochastic model for bivariate point patterns
Suppose that a car is stolen at position x (the origin).
We want to find the probability distribution of its retrieval location y (the destination).
Let f(y|x) be the density of the destination y given that the origin is x.
For EACH possible origin x, we have a surface f(y|x) showing the most probable destinations of events originating at x.

Example: car is stolen in the Pampulha District

Example: car is stolen in the North District

General considerations
A good stochastic model should have three properties:
1 - Fit the data well;
2 - Describe them in a simple way;
3 - Allow us to quantify several aspects of the process: estimation of parameters, tests about their values, etc.
We want to tell a simple story about how the function f(y|x) changes as the origin x moves.
How to do that?
There are an infinite number of possibilities to consider.

A simple visualization tool
A visualization tool to be incorporated in TERRAVIEW in the first semester of 2007 (??)
Imagine a user with a map in front of him.
With the mouse, he selects an origin location x on the map.
He sees a surface representing the probability density of the likely destinations y given that the origin is x.
He then slowly moves the mouse over the map, changing the origin x.
As he moves the mouse, the surface is continuously updated, showing how f(y|x) changes with x.
This tool will inspire mathematical models to synthesize the stochastic process.
Estimation of f(y|x) will be done with bivariate linked kernel functions.
However... one needs to be sure that one is not simply modeling noise.
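A kernel estimate of f(y|x) can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian product kernel, the two bandwidths, and the function names `gaussian_kernel` and `conditional_density` are hypothetical choices, not the linked kernel functions the authors plan to use. Each event is weighted by how close its origin lies to x, and the estimate is the weighted average of kernels centred at the matching destinations.

```python
import math

def gaussian_kernel(u, v, bandwidth):
    """Isotropic 2-D Gaussian kernel evaluated at the difference u - v."""
    d2 = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    return math.exp(-0.5 * d2 / bandwidth ** 2) / (2.0 * math.pi * bandwidth ** 2)

def conditional_density(y, x, origins, destinations, bw_origin, bw_dest):
    """Kernel estimate of f(y | x) from linked pairs (origins[i], destinations[i]).

    Assumption: a Nadaraya-Watson style estimator with Gaussian kernels.
    """
    # Weight of each event: closeness of its origin to the query origin x.
    weights = [gaussian_kernel(x, xi, bw_origin) for xi in origins]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no origin close enough to x for this bandwidth")
    # Weighted average of kernels centred at the linked destinations.
    smoothed = sum(w * gaussian_kernel(y, yi, bw_dest)
                   for w, yi in zip(weights, destinations))
    return smoothed / total

# Toy data (made up): three thefts and the places where the cars were found.
origins = [(0.0, 0.0), (1.0, 0.0), (8.0, 8.0)]
destinations = [(0.5, 0.5), (1.0, 1.0), (4.0, 4.0)]
density_near = conditional_density((0.8, 0.8), (0.5, 0.0), origins, destinations, 1.0, 1.0)
density_far = conditional_density((4.0, 4.0), (0.5, 0.0), origins, destinations, 1.0, 1.0)
```

Evaluating `conditional_density` on a grid of y values for a fixed x gives the surface the user would see; moving x re-weights the events and redraws it.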

The need for a test
Before embarking on a complex modeling exercise, one needs to test whether there is any dependence between origin and destination.
This is a minimum requirement for estimating a stochastic model.
That is, we want to test whether f(y|x) changes with the origin x.
If there is NO evidence that f(y|x) changes with x, we say that origin and destination are independent.
Otherwise, we say that they are dependent.
If we conclude that origin and destination are independent, then the analysis should be purely univariate: study the origins and the destinations separately.
The story is then the simplest possible: it does not matter where your car is stolen, the likely destination locations are such and such.

A general model for bivariate linked point patterns
Let $(N_1, N_2)$ be a bivariate linked point process in a polygon $A$.
The data are a set of $n$ events $(x_i, y_i)$, $i = 1, \dots, n$:
• $x_i$ is the origin location of the $i$-th event
• $y_i$ is the destination location of the $i$-th event
Under the assumption of independence, the JOINT probability density of origins and destinations is given by
$$p(\phi) = C \exp\bigl(g(x_1, \dots, x_n) + h(y_1, \dots, y_n)\bigr)$$
where $C$ is a normalizing constant.
Function $g$ models the interaction among the origin events as well as any spatial variation in the first-order intensity of this marginal process.
Function $h$ has the same role with respect to the destination events.
The functions $g$ and $h$ can be chosen (almost) arbitrarily.

Introducing interaction between origin and destination
A model for the density $p(\phi)$ similar to a pairwise Gibbs point process:
$$p(\phi) = C \exp\Bigl(g(x_1, \dots, x_n) + h(y_1, \dots, y_n) - \sum_{i<j} \varphi\bigl((x_i, y_i), (x_j, y_j); \theta\bigr)\Bigr)$$

A simple one-parameter model
Then, we have
$$p(\phi) = C \exp\bigl(g(x_1, \dots, x_n) + h(y_1, \dots, y_n) - \theta\, T(\phi)\bigr)$$
where
T(ϕ) = ∑ i
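The exact form of T(ϕ) is not recoverable from the text above. Purely as an illustrative assumption, suggested by the radii r_x = r_y reported in the results, one candidate statistic counts the pairs of events i < j whose origins lie within r_x of each other and whose destinations lie within r_y of each other. The name `close_pair_statistic` is hypothetical.

```python
import math
from itertools import combinations

def close_pair_statistic(origins, destinations, r_x, r_y):
    """Count pairs i < j with origins within r_x AND destinations within r_y.

    Assumption: this is one plausible pairwise statistic, not necessarily
    the T used by the authors.
    """
    count = 0
    for i, j in combinations(range(len(origins)), 2):
        close_origins = math.dist(origins[i], origins[j]) <= r_x
        close_destinations = math.dist(destinations[i], destinations[j]) <= r_y
        if close_origins and close_destinations:
            count += 1
    return count

# Toy data (made up): events 0 and 1 start close together and end close together.
origins = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
destinations = [(5.0, 5.0), (5.0, 6.0), (0.0, 0.0)]
t_observed = close_pair_statistic(origins, destinations, r_x=2.0, r_y=2.0)
```

A statistic of this kind is large when events that start near each other also tend to end near each other, which is the sort of association the test is looking for.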

Conditioning on the origins and on the destinations
We do NOT want to specify models for the origin pattern or the destination pattern.
We want to test whether origin and destination are associated, irrespective of their marginal spatial patterns.
That is, we do not want to specify the functions $g$ and $h$ in the joint density
$$p(\phi) = C \exp\bigl(g(x_1, \dots, x_n) + h(y_1, \dots, y_n) - \theta\, T(\phi)\bigr)$$
One additional issue: the normalizing constant $C$ cannot be calculated analytically.
Both problems can be overcome if we condition on the origins $x_1, \dots, x_n$ and on the destinations $y_1, \dots, y_n$.

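The conditional linked bivariate density
One can show that, conditional on the unordered set of origin locations $x_1, \dots, x_n$ and on the unordered set of destinations $y_1, \dots, y_n$, we have
$$p(\phi \mid \{x_1, \dots, x_n\}, \{y_1, \dots, y_n\}) = \frac{\exp\bigl(-\theta\, T(\phi)\bigr)}{\sum_{\pi} \exp\bigl(-\theta\, T(\pi\phi)\bigr)}$$
where the sum runs over the $n!$ permutations $\pi$ of the links and $\pi\phi$ denotes the pattern relinked by $\pi$.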

Inference
The log-likelihood of the interaction parameter $\theta$ is given by
$$l(\theta) = -\theta\, T(\phi) - \log\Bigl(\sum_{\pi} e^{-\theta T(\pi\phi)}\Bigr)$$
$T(\phi)$ is a natural sufficient statistic for the parameter $\theta$.
The locally most powerful test of the hypothesis $\theta = 0$ is based on the score test statistic given by
$$\left.\frac{\partial l}{\partial \theta}\right|_{\theta=0} = \frac{1}{n!}\sum_{\pi} T(\pi\phi) - T(\phi).$$
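The score statistic follows from differentiating the log-likelihood. The short derivation below assumes that the sum runs over all $n!$ permutations $\pi$ of the links and that the exponent inside the logarithm carries the same negative sign as in the density, which is what makes the result agree with the score statistic stated on the slide.

```latex
\begin{align*}
l(\theta) &= -\theta\, T(\phi) - \log \sum_{\pi} e^{-\theta T(\pi\phi)}, \\
\frac{\partial l}{\partial \theta}
  &= -T(\phi) + \frac{\sum_{\pi} T(\pi\phi)\, e^{-\theta T(\pi\phi)}}
                     {\sum_{\pi} e^{-\theta T(\pi\phi)}}, \\
\left.\frac{\partial l}{\partial \theta}\right|_{\theta=0}
  &= \frac{1}{n!} \sum_{\pi} T(\pi\phi) - T(\phi)
   = E_{\pi}\bigl[T(\pi\phi)\bigr] - T(\phi).
\end{align*}
```

At $\theta = 0$ every exponential equals 1, so the ratio reduces to the plain average of $T$ over the $n!$ relinkings. The test therefore compares the observed $T(\phi)$ with its permutation mean.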

Moments
The moments of T(ϕ) under the null hypothesis are easily obtained. For example:
E π [T (πϕ)] =n y( n2) ∑ i
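As an illustration of why such moments are easy to obtain, assume (hypothetically, since the formula above is cut off) that T counts pairs i < j with origins within r_x and destinations within r_y. Under a uniformly random relinking, any given pair of origins receives a uniformly chosen pair of destinations, so the expectation is n_x · n_y / C(n, 2), where n_x and n_y are the numbers of close origin pairs and close destination pairs. The sketch checks this closed form against brute-force enumeration on a toy example; all function names are hypothetical.

```python
import math
from itertools import combinations, permutations

def close_pairs(points, radius):
    """Set of index pairs (i, j), i < j, whose points lie within `radius`."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= radius}

def null_expectation(origins, destinations, r_x, r_y):
    """Closed-form E[T] under a uniform random permutation of the links."""
    n_x = len(close_pairs(origins, r_x))
    n_y = len(close_pairs(destinations, r_y))
    return n_x * n_y / math.comb(len(origins), 2)

def brute_force_expectation(origins, destinations, r_x, r_y):
    """Average T over all n! relinkings; feasible only for very small n."""
    n = len(origins)
    origin_pairs = close_pairs(origins, r_x)
    total = 0
    for perm in permutations(range(n)):
        relinked = [destinations[k] for k in perm]
        # T for this relinking: pairs close in both origin and destination.
        total += len(origin_pairs & close_pairs(relinked, r_y))
    return total / math.factorial(n)

# Toy data (made up), small enough to enumerate all 5! = 120 relinkings.
origins = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
destinations = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (9.0, 9.0), (0.0, 1.0)]
exact = null_expectation(origins, destinations, 1.5, 1.5)
enumerated = brute_force_expectation(origins, destinations, 1.5, 1.5)
```

The closed form needs only two passes over the pairs, whereas the enumeration grows as n!, which is why the permutation mean is used in practice.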

Monte Carlo test
If the null hypothesis $\theta = 0$ is true, then we have
$$p(\phi \mid \{x_1, \dots, x_n\}, \{y_1, \dots, y_n\}) = \frac{1}{n!}$$
That is, all the $n!$ permutations of the LINKS between the fixed destinations and the fixed origins have the same probability.
Then:
• Let $T(\phi) = t_0$ be the test statistic using the real data;
• Shuffle the destinations among the origins;
• Recalculate the test statistic $T(\phi)$;
• Repeat the previous two steps a large number of times;
• The p-value is the proportion of values, among $t_0$ and the simulated ones, that are at least as large as $t_0$.
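The steps above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the statistic is the hypothetical close-pair count (pairs with origins within r_x and destinations within r_y), large values are treated as evidence against the null hypothesis, and the names `close_pair_statistic` and `permutation_test` are made up for the example.

```python
import math
import random
from itertools import combinations

def close_pair_statistic(origins, destinations, r_x, r_y):
    """Hypothetical T: pairs i < j with close origins and close destinations."""
    return sum(
        1 for i, j in combinations(range(len(origins)), 2)
        if math.dist(origins[i], origins[j]) <= r_x
        and math.dist(destinations[i], destinations[j]) <= r_y
    )

def permutation_test(origins, destinations, r_x, r_y, n_perm=999, seed=0):
    """Monte Carlo p-value for H0: theta = 0 (all relinkings equally likely).

    The observed statistic is counted among the reference values, so the
    p-value can never fall below 1 / (n_perm + 1).
    """
    rng = random.Random(seed)
    t0 = close_pair_statistic(origins, destinations, r_x, r_y)
    shuffled = list(destinations)
    at_least_as_large = 1  # the observed value itself
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # relink the fixed destinations to the fixed origins
        if close_pair_statistic(origins, shuffled, r_x, r_y) >= t0:
            at_least_as_large += 1
    return t0, at_least_as_large / (n_perm + 1)

# Toy data (made up): each car is found exactly 100 units north of its origin,
# so neighbouring origins always have neighbouring destinations.
origins = [(10.0 * k, 0.0) for k in range(12)]
destinations = [(10.0 * k, 100.0) for k in range(12)]
t0, p_value = permutation_test(origins, destinations, r_x=10.0, r_y=10.0)
```

With N random permutations the smallest attainable p-value is 1/(N + 1), so the number of permutations sets the resolution of the test.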

Results
Monte Carlo simulation with 3999 random permutations.

r_x = r_y    T(ϕ)       E[T(ϕ)]       P-value
750 m        13884      1920          2.5E-4
1500 m       93262      21966.32      2.5E-4
2500 m       371307     129216.81     2.5E-4
3250 m       734011     314244.47     2.5E-4
4000 m       1241616    622437.05     2.5E-4
5000 m       2115013    1240667.53    2.5E-4

And then we have dependence. So what?
The modeling of f(y|x) is context dependent.
For the Belo Horizonte data, it seems that a useful model is:
• Given that a car is stolen at the origin x, its destination tends to follow a mixture of two densities.
• With probability p(x), it stays around x.
• With probability 1 − p(x), it tends to be attracted towards the Center-South region.
We need to model how p(x) changes with x.
We need to model the spread around the origin location and around the Center-South sink.
This requires Bayesian geostatistical modeling.
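The two-component story can be made concrete by simulating from it. The sketch below is only an illustration under assumptions: both components are taken to be isotropic Gaussians, and the sink coordinates, the spreads, and the constant p(x) in the usage lines are hypothetical values, not estimates from the Belo Horizonte data.

```python
import random

def sample_destination(x, p_of_x, sink, spread_origin, spread_sink, rng):
    """Draw one destination y given origin x from a two-component mixture.

    Assumption: Gaussian spread around either the origin or the sink.
    """
    if rng.random() < p_of_x(x):
        centre, spread = x, spread_origin   # the car is found near where it was stolen
    else:
        centre, spread = sink, spread_sink  # the car is drawn towards the sink region
    return (rng.gauss(centre[0], spread), rng.gauss(centre[1], spread))

rng = random.Random(1)
sink = (0.0, 0.0)                # hypothetical location of the sink region
constant_p = lambda x: 0.7       # hypothetical: p(x) does not vary with x
draws = [sample_destination((5.0, 5.0), constant_p, sink, 0.5, 1.0, rng)
         for _ in range(5)]
```

Passing p(x) as a function keeps the sketch open to the modeling question raised above: any spatially varying p(x) can be plugged in without changing the sampler.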

Next steps
We are studying how to simulate the point pattern model using Markov chain Monte Carlo and perfect simulation methods.
Through simulation we want to study the model properties.
For example, what exactly is the behavior of T(ϕ) when θ ≠ 0? What kind of constraints on the association can be due to the marginal patterns?
We are implementing the geostatistical Bayesian model for the Belo Horizonte data.
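One simple way to simulate the model conditional on the observed origins and destinations is a Metropolis sampler over the links. The sketch below is an assumption-laden illustration, not the authors' algorithm and not a perfect simulation method: it proposes swapping the destinations of two randomly chosen events and targets a density proportional to exp(−θT), with T taken to be the hypothetical close-pair count. The function names are made up.

```python
import math
import random
from itertools import combinations

def close_pair_statistic(origins, destinations, r_x, r_y):
    """Hypothetical T: pairs i < j with close origins and close destinations."""
    return sum(
        1 for i, j in combinations(range(len(origins)), 2)
        if math.dist(origins[i], origins[j]) <= r_x
        and math.dist(destinations[i], destinations[j]) <= r_y
    )

def simulate_links(origins, destinations, theta, r_x, r_y, n_steps=2000, seed=0):
    """Metropolis sampler over relinkings, target proportional to exp(-theta * T)."""
    rng = random.Random(seed)
    links = list(destinations)
    t_current = close_pair_statistic(origins, links, r_x, r_y)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(links)), 2)
        links[i], links[j] = links[j], links[i]        # propose: swap two links
        t_proposed = close_pair_statistic(origins, links, r_x, r_y)
        log_u = math.log(1.0 - rng.random())           # log of a uniform draw in (0, 1]
        if log_u <= -theta * (t_proposed - t_current):
            t_current = t_proposed                     # accept the swap
        else:
            links[i], links[j] = links[j], links[i]    # reject: undo the swap
    return links, t_current

# Toy data (made up): eight events on a line, destinations five units north.
origins = [(float(k), 0.0) for k in range(8)]
destinations = [(float(k), 5.0) for k in range(8)]
links, t_final = simulate_links(origins, destinations, theta=-1.0,
                                r_x=1.0, r_y=1.0, n_steps=200, seed=1)
```

Because the density is proportional to exp(−θT), negative values of θ favour relinkings with many close pairs. Running the sampler for several values of θ and recording T is one way to explore how T(ϕ) behaves away from θ = 0 while the marginal patterns stay fixed.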

Next steps
We hope to have a nice and simple story to tell you about the geographical patterns of car theft in Belo Horizonte at the next GEOINFO.
THANKS!