

STATISTICS: MODULE 12122

Chapter 3 - Bivariate or joint probability distributions

In this chapter we consider the distribution of two random variables, where both random variables are discrete (considered first) and, probably more importantly, where both random variables are continuous. Bivariate or joint distributions model the way two random variables vary together.

A. DISCRETE VARIABLES

Example 3.1

Here we have a probability model of the demand and supply of a perishable commodity. The probability model/distribution is defined as follows:

                          Supply of commodity (SP)
                             1        2        3
    Demand for         0   0.015    0.025    0.010
    commodity          1   0.045    0.075    0.030
    (D)                2   0.195    0.325    0.130
                       3   0.030    0.050    0.020
                       4   0.015    0.025    0.010

This is known as a discrete bivariate or joint probability distribution, since there are two random variables, "demand for commodity (D)" and "supply of commodity (SP)".

The sample space S consists of 15 outcomes (d, s), where d and s are the values of D and SP.

The probabilities in the table are joint probabilities, namely P(D = d and SP = s), or P(D = d ∩ SP = s) using set notation.

Note: the sum of the 15 probabilities is 1.

3.2 Joint probability function

Suppose the random variables are X and Y; then the joint probability function is denoted by p(x, y) and is defined as follows:

p(x, y) = P(X = x and Y = y), or P(X = x ∩ Y = y) in set notation.

Also Σ_x Σ_y p(x, y) = 1.
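The joint probability function of Example 3.1 can be stored as a table of numbers and checked directly. Here is a minimal Python sketch using numpy (the variable names are mine, chosen for illustration):

```python
import numpy as np

# Joint probability function p(d, s) of Example 3.1:
# rows are demand d = 0..4, columns are supply s = 1..3.
p = np.array([[0.015, 0.025, 0.010],
              [0.045, 0.075, 0.030],
              [0.195, 0.325, 0.130],
              [0.030, 0.050, 0.020],
              [0.015, 0.025, 0.010]])

# The sum over all 15 outcomes (d, s) must be 1.
print(p.sum())   # 1.0
```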


3.3 Marginal probability distributions

The marginal distributions are the distributions of X and Y considered separately and model how X and Y vary separately from each other. Suppose the probability functions of X and Y are p_X(x) and p_Y(y) respectively, so that

p_X(x) = P(X = x) and p_Y(y) = P(Y = y)

Also Σ_x p_X(x) = 1 and Σ_y p_Y(y) = 1.

It is quite straightforward to obtain these from the joint probability distribution, since

p_X(x) = Σ_y p(x, y) and p_Y(y) = Σ_x p(x, y)

In regression problems we are very interested in conditional probability distributions, such as the conditional distribution of X given Y = y and the conditional distribution of Y given X = x.
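Continuing the sketch after 3.2, the marginal probability functions of Example 3.1 drop out as row and column sums of the joint table:

```python
# Continuing the sketch after 3.2: p is the 5x3 joint table of Example 3.1.
p_D = p.sum(axis=1)    # marginal of demand: p_D[d] = P(D = d), d = 0..4
p_SP = p.sum(axis=0)   # marginal of supply: p_SP[s-1] = P(SP = s), s = 1..3
print(p_D)             # [0.05 0.15 0.65 0.1  0.05] (up to float rounding)
print(p_SP)            # [0.3  0.5  0.2]
```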

3.4 Conditional probability distributions

The conditional probability function of X given Y = y is denoted by p(x | y) and is defined as

p(x | y) = P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y) = p(x, y) / p_Y(y)

whereas the conditional probability function of Y given X = x is denoted by p(y | x) and defined as

p(y | x) = P(Y = y | X = x) = P(Y = y and X = x) / P(X = x) = p(x, y) / p_X(x)
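In the Example 3.1 sketch, a conditional distribution is just a column (or row) of the joint table rescaled by the corresponding marginal probability, e.g. the distribution of demand given a supply of 2:

```python
# Continuing the Example 3.1 sketch: conditional pmf of D given SP = 2,
# i.e. p(d | 2) = p(d, 2) / p_SP(2); supply s = 2 is column index 1.
p_D_given_2 = p[:, 1] / p_SP[1]
print(p_D_given_2)        # [0.05 0.15 0.65 0.1 0.05]
print(p_D_given_2.sum())  # 1.0 - a conditional distribution sums to 1
```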

3.5 Joint probability distribution function

The joint (cumulative) probability distribution function (c.d.f.) is denoted by F(x, y) and is defined as

F(x, y) = P(X ≤ x and Y ≤ y), with 0 ≤ F(x, y) ≤ 1

The marginal c.d.f.'s are denoted by F_X(x) and F_Y(y) and are defined as follows (see Chapter 1, section 1.12):

F_X(x) = P(X ≤ x) and F_Y(y) = P(Y ≤ y)



3.6 Are X and Y independent?

If either (a) F(x, y) = F_X(x) F_Y(y) for all x and y, or (b) p(x, y) = p_X(x) p_Y(y) for all x and y, then X and Y are independent random variables.

Example 3.2 The joint distribution of X and Y is

                              X
              -2      -1      0       1       2
    Y   10   0.09    0.15    0.27    0.25    0.04
        20   0.01    0.05    0.08    0.05    0.01

(a) Find the marginal distributions of X and Y.
(b) Find the conditional distribution of X given Y = 20.
(c) Are X and Y independent?
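One way to check your answers: all three parts reduce to row and column sums of the table plus one comparison. A minimal numpy sketch:

```python
import numpy as np

# Joint table of Example 3.2: rows are Y = 10, 20; columns are X = -2..2.
p = np.array([[0.09, 0.15, 0.27, 0.25, 0.04],
              [0.01, 0.05, 0.08, 0.05, 0.01]])

p_X = p.sum(axis=0)              # (a) marginal of X
p_Y = p.sum(axis=1)              # (a) marginal of Y
p_X_given_20 = p[1] / p_Y[1]     # (b) conditional of X given Y = 20

# (c) X and Y are independent iff every cell equals the product
# of its marginals (criterion 3.6(b)).
print(np.allclose(p, np.outer(p_Y, p_X)))
```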

B. CONTINUOUS VARIABLES

3.7 Joint probability density function

The joint p.d.f. is denoted by f(x, y) (where f(x, y) ≥ 0 for all x and y) and defines a probability surface in 3 dimensions. Probability is a volume under this surface, and the total volume under the p.d.f. surface is 1, as the total probability is 1, i.e.

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dx dy = 1

and

P(a ≤ X ≤ b and c ≤ Y ≤ d) = ∫_{y=c}^{d} ∫_{x=a}^{b} f(x, y) dx dy

3.8 Marginal probability density functions

As before with discrete variables, the marginal distributions are the distributions of X and Y considered separately and model how X and Y vary separately from each other. Whereas with discrete random variables we speak of marginal probability functions, with continuous random variables we speak of marginal probability density functions. These are obtained from the joint p.d.f. by integrating out the other variable:

f_X(x) = ∫_{-∞}^{∞} f(x, y) dy and f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx

Example 3.3

An electronics system has one of each of two different types of components in joint operation. Let X and Y denote the random lengths of life of the components of type 1 and 2, respectively. Their joint density function is given by

f(x, y) = (1/8) x e^{-(x+y)/2},  x > 0, y > 0
        = 0                      otherwise
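The density can be checked symbolically; a minimal sympy sketch verifying the total probability and the marginal p.d.f.'s of 3.8:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f = sp.Rational(1, 8) * x * sp.exp(-(x + y) / 2)

# Total probability over x > 0, y > 0 should be 1.
print(sp.integrate(f, (x, 0, sp.oo), (y, 0, sp.oo)))   # 1

# Marginal p.d.f.'s (section 3.8): integrate out the other variable.
f_X = sp.simplify(sp.integrate(f, (y, 0, sp.oo)))      # x*exp(-x/2)/4
f_Y = sp.simplify(sp.integrate(f, (x, 0, sp.oo)))      # exp(-y/2)/2

# Here f(x, y) = f_X(x) f_Y(y), so X and Y are independent (see 3.12).
print(sp.simplify(f - f_X * f_Y))                      # 0
```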



Example 3.4

The random variables X and Y have a bivariate normal distribution if

f(x, y) = a e^{-b}

where

a = 1 / (2π σ_X σ_Y √(1 - ρ²))

and

b = (1 / (2(1 - ρ²))) [ ((x - μ_X)/σ_X)² - 2ρ ((x - μ_X)/σ_X)((y - μ_Y)/σ_Y) + ((y - μ_Y)/σ_Y)² ]

where -∞ < x < ∞ and -∞ < y < ∞.
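For concreteness, this formula can be compared with scipy's built-in bivariate normal, whose covariance matrix has diagonal entries σ_X², σ_Y² and off-diagonal entries ρ σ_X σ_Y. The parameter values below are hypothetical, chosen only for the check:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 0.5, 0.6   # hypothetical values

def bvn_pdf(x, y):
    """f(x, y) = a * exp(-b) as in Example 3.4."""
    a = 1.0 / (2 * np.pi * sd_x * sd_y * np.sqrt(1 - rho**2))
    zx, zy = (x - mu_x) / sd_x, (y - mu_y) / sd_y
    b = (zx**2 - 2 * rho * zx * zy + zy**2) / (2 * (1 - rho**2))
    return a * np.exp(-b)

cov = [[sd_x**2, rho * sd_x * sd_y],
       [rho * sd_x * sd_y, sd_y**2]]
rv = multivariate_normal(mean=[mu_x, mu_y], cov=cov)

# The two evaluations agree at any point (x, y).
print(bvn_pdf(0.5, -1.5), rv.pdf([0.5, -1.5]))
```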


3.9 Conditional probability density functions

The conditional p.d.f. of X given Y = y is denoted by f(x | y) and defined as

f(x | y) = f(x | Y = y) = f(x, y) / f_Y(y)

whereas the conditional p.d.f. of Y given X = x is denoted by f(y | x) and defined as

f(y | x) = f(y | X = x) = f(x, y) / f_X(x)

3.10 Joint probability distribution function

As in 3.5, the joint (cumulative) probability distribution function (c.d.f.) is denoted by F(x, y) and is defined as F(x, y) = P(X ≤ x and Y ≤ y), but in the continuous case F(x, y) is the volume under the p.d.f. surface from X = -∞ to X = x and from Y = -∞ to Y = y, so that

F(x, y) = ∫_{v=-∞}^{y} ∫_{u=-∞}^{x} f(u, v) du dv

The marginal c.d.f.'s are defined as in 3.5 and can be obtained from the joint distribution function F(x, y) as follows:

F_X(x) = F(x, y_MAX), where y_MAX is the largest value of y, and
F_Y(y) = F(x_MAX, y), where x_MAX is the largest value of x.

3.11 Important connections between the p.d.f.'s and the joint c.d.f.'s

(i) The joint p.d.f. f(x, y) = ∂²F(x, y) / ∂x ∂y

(ii) The marginal p.d.f.'s can be obtained from the marginal c.d.f.'s as follows:

the marginal p.d.f. of X = f_X(x) = dF_X(x)/dx, or F_X′(x);
the marginal p.d.f. of Y = f_Y(y) = dF_Y(y)/dy, or F_Y′(y).

3.12 Are X and Y independent?

X and Y are independent random variables if either

(a) F(x, y) = F_X(x) F_Y(y); or
(b) f(x, y) = f_X(x) f_Y(y); or
(c) f(x | y) = a function of x only, or equivalently f(y | x) = a function of y only.

Example 3.5 The joint distribution function of X and Y is given by

F(x, y) = (2y/3)(x²/2 + x),  0 ≤ x, y ≤ 1
        = 0                  otherwise

(i) Find the marginal distribution and density functions.
(ii) Find the joint density function.
(iii) Are X and Y independent random variables?
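Example 3.5 exercises results 3.10, 3.11 and 3.12 together; here is one way through it as a sympy sketch:

```python
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)
F = sp.Rational(2, 3) * y * (x**2 / 2 + x)   # valid on 0 <= x, y <= 1

# (ii) Joint density via result 3.11(i): f = d^2 F / dx dy.
f = sp.expand(sp.diff(F, x, y))
print(f)                             # 2*x/3 + 2/3

# (i) Marginal c.d.f.'s via 3.10, with y_MAX = x_MAX = 1 here,
# then marginal densities via 3.11(ii).
F_X, F_Y = F.subs(y, 1), F.subs(x, 1)
f_X, f_Y = sp.diff(F_X, x), sp.diff(F_Y, y)

# (iii) Independence check via 3.12(b): does f = f_X * f_Y?
print(sp.simplify(f - f_X * f_Y))    # 0, so X and Y are independent
```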

Example 3.6

X and Y have the joint probability density function

f(x, y) = 8x² / (7y³),  1 ≤ x, y ≤ 2

(a) Derive the marginal distribution function of X.
(b) Derive the conditional density function of X given Y = y.
(c) Are X and Y independent?

Given the joint density fn. f(x, y): integrate w.r.t. x and y to obtain the joint distribution fn.

    F(x, y) = ∫_{v=-∞}^{y} ∫_{u=-∞}^{x} f(u, v) du dv

Given the joint distribution fn. F(x, y): differentiate (partially) w.r.t. x and y to obtain the joint density fn.

    f(x, y) = ∂²F / ∂x ∂y


Example 3.6(b) and (c)

Solution From 3.9, the conditional p.d.f. of X given Y = y is denoted by f(x | y) and defined as

f(x | y) = f(x | Y = y) = f(x, y) / f_Y(y)

where f(x, y) is the joint p.d.f. of X and Y and f_Y(y) is the marginal p.d.f. of Y. We know f(x, y) = 8x²/(7y³), so we need to find f_Y(y).

There are two ways you can find f_Y(y). The first way involves integration and the second way involves differentiation. I will do both ways to show you how to use the different results we have here, but you should always choose the way you find easiest, i.e. you would not be expected to find f_Y(y) both ways in any assessed work.

Method 1

From 3.8, f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx, so

f_Y(y) = ∫_{x=1}^{2} 8x²/(7y³) dx = (8/(7y³)) ∫_{x=1}^{2} x² dx = (8/(7y³)) [x³/3]_{1}^{2} = (8/(7y³))(8/3 - 1/3) = (8/(7y³))(7/3) = 8/(3y³)

Method 2

From 3.11, f_Y(y) = dF_Y(y)/dy, where F_Y(y) is the marginal c.d.f. of Y.

From 3.10, F_Y(y) = F(x_MAX, y), where x_MAX is the largest value of x, so F_Y(y) = F(2, y), and from part (a), F(x, y) = (4/21)(x³ - 1)(1 - 1/y²), hence

F_Y(y) = F(2, y) = (4/21)(2³ - 1)(1 - 1/y²) = (4/3)(1 - 1/y²)

Hence f_Y(y) = d/dy [(4/3)(1 - 1/y²)] = (4/3)(2/y³) = 8/(3y³), as with Method 1.

Therefore the conditional density function of X given Y = y, f(x | y), is given by

f(x | y) = f(x, y) / f_Y(y) = (8x²/(7y³)) / (8/(3y³)) = 3x²/7

So f(x | y) = 3x²/7,  1 ≤ x ≤ 2 and 1 ≤ y ≤ 2
            = 0       otherwise

(c) Now f(x | y) is a function of x only, so using result 3.12(c), X and Y are independent. Notice also that f(x | y) = f_X(x), which you would expect if X and Y are independent.
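Both methods, and parts (b) and (c), can be verified symbolically; a minimal sympy sketch:

```python
import sympy as sp

x, y, u, v = sp.symbols("x y u v", positive=True)
f = 8 * x**2 / (7 * y**3)               # joint p.d.f. on 1 <= x, y <= 2

# Method 1: integrate x out over its support (section 3.8).
f_Y = sp.simplify(sp.integrate(f, (x, 1, 2)))
print(f_Y)                              # 8/(3*y**3)

# Method 2: build F(x, y), then differentiate F_Y(y) = F(2, y).
F = sp.integrate(f.subs({x: u, y: v}), (u, 1, x), (v, 1, y))
print(sp.simplify(sp.diff(F.subs(x, 2), y)))   # 8/(3*y**3) again

# (b) conditional density, and (c) it is a function of x only (3.12(c)).
print(sp.simplify(f / f_Y))             # 3*x**2/7
```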


3.13 Expectations and variances

Discrete random variables

E(X^r) = Σ_x Σ_y x^r p(x, y) = Σ_x x^r p_X(x),  r = 1, 2, ...

E(Y^r) = Σ_x Σ_y y^r p(x, y) = Σ_y y^r p_Y(y),  r = 1, 2, ...

Hence Var(X) = E(X²) - (E(X))², Var(Y) = E(Y²) - (E(Y))², etc.

Continuous random variables

E(X^r) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x^r f(x, y) dx dy = ∫_{-∞}^{∞} x^r f_X(x) dx,  r = 1, 2, ...

E(Y^r) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} y^r f(x, y) dx dy = ∫_{-∞}^{∞} y^r f_Y(y) dy,  r = 1, 2, ...

3.14 Expectation of a function of the r.v.'s X and Y

Continuous X and Y:

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dx dy

e.g. E[X/Y] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x/y) f(x, y) dx dy and E[XY] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f(x, y) dx dy.

Discrete X and Y:

E[g(X, Y)] = Σ_x Σ_y g(x, y) p(x, y), e.g. E[XY] = Σ_x Σ_y xy p(x, y).

3.15 Covariance and correlation

The covariance of X and Y is defined as follows: Cov(X, Y) = σ_XY = E(XY) - E(X)E(Y).

Notes
(a) If the random variables increase together or decrease together, then the covariance will be positive, whereas if one random variable increases as the other decreases, then the covariance will be negative.
(b) If X and Y are independent r.v.'s, then E(XY) = E(X)E(Y), so Cov(X, Y) = 0. However, if Cov(X, Y) = 0 it does not follow that X and Y are independent, unless X and Y are Normal r.v.'s.

Correlation coefficient: ρ = corr(X, Y) = Cov(X, Y) / (σ_X σ_Y).

Notes
(a) The correlation coefficient is a number between -1 and 1, i.e. -1 ≤ ρ ≤ 1.
(b) If the random variables increase together or decrease together, then ρ will be positive, whereas if one random variable increases as the other decreases, then ρ will be negative.
(c) ρ measures the degree of linear relationship between the two random variables X and Y, so ρ will be 0 if X and Y are independent, and ρ can also be 0 when X and Y are related in a purely non-linear way.

You will study correlation in more detail in the Econometrics part of the course with David Winter.

Example 3.7 In Example 3.2, are X and Y correlated?

Solution Below is the joint or bivariate probability distribution of X and Y:

                              X
              -2      -1      0       1       2
    Y   10   0.09    0.15    0.27    0.25    0.04
        20   0.01    0.05    0.08    0.05    0.01

The marginal distributions of X and Y are

    x                     -2      -1      0       1       2     Total
    P(X = x) or p_X(x)   0.10    0.20    0.35    0.30    0.05    1.00

and

    y                     10      20     Total
    P(Y = y) or p_Y(y)   0.80    0.20    1.00
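The remaining step is the covariance; a minimal numpy sketch:

```python
import numpy as np

x_vals = np.array([-2, -1, 0, 1, 2])
y_vals = np.array([10, 20])
p = np.array([[0.09, 0.15, 0.27, 0.25, 0.04],    # row: Y = 10
              [0.01, 0.05, 0.08, 0.05, 0.01]])   # row: Y = 20

p_X, p_Y = p.sum(axis=0), p.sum(axis=1)
E_X = (x_vals * p_X).sum()
E_Y = (y_vals * p_Y).sum()
E_XY = (np.outer(y_vals, x_vals) * p).sum()      # sum of x*y*p(x, y)

print(E_XY - E_X * E_Y)                          # Cov(X, Y): 0 (up to float rounding)
print(np.allclose(p, np.outer(p_Y, p_X)))        # False: not independent
```

Here Cov(X, Y) = 0 even though X and Y are not independent (e.g. p(-2, 10) = 0.09 while p_X(-2) p_Y(10) = 0.10 × 0.80 = 0.08), which illustrates note (b) of 3.15.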

Example 3.8 In Example 3.6:
(i) Calculate E(X), Var(X), E(Y) and Cov(X, Y).
(ii) Are X and Y independent?
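If you want to check your answers to (i), the moment formulas of 3.13 can be evaluated symbolically; a sympy sketch:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f = 8 * x**2 / (7 * y**3)            # joint p.d.f. of Example 3.6

# Moments via section 3.13, integrating over the support 1 <= x, y <= 2.
E_X  = sp.integrate(x * f, (x, 1, 2), (y, 1, 2))
E_X2 = sp.integrate(x**2 * f, (x, 1, 2), (y, 1, 2))
E_Y  = sp.integrate(y * f, (x, 1, 2), (y, 1, 2))
E_XY = sp.integrate(x * y * f, (x, 1, 2), (y, 1, 2))

print(E_X, sp.simplify(E_X2 - E_X**2))    # E(X) and Var(X)
print(E_Y)                                # E(Y)
print(sp.simplify(E_XY - E_X * E_Y))      # Cov(X, Y): 0, as independence implies
```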

3.16 Useful results on expectations and variances

(i) E(aX + bY) = a E(X) + b E(Y), where a and b are constants.

(ii) Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

Result (i) can be extended to any n random variables X_1, X_2, ..., X_n:

E(a_1 X_1 + a_2 X_2 + ... + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + ... + a_n E(X_n)

When X and Y are independent, Cov(X, Y) = 0, so

(iii) Var(aX + bY) = a² Var(X) + b² Var(Y)

(iv) E(XY) = E(X) E(Y)

Results (iii) and (iv) can be extended to any n independent random variables X_1, X_2, ..., X_n:

(iii)* Var(a_1 X_1 + a_2 X_2 + ... + a_n X_n) = a_1² Var(X_1) + a_2² Var(X_2) + ... + a_n² Var(X_n)

(iv)* E(X_1 X_2 ... X_n) = E(X_1) E(X_2) ... E(X_n)
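Result (ii) is easy to see numerically; a simulation sketch with hypothetical constants a, b and deliberately correlated X and Y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X = rng.normal(0, 2, n)
Y = 0.5 * X + rng.normal(0, 1, n)    # Y is correlated with X
a, b = 3.0, -2.0

lhs = np.var(a * X + b * Y)
rhs = a**2 * np.var(X) + b**2 * np.var(Y) + 2 * a * b * np.cov(X, Y)[0, 1]
print(lhs, rhs)                      # agree up to sampling error
```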


3.17 Combinations of independent Normal random variables

Suppose X_1 ~ N(μ_1, σ_1²), X_2 ~ N(μ_2, σ_2²), X_3 ~ N(μ_3, σ_3²), ..., X_n ~ N(μ_n, σ_n²), and X_1, X_2, ..., X_n are independent random variables. Then if

Y = a_1 X_1 + a_2 X_2 + a_3 X_3 + ... + a_n X_n, where a_1, a_2, ..., a_n are constants,

Y ~ N(a_1 μ_1 + a_2 μ_2 + a_3 μ_3 + ... + a_n μ_n, a_1² σ_1² + a_2² σ_2² + ... + a_n² σ_n²)

i.e. Y ~ N(Σ a_i μ_i, Σ a_i² σ_i²).


In particular, suppose X_1, X_2, ..., X_n form a random sample from a Normal population with mean μ and variance σ², so that

μ_1 = μ_2 = μ_3 = ... = μ_n = μ and σ_1² = σ_2² = ... = σ_n² = σ².

Then Y ~ N(μ Σ a_i, σ² Σ a_i²).

Further, suppose that a_1 = a_2 = a_3 = ... = a_n = 1/n. Then

Y = (X_1 + X_2 + ... + X_n)/n = X̄, and X̄ ~ N(μ, σ²/n).
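A quick simulation makes the last result concrete; the population parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.0, 25          # hypothetical population and sample size
reps = 200_000

# Draw many samples of size n and take each sample's mean.
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())   # close to mu = 5
print(xbars.var())    # close to sigma**2 / n = 4/25 = 0.16
```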
