08.10.2016 Views

Foundations of Data Science


Random variables $x_1, x_2, \ldots, x_n$ are pairwise independent if for any $a_i$ and $a_j$, $i \neq j$, $\text{Prob}(x_i = a_i, x_j = a_j) = \text{Prob}(x_i = a_i)\,\text{Prob}(x_j = a_j)$. Mutual independence is much stronger than requiring that the variables are pairwise independent. Consider the example of 2-universal hash functions discussed in Chapter ??.
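
The gap between the two notions can be seen on a tiny sample space. The sketch below is an illustration that is not taken from the text: it assumes two fair bits `x1`, `x2` and a third bit `x3 = x1 XOR x2`, and checks both definitions exactly with rational arithmetic. The helper names `prob`, `pairwise_independent`, and `mutually_independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Illustrative construction (an assumption, not from the text):
# two fair bits x1, x2 and a third bit x3 = x1 XOR x2.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
weight = Fraction(1, len(outcomes))  # each of the 4 outcomes has probability 1/4


def prob(predicate):
    """Exact probability of the event described by predicate."""
    return sum(weight for o in outcomes if predicate(o))


def pairwise_independent():
    """Check Prob(x_i=a, x_j=b) = Prob(x_i=a) Prob(x_j=b) for every pair i != j."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda o: o[i] == a and o[j] == b)
            if joint != prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b):
                return False
    return True


def mutually_independent():
    """Check that the joint probability of all three factors into the marginals."""
    for a, b, c in product((0, 1), repeat=3):
        joint = prob(lambda o: o == (a, b, c))
        marginals = (prob(lambda o: o[0] == a)
                     * prob(lambda o: o[1] == b)
                     * prob(lambda o: o[2] == c))
        if joint != marginals:
            return False
    return True


print(pairwise_independent())  # True
print(mutually_independent())  # False
```

Every pair of bits is independent, yet any two of them determine the third, so the triple is not mutually independent.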

If $(x, y)$ is a random vector and one normalizes it to a unit vector $\left(\frac{x}{\sqrt{x^2+y^2}}, \frac{y}{\sqrt{x^2+y^2}}\right)$, the coordinates are no longer independent, since knowing the value of one coordinate determines the value of the other up to sign.
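
A small numeric sketch of this dependence follows. It assumes, purely for illustration, that $x$ and $y$ are drawn as independent standard Gaussians; `normalize` is a hypothetical helper name. After normalization the two coordinates satisfy $u^2 + v^2 = 1$, so $|v|$ can be recomputed from $u$ alone.

```python
import math
import random


def normalize(x, y):
    """Scale (x, y) to unit length; assumes (x, y) is not the zero vector."""
    r = math.hypot(x, y)
    return x / r, y / r


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
for _ in range(3):
    u, v = normalize(rng.gauss(0, 1), rng.gauss(0, 1))
    # The constraint u^2 + v^2 = 1 ties the coordinates together:
    # the last printed value, computed from u alone, matches |v|.
    print(u, v, math.sqrt(max(0.0, 1 - u * u)))
```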

12.4.2 Linearity of Expectation

An important concept is that of the expectation of a random variable. The expected value, $E(x)$, of a random variable $x$ is $E(x) = \sum_x x\,p(x)$ in the discrete case and $E(x) = \int_{-\infty}^{\infty} x\,p(x)\,dx$ in the continuous case. The expectation of a sum of random variables is equal to the sum of their expectations. The linearity of expectation follows directly from the definition and does not require independence.
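
That independence is not needed can be checked on a deliberately dependent pair. The sketch below assumes a fair six-sided die with $x$ the face shown and $y = x^2$ (an illustrative choice, not from the text); `expectation` is a hypothetical helper that applies the discrete definition directly.

```python
from fractions import Fraction

# Illustrative setup (an assumption): a fair six-sided die.
# x is the face shown and y = x*x, so y is completely determined by x.
faces = range(1, 7)
p = Fraction(1, 6)


def expectation(f):
    """E(f(x)) computed from the discrete definition: sum of f(face) * p(face)."""
    return sum(f(face) * p for face in faces)


e_x = expectation(lambda x: x)              # 7/2
e_y = expectation(lambda x: x * x)          # 91/6
e_sum = expectation(lambda x: x + x * x)    # 56/3
e_prod = expectation(lambda x: x * x * x)   # E(xy) = E(x^3) = 147/2

print(e_sum == e_x + e_y)    # True: linearity holds despite dependence
print(e_prod == e_x * e_y)   # False: the product rule does need independence
```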

12.4.3 Union Bound

Let $A_1, A_2, \ldots, A_n$ be events. The actual probability of the union of events is given by Boole's formula.

$$\text{Prob}(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} \text{Prob}(A_i) - \sum_{i<j} \text{Prob}(A_i \wedge A_j) + \sum_{i<j<k} \text{Prob}(A_i \wedge A_j \wedge A_k) - \cdots$$

Often we only need an upper bound on the probability of the union and use

$$\text{Prob}(A_1 \cup A_2 \cup \cdots \cup A_n) \leq \sum_{i=1}^{n} \text{Prob}(A_i).$$

This upper bound is called the union bound.
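
Both the exact alternating sum and the union bound can be verified on a finite sample space. The sketch below assumes one roll of a fair die and three illustrative events (none of them from the text); `prob` and `inclusion_exclusion` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative sample space (an assumption): one roll of a fair die.
omega = set(range(1, 7))
# Three example events: "even", "at most 3", "multiple of 3".
events = [{2, 4, 6}, {1, 2, 3}, {3, 6}]


def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))


def inclusion_exclusion(events):
    """Alternating sum over all k-wise intersections, k = 1, ..., n."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group))
    return total


exact = prob(set.union(*events))
union_bound = sum(prob(e) for e in events)

print(exact)                        # 5/6
print(inclusion_exclusion(events))  # 5/6
print(union_bound)                  # 4/3
```

Here the union bound exceeds 1 and so carries no information; it is most useful when the individual probabilities are small.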

12.4.4 Indicator Variables

A useful tool is that of an indicator variable that takes on value 0 or 1 to indicate whether some quantity is present or not. The indicator variable is useful in determining the expected size of a subset. Given a random subset of the integers $\{1, 2, \ldots, n\}$, the expected size of the subset is the expected value of $x_1 + x_2 + \cdots + x_n$ where $x_i$ is the indicator variable that takes on value 1 if $i$ is in the subset.
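
As a minimal sketch, assume the subset is chosen uniformly at random among all $2^n$ subsets (the text does not fix a distribution, so this is an assumption). Enumerating every subset for a small $n$ shows that the expected size equals the sum of the indicator expectations.

```python
from fractions import Fraction
from itertools import combinations

n = 4  # small illustrative value so all 2^n subsets can be listed
universe = range(1, n + 1)
subsets = [set(c) for k in range(n + 1) for c in combinations(universe, k)]
weight = Fraction(1, len(subsets))  # uniform distribution over subsets (assumed)

# Expected size computed directly from the definition of expectation.
expected_size = sum(len(s) * weight for s in subsets)

# E(x_i) = Prob(i is in the subset), one value per element i.
indicator_means = [sum(weight for s in subsets if i in s) for i in universe]

print(expected_size)         # 2
print(indicator_means)       # each entry is Fraction(1, 2)
print(sum(indicator_means))  # 2
```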

Example: Consider a random permutation of $n$ integers. Define the indicator function $x_i = 1$ if the $i$th integer in the permutation is $i$. The expected number of fixed points is given by $E\left(\sum_{i=1}^{n} x_i\right) = \sum_{i=1}^{n} E(x_i) = n \cdot \frac{1}{n} = 1$.
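
The expected number of fixed points can be checked by brute force for small $n$. The sketch below assumes the permutation is uniform over all $n!$ orderings of $1, \ldots, n$; `expected_fixed_points` is a hypothetical helper name. Since each position $i$ holds the value $i$ with probability $1/n$, linearity of expectation predicts an answer of 1 for every $n$.

```python
from fractions import Fraction
from itertools import permutations


def expected_fixed_points(n):
    """Average number of positions i with perm[i] == i over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(
        sum(1 for pos, val in enumerate(perm, start=1) if pos == val)
        for perm in perms
    )
    return Fraction(total, len(perms))


for n in range(1, 7):
    print(n, expected_fixed_points(n))  # the second column is 1 for every n
```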
