08.10.2016 Views

Foundations of Data Science


7.6 Exercises

Algorithms for Massive Data Problems

Exercise 7.1 THIS EXERCISE IS IN THE TEXT. SHOULD WE DELETE? Given a stream of $n$ positive real numbers $a_1, a_2, \ldots, a_n$, upon seeing $a_1, a_2, \ldots, a_i$ keep track of the sum $a = a_1 + a_2 + \cdots + a_i$ and a sample $a_j$, $j \le i$, drawn with probability proportional to its value. On reading $a_{i+1}$, with probability $\frac{a_{i+1}}{a + a_{i+1}}$ replace the current sample with $a_{i+1}$ and update $a$. Prove that the algorithm selects an $a_i$ from the stream with the probability of picking $a_i$ being proportional to its value.
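The update rule in Exercise 7.1 can be sketched in a few lines of Python. This is only an illustration of the procedure as stated, not a proof; the function name `weighted_stream_sample` and the optional `rng` parameter are assumptions introduced here.

```python
import random
from typing import Iterable, Optional


def weighted_stream_sample(stream: Iterable[float],
                           rng: Optional[random.Random] = None) -> float:
    """Return one stream element, chosen with probability proportional to its value."""
    rng = rng or random.Random()
    total = 0.0    # running sum a = a_1 + ... + a_i
    sample = None  # current sample a_j
    for x in stream:
        if x <= 0:
            raise ValueError("stream values must be positive")
        total += x  # total now equals a + a_{i+1}
        # Replace the current sample with probability a_{i+1} / (a + a_{i+1}).
        # For the first element this probability is 1, so it is always taken.
        if rng.random() < x / total:
            sample = x
    if sample is None:
        raise ValueError("empty stream")
    return sample
```

Only the running sum and the current sample are stored, so the memory used does not grow with the length of the stream.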

Exercise 7.2 Given a stream of symbols $a_1, a_2, \ldots, a_n$, give an algorithm that will select one symbol uniformly at random from the stream. How much memory does your algorithm require?
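One possible approach to Exercise 7.2, offered as a sketch and not as the intended solution, is to keep the $i$-th symbol with probability $1/i$. The name `uniform_stream_sample` and the `rng` parameter are assumptions made for this illustration.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def uniform_stream_sample(stream: Iterable[T],
                          rng: Optional[random.Random] = None) -> T:
    """Return one symbol of the stream, each position being equally likely."""
    rng = rng or random.Random()
    count = 0      # number of symbols seen so far
    sample = None  # currently held symbol
    for symbol in stream:
        count += 1
        # randrange(count) is uniform on 0..count-1, so this fires with probability 1/count.
        if rng.randrange(count) == 0:
            sample = symbol
    if count == 0:
        raise ValueError("empty stream")
    return sample
```

The sketch stores one symbol and one counter. Arguing that every position ends up with probability $1/n$ is left to the exercise.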

Exercise 7.3 Give an algorithm to select an $a_i$ from a stream of symbols $a_1, a_2, \ldots, a_n$ with probability proportional to $a_i^2$.

Exercise 7.4 How would one pick a random word from a very large book where the probability of picking a word is proportional to the number of occurrences of the word in the book?

Exercise 7.5 Consider a matrix where each element has a probability of being selected. Can you select a row according to the sum of probabilities of elements in that row by just selecting an element according to its probability and selecting the row that the element is in?

Exercise 7.6 For the streaming model give an algorithm to draw $s$ independent samples each with the probability proportional to its value. Justify that your algorithm works correctly.
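One way to approach Exercise 7.6, sketched here under the assumption that the stream consists of positive real numbers, is to maintain $s$ sample slots in a single pass and flip an independent coin for each slot on every arrival. The name `weighted_stream_samples` is hypothetical, and the justification the exercise asks for is not supplied by the code.

```python
import random
from typing import Iterable, List, Optional


def weighted_stream_samples(stream: Iterable[float], s: int,
                            rng: Optional[random.Random] = None) -> List[float]:
    """Return s samples, each drawn with probability proportional to its value."""
    rng = rng or random.Random()
    total = 0.0                                   # running sum of the stream so far
    samples: List[Optional[float]] = [None] * s   # one slot per sample
    for x in stream:
        if x <= 0:
            raise ValueError("stream values must be positive")
        total += x
        p = x / total  # replacement probability for the new element
        for k in range(s):
            # A fresh random draw per slot keeps the s samples independent.
            if rng.random() < p:
                samples[k] = x
    if total == 0.0:
        raise ValueError("empty stream")
    return samples
```

Each slot on its own follows the single-sample rule, so the memory used is proportional to $s$ and independent of the stream length.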

Frequency Moments of Data Streams

Number of Distinct Elements in a Data Stream

Lower bound on memory for exact deterministic algorithm

Algorithm for the Number of distinct elements

Universal Hash Functions<br />

Exercise 7.7 Consider an algorithm that uses a random hash function and gives an estimate of a variable $x$. Let $a$ be the actual value of $x$. Suppose that the estimate of $x$ is within $\frac{a}{4} \le x \le 4a$ with probability 0.6. The probability of the estimate is with respect to choice of the hash function.

1. How would you improve the estimate of $x$ to $\frac{a}{2} \le x \le 2a$ with probability 0.6?

2. How would you improve the probability that $\frac{a}{4} \le x \le 4a$ to 0.8?
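A standard technique that bears on part 2 is to repeat the estimate with independently chosen hash functions and take the median. The simulation below is a sketch under stated assumptions, not an analysis: `noisy_estimate` is a hypothetical stand-in for the hash-based estimator, and its failed runs are assumed to always be far too large.

```python
import random
import statistics


def noisy_estimate(a: float, rng: random.Random) -> float:
    """Hypothetical estimator: within [a/4, 4a] with probability 0.6, else far too large."""
    if rng.random() < 0.6:
        return a * rng.uniform(0.25, 4.0)
    return a * 100.0  # a failed run (assumed one-sided for this sketch)


def median_estimate(a: float, k: int, rng: random.Random) -> float:
    """Median of k independent estimates, each standing in for a fresh hash function."""
    return statistics.median(noisy_estimate(a, rng) for _ in range(k))


def success_rate(a: float, k: int, trials: int, rng: random.Random) -> float:
    """Fraction of trials in which the median of k estimates lies in [a/4, 4a]."""
    hits = sum(a / 4 <= median_estimate(a, k, rng) <= 4 * a for _ in range(trials))
    return hits / trials
```

Comparing `success_rate(a, 1, ...)` with `success_rate(a, k, ...)` for a larger odd `k` shows how the median's success probability moves away from 0.6. Working out how many repetitions are needed to reach 0.8 is the substance of the exercise.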
