08.10.2016 Views

Foundations of Data Science


Exercise 7.30 Blast: Given a long sequence a, say of length 10^9, and a shorter sequence b, say of length 10^5, how do we find a position in a that is the start of a contiguous subsequence b′ that is close to b? This problem can be solved by dynamic programming, but not in reasonable time: the standard alignment table would have roughly 10^9 × 10^5 = 10^14 entries. Find a time-efficient algorithm to solve this problem.

Hint: (Shingling approach) One possible approach would be to fix a small length, say seven, and consider the shingles of a and b of length seven. If a close approximation to b is a substring of a, then a number of shingles of b must be shingles of a. This should allow us to find the approximate location in a of the approximation of b. Some final algorithm should then be able to find the best match.
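The hint can be sketched in Python as follows. This is one possible reading of the shingling approach, not a reference solution. The function names, the bin width of 16, the slack of 16, and the choice of edit distance for the "final algorithm" are all assumptions made for illustration. Each shingle shared by a and b votes for the start position it implies (its position in a minus its offset in b). The bin with the most votes gives the approximate location, and dynamic programming is then run only on a small window around it.

```python
from collections import Counter, defaultdict


def shingle_index(b, k=7):
    """Map each length-k shingle of b to the offsets where it occurs in b."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return index


def candidate_starts(a, b, k=7, bin_width=16, top=3):
    """Return up to `top` (approximate start in a, vote count) pairs.

    A shingle at position i in a that equals a shingle at offset j in b
    votes for start i - j. Votes are pooled into bins of `bin_width`
    (an assumed value) so that a few insertions or deletions do not
    scatter them across many distinct starts.
    """
    index = shingle_index(b, k)
    votes = Counter()
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            votes[(i - j) // bin_width] += 1
    return [(bin_id * bin_width, count)
            for bin_id, count in votes.most_common(top)]


def refine(a, b, lo, hi):
    """Best edit-distance match of b inside the window a[lo:hi].

    Standard dynamic programming in which the match may begin and end
    anywhere in the window. Returns (distance, start, end) such that
    a[start:end] is a closest substring to b within the window.
    """
    lo, hi = max(lo, 0), min(hi, len(a))
    window = a[lo:hi]
    n = len(window)
    dist = [0] * (n + 1)          # empty prefix of b matches anywhere at cost 0
    start = list(range(n + 1))    # where the match ending at column j began
    for i in range(1, len(b) + 1):
        new_dist = [i] + [0] * n
        new_start = [0] * (n + 1)
        for j in range(1, n + 1):
            options = (
                # align b[i-1] with window[j-1] (match or substitution)
                (dist[j - 1] + (b[i - 1] != window[j - 1]), start[j - 1]),
                # b[i-1] has no counterpart in a
                (dist[j] + 1, start[j]),
                # window[j-1] has no counterpart in b
                (new_dist[j - 1] + 1, new_start[j - 1]),
            )
            new_dist[j], new_start[j] = min(options)
        dist, start = new_dist, new_start
    best_end = min(range(n + 1), key=lambda j: dist[j])
    return dist[best_end], lo + start[best_end], lo + best_end


def find_match(a, b, k=7, bin_width=16, slack=16):
    """Locate b approximately by shingle votes, then refine around the top bin."""
    candidates = candidate_starts(a, b, k, bin_width, top=1)
    if not candidates:
        return None  # no shingle of b occurs in a
    approx, _ = candidates[0]
    return refine(a, b, approx - slack, approx + bin_width + len(b) + slack)
```

The voting pass does work proportional to the length of a plus the number of shingle hits, and the dynamic program touches only about |b| × (|b| + bin_width + 2 × slack) cells instead of |a| × |b|. At the sizes in the exercise, a plain Python loop over a and a refinement table of this size would still be slow, so the sketch is meant to show the structure of the method rather than a production implementation.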
