BLAST and BLAT - Algorithms in Bioinformatics
BLAST and BLAT - Algorithms in Bioinformatics
BLAST and BLAT - Algorithms in Bioinformatics
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
28 Bio<strong>in</strong>formatics I, WS’12/13, D. Huson, October 30, 2012<br />
(Source: Kent 2002)<br />
1. For comparison of translated mouse reads <strong>and</strong> the human genome, Table 6 <strong>in</strong>dicates that K = 8<br />
would detect true seeds for 99% of all mouse reads, while only generat<strong>in</strong>g 749 r<strong>and</strong>om hits per<br />
query.<br />
<strong>BLAT</strong> implements near-perfect matches allow<strong>in</strong>g one mismatch <strong>in</strong> a hit, as follows:<br />
A non-overlapp<strong>in</strong>g <strong>in</strong>dex of all K-mers <strong>in</strong> the database is generated.<br />
For each K-mer <strong>in</strong> the query, a lookup is performed for the K-mer <strong>and</strong> for all words that differ from<br />
it at one position, which means K · (A − 1) + 1 lookups. For an am<strong>in</strong>o-acid search with K = 8, for<br />
example, 153 lookups are required per occurr<strong>in</strong>g K-mer.<br />
For a given level of sensitivity however, the near-perfect match criterion runs slower than the multipleperfect<br />
match criterion <strong>and</strong> thus is not so useful <strong>in</strong> practice.<br />
3.11.5 Strategy 3: Multiple perfect matches<br />
An alternative seed<strong>in</strong>g strategy is to require multiple perfect matches that are constra<strong>in</strong>ed to be near<br />
each other.<br />
For example, consider a situation where there are two hits between the query <strong>and</strong> the database<br />
sequences that “lie on the same diagonal” <strong>and</strong> are close to each other (with<strong>in</strong> some given distance W ),<br />
such as a <strong>and</strong> b here: