01.04.2015 Views

Sequence Comparison.pdf

Sequence Comparison.pdf

Sequence Comparison.pdf

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

78 4 Homology Search Tools<br />

high-scoring segment pairs (HSPs). As for the gap extension of HSPs, a red-black<br />

tree with diagonals as the key is employed to manipulate the extension process efficiently.<br />

4.6 Bibliographic Notes and Further Reading<br />

Twenty years ago, it would have been a legend to find similarities between two sequences.<br />

However, nowadays it would be a great surprise if we cannot find similarities<br />

between a newly sequenced biomolecular sequence and the GenBank database.<br />

FASTA and BLAST were the ones that boosted this historical change. In Table C.2<br />

of Appendix C, we compile a list of homology search tools.<br />

4.1<br />

The first linear-time suffix tree was given by Weiner [202] in 1973. A spaceefficient<br />

linear-time construction was proposed by McCreight [135] in 1976. An online<br />

linear-time construction was presented by Ukkonen [191] in 1995. A review of<br />

these three linear-time algorithms was given by Giegerich and Kurtz [75]. Gusfield’s<br />

book [85] gave a very detailed explanation of suffix trees and their applications.<br />

Suffix arrays were proposed by Manber and Myers [134] in 1991.<br />

4.2<br />

The FASTA program was proposed by Pearson and Lipman [161] in 1988. It improved<br />

the sensitivity of the FASTP program [128] by joining several initial regions<br />

if their scores pass a certain threshold.<br />

4.3<br />

The original BLAST paper by Altschul et al. [7] was the most cited paper published<br />

in the 1990s. The first version of BLAST generates ungapped alignments,<br />

whereas BLAST 2.0 [8] considers gapped alignments as well as position-specific<br />

scoring schemes. The idea of seeking multiple hits on the same diagonal was first<br />

proposed by Wilbur and Lipman in 1983 [204]. The book of Korf, Yandell, and<br />

Bedell [116] gave a full account of the BLAST tool. It gave a clear understanding<br />

of BLAST programs and demonstrated how to use BLAST for taking the full<br />

advantage.<br />

4.4<br />

The UCSC Genome Browser contains a large collection of genomes [104].<br />

BLAT [109] is used to map the query sequence to the genome.<br />

4.5

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!