You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
78 4 Homology Search Tools<br />
high-scoring segment pairs (HSPs). As for the gap extension of HSPs, a red-black<br />
tree with diagonals as the key is employed to manipulate the extension process efficiently.<br />
4.6 Bibliographic Notes and Further Reading<br />
Twenty years ago, it would have been a legend to find similarities between two sequences.<br />
However, nowadays it would be a great surprise if we cannot find similarities<br />
between a newly sequenced biomolecular sequence and the GenBank database.<br />
FASTA and BLAST were the ones that boosted this historical change. In Table C.2<br />
of Appendix C, we compile a list of homology search tools.<br />
4.1<br />
The first linear-time suffix tree was given by Weiner [202] in 1973. A spaceefficient<br />
linear-time construction was proposed by McCreight [135] in 1976. An online<br />
linear-time construction was presented by Ukkonen [191] in 1995. A review of<br />
these three linear-time algorithms was given by Giegerich and Kurtz [75]. Gusfield’s<br />
book [85] gave a very detailed explanation of suffix trees and their applications.<br />
Suffix arrays were proposed by Manber and Myers [134] in 1991.<br />
4.2<br />
The FASTA program was proposed by Pearson and Lipman [161] in 1988. It improved<br />
the sensitivity of the FASTP program [128] by joining several initial regions<br />
if their scores pass a certain threshold.<br />
4.3<br />
The original BLAST paper by Altschul et al. [7] was the most cited paper published<br />
in the 1990s. The first version of BLAST generates ungapped alignments,<br />
whereas BLAST 2.0 [8] considers gapped alignments as well as position-specific<br />
scoring schemes. The idea of seeking multiple hits on the same diagonal was first<br />
proposed by Wilbur and Lipman in 1983 [204]. The book of Korf, Yandell, and<br />
Bedell [116] gave a full account of the BLAST tool. It gave a clear understanding<br />
of BLAST programs and demonstrated how to use BLAST for taking the full<br />
advantage.<br />
4.4<br />
The UCSC Genome Browser contains a large collection of genomes [104].<br />
BLAT [109] is used to map the query sequence to the genome.<br />
4.5