
Principles of Plant Genetics and Breeding



Introduction

Bioinformatics is the application of informatic techniques to biological data. These techniques include acquiring, annotating, analyzing, and archiving the biological data, using concepts from biology, computer science, and mathematics. Bioinformatics has its roots in molecular phylogenetics, in the work of people like Margaret Dayhoff, who collected known sequences (Dayhoff et al. 1965) and developed a mathematical model of protein evolution (Dayhoff et al. 1978), and David Sankoff, who worked on sequence alignment and statistical tests for homology.

Bioinformatic analyses aim to discover precise, testable hypotheses to supplement or redirect biology experiments. The results of these bioinformatics-derived experiments then influence the next round of computer-based studies. This interaction between experiment and computation speeds scientific progress.

The first large collections of biological data were protein sequences, followed by nucleic acid sequences. They were collected by individual laboratories and stored on computer punch cards, along with annotations such as species, biochemical function, physiological role, and functional and structural domains. As the amount of sequence data grew, along with the need to share it among laboratories, collection and curation were taken over by specialized groups such as the National Biomedical Research Foundation/Protein Information Resource (NBRF/PIR), GenBank, EMBL, and Swiss-Prot, where curation included standardizing the format and ancillary data. With increased size also came the need for tools to search, analyze, and annotate these databases.

Pairwise alignment and database searching

Molecular biologists commonly isolate and sequence molecules based on their association with particular biological phenomena, such as disease resistance in plants. Typically, the biochemical function of the newly determined sequence is not known, so one compares it to all sequences whose biochemical functions are known in order to generate a testable hypothesis about its function. Thus, an early bioinformatics tool searched a database of annotated sequences with a newly determined sequence to find all similar sequences. Sequences that were similar enough were inferred to be homologous, that is, to have descended from a common evolutionary ancestor. The inference of homology generates the hypothesis that the two molecules carry out the same biochemical function and perhaps the same physiological role. A complete discussion of database searching is given in Nicholas et al. (2000).
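The search described above can be illustrated with a minimal sketch. It assumes a Smith-Waterman style local alignment score with a simple match/mismatch/linear-gap scheme; production search tools typically rely on substitution matrices and faster heuristics, and nothing here is taken from Nicholas et al. (2000). The function names, scoring values, and the toy database are hypothetical and chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman style).

    The scoring values are illustrative assumptions, not a substitution
    matrix from any published source.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database):
    """Rank annotated entries by their local alignment score to the query.

    `database` maps a name to a (sequence, annotation) pair.
    """
    hits = [
        (local_alignment_score(query, seq), name, annotation)
        for name, (seq, annotation) in database.items()
    ]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical toy database: names, sequences, and annotations are made up.
toy_database = {
    "entry_1": ("TTACGTTT", "made-up function 1"),
    "entry_2": ("GGGG", "made-up function 2"),
}

for score, name, annotation in search_database("ACGT", toy_database):
    print(name, score, annotation)
```

The annotation of the top-scoring entry is only a hypothesis about the query's function; deciding whether a score is high enough to infer homology requires a statistical test that this sketch omits.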

The power of a successful database search is demonstrated by comparing the histories of cystic fibrosis (CF) and type I neurofibromatosis (NF-1) research. Both disease genes were isolated in 1988. The CF gene was identified as a chloride ion transport protein, which led to the development of a number of therapies, with many now in final clinical trials. The 1988 database search with the NF-1 gene, by contrast, failed: no homologues were found. It was not until 1998 that NF-1 was identified as a growth suppressor, which has rapidly led to improved diagnosis and many potential therapies for which clinical trials are just beginning. Thus, the successful identification of the CF gene as a chloride ion transporter accelerated this research area by a decade compared to the time required to discover the biochemical function of the NF-1 gene through biological experiments.

Industry highlights

Bioinformatics for sequence and genomic data

Hugh B. Nicholas, Jr., David W. Deerfield II, and Alexander J. Ropelewski

Pittsburgh Supercomputing Center, Pittsburgh, PA 15213, USA

Multiple sequence alignment

A database search typically finds many similar sequences, from which one would like to create a multiple sequence alignment that simultaneously shows how each residue relates to its homologous residues in the other sequences. This alignment is a map of the evolution of the protein family. The multiple sequence alignment is a rich source of hypotheses to guide experimental work, since it contains patterns of conservation and variation of residues among the sequences, which provide insights into functional and structural positions for either the family of proteins or the genes encoding them. Such inferences are strongest if the alignment contains sequences from widely diverse species.

Multiple sequence alignment implies that the residues in each column of the alignment are all evolutionarily related to each other. Thus, accuracy is most commonly considered to be improved by maximizing the observed degree of conservation in the alignment as a whole, as discussed in Nicholas et al. (2002).
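To make "degree of conservation" concrete, the sketch below scores each alignment column by the fraction of sequences that share its most common residue, then averages over columns. This is one simple measure chosen for illustration and is an assumption, not necessarily the criterion discussed in Nicholas et al. (2002). The function names and the use of "-" as the gap character are likewise assumptions.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences carrying the most common residue.

    `alignment` is a list of equal-length strings; "-" marks a gap.
    Gaps never count as the conserved residue, but they do count in the
    denominator, so gappy columns score lower.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def mean_conservation(alignment):
    """Average column conservation over the whole alignment."""
    scores = column_conservation(alignment)
    return sum(scores) / len(scores)


# Hypothetical toy alignment, made up for illustration.
toy_alignment = ["ACGT", "ACGA", "AC-T"]
print(column_conservation(toy_alignment))
print(mean_conservation(toy_alignment))
```

Columns scoring near 1.0 are candidates for functionally or structurally important positions, while comparing the mean score of alternative alignments of the same sequences gives a rough way to prefer the more conserved one.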
