Principles of Plant Genetics and Breeding
Principles of Plant Genetics and Breeding
Principles of Plant Genetics and Breeding
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Analysis<br />
BIOTECHNOLOGY IN PLANT BREEDING 241<br />
Two evolutionary concepts underlie the inferences from analyses <strong>of</strong> multiple sequence alignments. The first is that the mutation <strong>of</strong><br />
one sequence residue to another is a r<strong>and</strong>om event in nature <strong>and</strong> that all residues are more or less equally subject to mutation. The<br />
second is that the conservation <strong>of</strong> specific residues is maintained through evolutionary selection; that is, mutations that adversely<br />
affect the ability <strong>of</strong> a molecule to carry out its biochemical activity or physiological role will be eliminated by reducing the ability<br />
<strong>of</strong> the organism to live. Details <strong>of</strong> the evolutionary history between different pairs <strong>of</strong> sequences can lead to different inferences<br />
about the properties <strong>of</strong> the sequences.<br />
In the evolutionary history <strong>of</strong> some pairs <strong>of</strong> sequences – orthologues or orthologous sequences – the sequences have only speciation<br />
events in their evolutionary history. Other pairs <strong>of</strong> sequences – paralogues or paralogous sequences – have one or more<br />
gene duplication events in their common evolutionary history as well as having speciation events. In general, paralogues will<br />
carry out the same biochemistry on different substrates or with different c<strong>of</strong>actors; while orthologues will carry out the same biochemistry<br />
on the same substrates <strong>and</strong> will <strong>of</strong>ten serve an identical physiological role in the same pathways in different organisms.<br />
Homologues include both orthologues <strong>and</strong> paralogues. Since paralogues carry out the same basic biochemistry, for example<br />
reduce an aldehyde, the residues responsible for this activity are conserved. But, the residues responsible for the physiological<br />
role (e.g., substrate recognition, which specific aldehyde to reduce or lipid to bind) will be under different evolutionary pressures<br />
<strong>and</strong> will <strong>of</strong>ten diverge. A complete analysis <strong>of</strong> the multiple sequence alignment includes identifying residues responsible for the<br />
common, shared properties <strong>of</strong> the entire family <strong>and</strong> the residues that discriminate between paralogous groups (Figure 1).<br />
The results from the analysis <strong>of</strong> multiple sequence alignments is illustrated in the results obtained in the decades’ long quest to<br />
underst<strong>and</strong> how each <strong>of</strong> the 20 aminoacyl tRNA synthetase (aaRS) enzymes map each <strong>of</strong> the 60 tRNAs to its correct cognate<br />
amino acid. The problem, as outlined by Sir Francis Crick in 1957, is where is the information that allows an aaRS to recognize<br />
only the tRNAs encoding the correct amino acid? Experimental groups have employed a number <strong>of</strong> creative <strong>and</strong> innovative<br />
approaches towards answering this question (Söll & Schimmel 1974). McClain, Nicholas, <strong>and</strong> coworkers applied computational<br />
techniques, beginning with creating an accurate multiple sequence alignment <strong>of</strong> the tRNAs.<br />
The tRNA multiple sequence alignment was divided into 20 subsets, each based on their amino acid acceptor activity (McClain<br />
& Nicholas 1987). First these authors identified the sequence positions in the 66 tRNA sequences from Escherichia coli, its bacteriophages,<br />
<strong>and</strong> near relatives that were conserved. They then examined the remaining positions <strong>and</strong> asked the question: what<br />
Bpi_Bovin : VTCSSCSSHINSVHVHISKSK-VGWLIQLFSKKIESALRNKMNSQVCEKVTNSVSSKLQPYFQTLP<br />
Bpi_Human : ITCSSCSSDIADVEVDMSGD--SGWLLNLFHNQIESKFQKVLESRICEMIQKSVSSDLQPYLQTLP<br />
Lbp_Human : GYCLSCSSDIQNVELDIEGD--LEELLNLLQSQIDARLREVLESKICRQIEEAVTAHLQPYLQTLP<br />
Lbp_Rabit : VTTSSCSSRIRDLELHVSGN--VGWLLNLFHNQIESKLQKVLESKICEMIQKSVTSDLQPYLQTLP<br />
Lbp_Rat : VTASGCSNSFHKLLLHLQGEREPGWIKQLFTNFISFTLKLVLKGQICKEI-NVISNIMADFVQTRA<br />
Cetp_Human: TDAPDCYLSFHKLLLHLQGEREPGWIKQLFTNFISFTLKLVLKGQICKEI-NIISNIMADFVQTRA<br />
Cetp_Macfa: TDAPDCYLAFHKLLLHLQGEREPGWLKQLFTNFISFTLKLILKRQVCNEI-NTISNIMADFVQTRA<br />
Cetp_Rabit: TNAPDCYLAFHKLLLHLQGEREPGWLKQLFTNFISFTLKLILKRQVCNEI-NTISNIMADFVQTRA<br />
Pltp_Human: VSNVSCQASVSRMHAAFGGT--FKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDYVP<br />
Pltp_Mouse: VSNVSCEASVSKMNMAFGGT--FRRMYNFFSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVP<br />
Lplc3_Rat : LILKRCNT----LLGHISLT--SGLLPTPIFGLVEQTLCKVLPGLLCPVV-DSVLSVVNELLGATL<br />
Lplc4_Rat : LVIERCDT----LLGGIKVKLLRGLLPNLVDNLVNRVLANVLPDLLCPIV-DVVLGLVNDQLGLVD<br />
Consensus C i l C t<br />
Figure 1 A section <strong>of</strong> a multiple sequence alignment <strong>of</strong> 12 sequences from the Bpi/Lbp/Lp1 superfamily <strong>of</strong><br />
lipid-binding protein sequences. The 12 sequences belong to six different paralogous families, each <strong>of</strong> which binds a<br />
different lipid <strong>and</strong> performs a different physiological role, generally a transport function. Each sequence is identified<br />
by a label that identifies the paralogous family (they have a gene duplication event separating the families) before the<br />
underscore character in the protein name <strong>and</strong> the species after the underscore. The protein names are taken from the<br />
Swiss-Prot database. The annotation associated with each sequence in the Swiss-Prot database describes the lipid<br />
bound <strong>and</strong> the physiological role <strong>of</strong> the protein. The sequences within each paralogous family are orthologous<br />
(related to each other only by speciation events). The dash sequence character indicates an alignment column where<br />
either the amino acids replaced by dashes have been lost through a deletion event or amino acids shown by the single<br />
letter codes have been inserted into the sequence during the course <strong>of</strong> evolution. The section marked with a black<br />
background <strong>and</strong> white letters is a moderately well-conserved motif. Such conservation <strong>of</strong>ten marks regions<br />
important to the structure or function <strong>of</strong> the protein. Amino acids highlighted with black letters on a light gray<br />
background are identical within the highlighted family <strong>of</strong> orthologous sequences <strong>and</strong> have quite different chemical<br />
or physical properties to those in the same column in other families. They may mark positions that are responsible<br />
for the differences among the paralogous families.