Principles of Plant Genetics and Breeding
CHAPTER 14
position or combination of positions discriminate one subset from the others? Using a number of statistical techniques (e.g.,
multivariate analysis, group theory), they were able to develop a model for how tRNA sequences allowed the aaRS enzymes to
identify the correct tRNA molecules and reject the incorrect ones (McClain 1995). Experimental verification of these results
(started prior to the computational work in one case) was published subsequently (Hou & Schimmel 1988; McClain & Foss 1988).
Additional uses of the multiple sequence alignment
Beyond the analysis described above, a well-crafted multiple sequence alignment supports two further uses. The first is the
creation of a phylogenetic tree for evolutionary studies. The second is more sensitive database searching: a representation that
incorporates the pattern of substitution seen in the multiple sequence alignment allows researchers to find more highly diverged
homologous sequences.
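The idea of a search representation built from an alignment can be illustrated with a position-specific scoring profile. The sketch below is only one simple way to do this, under stated assumptions: a DNA alignment, a uniform background frequency of 0.25, and a pseudocount of 1. The function names `build_profile` and `score` are hypothetical and do not come from any particular package.

```python
import math

BASES = "ACGT"

def build_profile(alignment, pseudocount=1.0, background=0.25):
    """Return one log-odds column per alignment position.

    Assumes equal-length DNA rows; gap characters are simply not counted.
    """
    profile = []
    for col in range(len(alignment[0])):
        counts = {b: pseudocount for b in BASES}
        for row in alignment:
            if row[col] in counts:
                counts[row[col]] += 1
        total = sum(counts.values())
        # Log-odds of the observed column frequency against the background.
        profile.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return profile

def score(profile, candidate):
    """Sum the per-column log-odds for a candidate of the same length."""
    return sum(column[base] for column, base in zip(profile, candidate))

# Hypothetical toy alignment of four related sites.
alignment = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
profile = build_profile(alignment)
```

A candidate that differs from every aligned sequence can still score well if its substitutions fall in columns where the alignment itself varies, which is what makes such a profile more sensitive than a search with any single sequence.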
Pattern identification
Pattern identification methods were developed to find short, distinctive sections shared by several unaligned sequences. These
contiguous regions of conserved residues, called motifs, are often important for molecular interactions, for example as regulatory
regions or binding sites. Thus, motifs are often essential for the correct functioning of the molecule.
The classic example of pattern identification is to collect DNA sequences from the region just upstream (on the 5′ side) of the
coding region of a gene and examine these for a conserved pattern of nucleotides (Sadler et al. 1983) involved in regulating the
transcription of the genes. What distinguishes this problem from global multiple sequence alignment is that outside the conserved
patterns there is no expectation that the sequence is conserved and thus alignable.
Modern pattern identification programs (Bailey & Elkan 1994) rely on a statistical procedure designed to deal with the fact that
we do not know where the patterns are located (expectation maximization) and on a sophisticated sampling routine
(stochastic sampling) that reduces the number of combinations that must be tried.
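To make the role of expectation maximization concrete, the sketch below alternates between estimating where the motif lies in each sequence (E-step) and re-estimating the motif from those weighted positions (M-step). It is a deliberately simplified illustration, not the published program: it assumes exactly one motif occurrence per sequence, a known motif width, and a uniform background, and it restarts from every window of the first sequence. All names (`run_em`, `find_motif`, the toy sequences) are hypothetical.

```python
import math

BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform background frequency

def site_ratio(pwm, seq, start):
    """Likelihood ratio (motif vs. background) of the window at `start`."""
    ratio = 1.0
    for k, column in enumerate(pwm):
        ratio *= column[seq[start + k]] / BACKGROUND
    return ratio

def run_em(seqs, seed, iterations=50, pseudo=0.1):
    """Run EM from one seed word; return the motif matrix and a log-likelihood ratio."""
    width = len(seed)
    pwm = [{b: (0.7 if b == s else 0.1) for b in BASES} for s in seed]
    for _ in range(iterations):
        counts = [{b: pseudo for b in BASES} for _ in range(width)]
        loglik = 0.0
        for seq in seqs:
            # E-step: posterior weight of each possible start position.
            ratios = [site_ratio(pwm, seq, j) for j in range(len(seq) - width + 1)]
            total = sum(ratios)
            loglik += math.log(total / len(ratios))
            for j, r in enumerate(ratios):
                for k in range(width):
                    counts[k][seq[j + k]] += r / total
        # M-step: re-estimate each motif column from the weighted counts.
        pwm = [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]
    return pwm, loglik  # loglik refers to the matrix used in the last E-step

def find_motif(seqs, width):
    """Try every window of the first sequence as a seed; keep the best run."""
    first = seqs[0]
    seeds = [first[j:j + width] for j in range(len(first) - width + 1)]
    pwm, _ = max((run_em(seqs, seed) for seed in seeds), key=lambda r: r[1])
    consensus = "".join(max(BASES, key=lambda b: col[b]) for col in pwm)
    sites = [max(range(len(s) - width + 1), key=lambda j: site_ratio(pwm, s, j))
             for s in seqs]
    return consensus, sites

# Hypothetical upstream regions, each containing one copy of a shared word.
upstream = ["ACGTTGACAGCT", "GGCATTGACATC", "TTGACACCGGAT", "CAGGCTTTGACA"]
consensus, sites = find_motif(upstream, 6)
```

Restarting from several seeds is one common way to reduce the risk that EM settles on a poor local optimum; a stochastic sampler addresses the same problem by choosing candidate positions at random in proportion to their weights.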
Other techniques
As the biomedical sciences have expanded their repertoire of research methods and the kinds of data that can be collected, the
field of bioinformatics has created techniques for dealing with these new kinds of data. The advent of the complete genome
sequences for many organisms has been accompanied by software to allow the manipulation, annotation, analysis, and comparison
of these large sequences. Complex mathematical models of genes try to find and identify all of the genes in each genome
(Rogic et al. 2001).
Techniques have been developed to measure the change in expression for cDNA (microarrays) or the amount of proteins in
cells over time or between mutants and wild types. It is not unusual for a research group to monitor thousands of molecules simultaneously,
looking for either increases or decreases in the relative levels between the standard and the state under investigation.
These large-scale experiments are being analyzed with a number of statistical techniques (Wetzel et al. 2000) such as analysis of
variance, which produces a statistical model of the changes observed (Kerr & Churchill 2001). Other researchers are using multivariate
statistical techniques to identify which molecules vary their presence in a coordinated manner in response to changing
conditions. Interestingly, a number of these techniques were first developed many years ago to study the factors influencing crop
growth.
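As a small illustration of how analysis of variance compares expression levels, the sketch below computes the F statistic of a plain one-way design for a single molecule measured under several conditions. This is a textbook one-way ANOVA, not the fuller microarray models cited above, which account for additional sources of variation. The function name and the measurements are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of replicate groups."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by differences between condition means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation among replicates within each condition.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical log-scale expression of one gene: standard state and two treatments.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F value indicates that the differences between conditions are large relative to the scatter among replicates. When thousands of molecules are tested at once, the resulting statistics need to be interpreted with multiple testing in mind.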
Conclusion
Ultimately, though, the field of bioinformatics does have some general themes that should continue to run through it in the
future. First, the bioinformaticist's tool chest is not complete: the tool chest of tomorrow will bear only a minimal relationship to
today’s set of tools, with better and more sensitive tools continuing to be developed. Second, the number and types of databases
of experimental data will continue to expand at an alarming rate. The majority of the databases will be developed to describe one
type of experimental data, like sequence data or microarray data, with only minimal references to, or consistency (e.g., of vocabulary)
with, the other databases. Third, diverse data must be integrated across ranges of scale, both temporal and spatial. For example, a
single point mutation in a mouse might cause kidney deformations that in turn result in abnormal blood chemistry. Thus, a
single point mutation causes effects at the cellular and organ levels. Biological scientists must learn the techniques necessary to
manage and make use of the new data resources that their research is creating.
References
Bailey, T.L., and C. Elkan. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers.
In: ISMB-94: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (Altman, R.,
D. Brutlag, P. Karp, R. Lathrop, and D. Searls, eds), pp. 28–36. AAAI Press, Menlo Park, CA.
Crick, F.H.C. 1957. The structure of nucleic acids and their role in protein synthesis. In: Biochemical Society Symposium, No. 14
(Crook, E.M., ed.), pp. 26–36. Cambridge University Press, Cambridge, UK.