
Principles of Plant Genetics and Breeding


242 CHAPTER 14

position or combination of positions discriminate one subset from the others? Using a number of statistical techniques (e.g., multivariate analysis, group theory), they developed a model for how tRNA sequences allowed the aaRS enzymes to identify the correct tRNA molecules and reject the incorrect ones (McClain 1995). Experimental verification of these results (started prior to the computational work in one case) was published subsequently (Hou & Schimmel 1988; McClain & Foss 1988).

Additional use of the multiple sequence alignment

In addition to the above analysis of a well-crafted multiple sequence alignment, two further areas can use the information contained within a multiple sequence alignment. The first is the creation of a phylogenetic tree for evolutionary studies. The second is more sensitive database searching: a representation that incorporates the pattern of substitution seen in the multiple sequence alignment allows researchers to find more highly diverged homologous sequences.
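One common way to capture the pattern of substitution in an alignment is a position-specific scoring matrix (a profile) built from the alignment columns. The Python sketch below is illustrative only. The function names `build_pssm` and `best_hit`, the uniform background, the pseudocount of 1, and the toy DNA alignment are assumptions made for this example and do not come from the text.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Build log2-odds scores, one dict per alignment column.

    Assumptions: gap-free columns are not required (symbols outside the
    alphabet are skipped), and the background is uniform.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        scores = {}
        for letter in alphabet:
            freq = (counts[letter] + pseudocount) / total
            scores[letter] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_window(pssm, window):
    """Sum the column scores; unknown symbols contribute 0."""
    return sum(col.get(ch, 0.0) for col, ch in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window in the sequence."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical toy alignment and query, invented for illustration.
toy_alignment = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
toy_pssm = build_pssm(toy_alignment)
print(best_hit(toy_pssm, "GGCTATAATGC"))
```

Because every column contributes a graded score rather than a match or mismatch, a window that differs from each individual aligned sequence can still score well, which is the sense in which such a representation can reach more diverged relatives.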

Pattern identification

Pattern identification methods were developed to find short, distinctive sections shared among several unaligned sequences. These contiguous regions of conserved residues, called motifs, are often important for molecular interactions, for example as regulatory regions or binding sites. Thus, motifs are often essential for the correct functioning of the molecule.

The classic example of pattern identification is to collect DNA sequences from the region just upstream (on the 5′ side) of the coding region of a gene and examine them for a conserved pattern of nucleotides (Sadler et al. 1983) involved in regulating the transcription of the genes. What distinguishes this problem from global multiple sequence alignment is that, outside the conserved patterns, there is no expectation that the sequence is conserved and thus alignable.

Modern pattern identification programs (Bailey & Elkan 1994) combine two ideas. The first is expectation maximization, a statistical procedure designed to cope with the fact that the locations of the patterns are not known in advance. The second is stochastic sampling, a sampling routine that reduces the number of combinations that must be tried.
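The expectation maximization idea can be sketched in a few lines of Python. This is a simplified illustration, not the program described by Bailey & Elkan: it assumes exactly one motif occurrence per sequence, a uniform background, a DNA alphabet, and deterministic starting points taken from the first sequence (instead of stochastic sampling). The function name `em_motif`, the seeding weights, and the toy sequences are all hypothetical.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background frequencies


def _window_ratio(pwm, seq, start):
    """Likelihood ratio (motif vs. background) for the window at `start`."""
    ratio = 1.0
    for j, column in enumerate(pwm):
        ratio *= column[seq[start + j]] / BACKGROUND
    return ratio


def _posteriors(pwm, seq):
    """E-step: probability of each start position, plus the summed ratio."""
    width = len(pwm)
    ratios = [_window_ratio(pwm, seq, i) for i in range(len(seq) - width + 1)]
    total = sum(ratios)
    return [r / total for r in ratios], total


def _reestimate(seqs, all_post, width, pseudocount):
    """M-step: rebuild the matrix from posterior-weighted letter counts."""
    pwm = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
    for seq, post in zip(seqs, all_post):
        for start, weight in enumerate(post):
            for j in range(width):
                pwm[j][seq[start + j]] += weight
    for column in pwm:
        norm = sum(column.values())
        for a in ALPHABET:
            column[a] /= norm
    return pwm


def em_motif(seqs, width, iterations=30, pseudocount=0.1):
    """Return (start positions, matrix) for the best of several EM runs."""
    best = None
    first = seqs[0]
    for s in range(len(first) - width + 1):
        # Seed the matrix from one window of the first sequence.
        seed = first[s:s + width]
        pwm = [{a: (0.7 if a == ch else 0.1) for a in ALPHABET} for ch in seed]
        for _ in range(iterations):
            all_post = [_posteriors(pwm, seq)[0] for seq in seqs]
            pwm = _reestimate(seqs, all_post, width, pseudocount)
        results = [_posteriors(pwm, seq) for seq in seqs]
        # Log-likelihood ratio with a uniform prior over start positions.
        log_lik = sum(math.log(total / len(post)) for post, total in results)
        if best is None or log_lik > best[0]:
            starts = [post.index(max(post)) for post, _ in results]
            best = (log_lik, starts, pwm)
    return best[1], best[2]


# Hypothetical upstream sequences with the same 6-mer planted in each.
toy_seqs = [
    "ACCGTTGACGTAAC",
    "GGTGACGTCCATAG",
    "CATTCGGATGACGT",
    "TGACGTGCAAGCTC",
]
starts, matrix = em_motif(toy_seqs, width=6)
print(starts, "".join(max(col, key=col.get) for col in matrix))
```

The E-step treats the unknown motif location in each sequence as a probability distribution over start positions, and the M-step re-estimates the motif from those weighted positions. Trying several seeds guards against poor local optima; a stochastic sampler would explore starting points randomly instead.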

Other techniques

As the biomedical sciences have expanded their repertoire of research methods and the kinds of data that can be collected, the field of bioinformatics has created techniques for dealing with these new kinds of data. The advent of complete genome sequences for many organisms has been accompanied by software for the manipulation, annotation, analysis, and comparison of these large sequences. Complex mathematical models of genes are used to try to find and identify all of the genes in each genome (Rogic et al. 2001).

Techniques have been developed to measure the change in expression for cDNA (microarrays) or in the amount of proteins in cells over time or between mutants and wild types. It is not unusual for a research group to monitor thousands of molecules simultaneously, looking for either increases or decreases in the relative levels between the standard and the state under investigation. These large-scale experiments are being analyzed with a number of statistical techniques (Wetzel et al. 2000) such as analysis of variance, which produces a statistical model of the changes observed (Kerr & Churchill 2001). Other researchers are using multivariate statistical techniques to identify which molecules vary their presence in a coordinated manner in response to changing conditions. Interestingly, a number of these techniques were first developed many years ago to study the factors influencing crop growth.
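As a rough illustration of the analysis of variance idea, the Python sketch below computes a one-way F statistic for a single molecule measured in several conditions. It is far simpler than the microarray models cited above, which account for additional sources of variation. The function name `one_way_anova_f` and the measurements are made up for this example.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of measurement lists.

    Assumes at least two groups and some within-group variation;
    otherwise the division below fails.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by differences between group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation of replicates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical expression levels for one gene (arbitrary units).
wild_type = [1.0, 2.0, 3.0]
mutant = [4.0, 5.0, 6.0]
print(one_way_anova_f([wild_type, mutant]))
```

A large F value indicates that the difference between conditions is large relative to the replicate-to-replicate noise. When thousands of molecules are tested at once, the resulting statistics would also need a correction for multiple testing.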

Conclusion

Ultimately, though, the field of bioinformatics has some general themes that should continue to run through it in the future. First, the bioinformaticist's tool chest is not complete: the tool chest of tomorrow will bear only minimal relationship to today's set of tools, as better and more sensitive tools continue to be developed. Second, the numbers and types of databases of experimental data will continue to expand at an alarming rate. The majority of the databases will be developed to describe one type of experimental data, such as sequence data or microarray data, with only minimal references to, or consistency of vocabulary with, the other databases. Third, diverse data must be integrated across ranges of scale, both temporal and spatial. For example, a single point mutation in a mouse might cause kidney deformations that in turn result in abnormal blood chemistry. Thus, a single point mutation causes effects at the cellular and organ levels. Biological scientists must learn the techniques necessary to manage and make use of the new data resources that their research is creating.

References

Bailey, T.L., and C. Elkan. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: ISMB-94: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (Altman, R., D. Brutlag, P. Karp, R. Lathrop, and D. Searls, eds), pp. 28–36. AAAI Press, Menlo Park, CA.

Crick, F.H.C. 1957. The structure of nucleic acids and their role in protein synthesis. In: Biochemical Society Symposium, No. 14 (Crook, E.M., ed.), pp. 26–36. Cambridge University Press, Cambridge, UK.
