Lecture 2: Describing Microbial Diversity: the ... - MCD Biology
Lecture 2: Describing Microbial Diversity: the Changing Paradigm<br />
----------------<br />
General reading: S&F 638-648 (rudimentary phylogeny);<br />
Pace, “Microbial Diversity and the Biosphere,” Science 276:734-740 (1997) (Website);<br />
Pace, “Mapping the Tree of Life: Progress and Prospects,” Microbiol. Mol. Biol. Rev. 73:565-576 (2009)<br />
(Website).<br />
Baum, D.A., S.D. Smith and S.S. Donovan, “The Tree Thinking Challenge,” Science 310:979-980 (2005)<br />
(with supporting info quiz, Website).<br />
1. Traditional taxonomy (classification) of microbes is in a mess -- more on this later -- but at this stage we are<br />
stuck with it for traditional reasons.<br />
The traditional (Linnaean) classifications allocate organisms into “taxa” (s. taxon):<br />
kingdom/phylum (division)/class/order/family/genus/species<br />
However, these boundaries fall apart in the complexity of microbial diversity (and in most phyla outside of animals;<br />
“species” even fails in plants).<br />
A. A “natural” taxonomy would be based on evolutionary relatedness:
Thus, organisms in the same “genus” (a collection of “species”) would have similar properties in a fundamental<br />
sense; the representatives of a “species” are expected to share most properties with other representatives of<br />
that species, perhaps with specialized gimmicks (“subspecies”, serovars, etc.).<br />
1. A relatedness group at any level is a “clade”, a natural (phylogenetic) grouping. Representatives of a<br />
clade are more closely related to one another than to any “outgroup” (non-clade) organism. “Phylotype” would<br />
refer to the collection of organisms that make up the clade, the phylogenetic type.<br />
2. The question of what is a microbial “species” has recently become really complex with the emergence of<br />
the concept of the “pangenome” (text, p. 651-2). More later on this important topic, one issue in the description of<br />
“microbial diversity”.<br />
B. A natural taxonomy of macrobes has long been possible:<br />
Large organisms have many easily distinguished features, e.g. morphology, body-plans and developmental<br />
processes, that can be used to describe hierarchies of relatedness.<br />
C. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly was<br />
not possible until molecular comparisons (not only sequence) came along.<br />
2. Recent advances in molecular phylogeny have changed the picture a lot: we now have a relatively non-subjective,<br />
semi-quantitative way to view “biodiversity”, in the context of phylogenetic “maps” - evolutionary trees.<br />
A. Slowly evolving molecules (e.g. rRNA) used for large-scale structure; more rapidly evolving (“fast-clock”)<br />
molecules for fine-structure.<br />
B. But always remember: an organism is much more than one gene; rRNA tells you about the evolution of<br />
the genetic and basic cellular machinery, the equivalent of the “body plan” sought by the early<br />
evolutionists.<br />
3. A culmination is “The Big Tree” - a molecular coordinate system based on rRNA that relates all of life - below.<br />
A. BUT the literature, language (e.g. “species”) and formal nomenclature of biology remain solidly<br />
rooted in the tradition of Linnaeus at this time. (You have to call them something!)<br />
B. First, an overview of the phylogenetic perspective on microbial (indeed, biological) diversity, then a brief look at<br />
other methods used to characterize microbes.<br />
PHYLOGENETIC DIVERSITY<br />
1. Phylogenetic relationships provide a “natural classification” of organisms (Darwin’s dream – “Our classifications<br />
will come to be, as far as they can be so made, genealogies.”): Phylogenetic relationships are the only way to<br />
understand the evolutionary process. But you need a metric – some property that provides a quantitative<br />
measure of the extent of relatedness of different organisms.<br />
A. Note that phylogenetic relationships can be “predictive”, not only “descriptive”: We can predict (some)<br />
properties of organisms based on the properties of their relatives – the principle: representatives of a<br />
particular phylogenetic group are expected to have the properties that are common to the group.<br />
e.g. -- We know a lot about Escherichia coli, but relatively little about Chromatium vinosum:<br />
1) E. coli is a chemoheterotroph, whereas C. vinosum is a photoheterotroph. (What do these terms<br />
mean?) Classical physiological treatment would have considered these organisms wildly different. From<br />
gene-sequence comparisons we know they are fairly close relatives (γ-group of proteobacteria).<br />
2) Thus, even though one eats glucose and the other light, we can predict that the biochemical underpinnings<br />
are similar -- protein-synthesizing machinery (antibiotic-sensitivity), DNA replication machinery, amino acid-synthesizing<br />
machinery, nucleotide biosynthesis, etc. Because of the close phylogenetic relationship, we<br />
expect to be able to swap many genes between these two superficially disparate organisms.<br />
3) The predictive value of phylogeny is essential for interpreting environmental sequence surveys.<br />
B. Phylogenetic perspective allows rational selection of model systems -- sometimes you need a model organism<br />
to gain perspective on a more difficult system.<br />
1) e.g. use of a model non-pathogen can provide safe study of a pathogen -- if it is a close relative. (What<br />
does “pathogen” mean??)<br />
e.g. Bacillus cereus and Bacillus anthracis: nearly the same organism, except that the latter carries a few<br />
genes that are pretty nasty for animals.<br />
2) Or can provide a simple system for a complex one -- e.g. for plants, what to use as a model? Chlorella?<br />
Chlamydomonas? Euglena?<br />
Molecular phylogenetic studies say Chlorella or Chlamydomonas, both of which are in the plant relatedness<br />
group (“clade”) -- Euglena is a relative of the trypanosomes!<br />
3) e.g. if you want to study the detail of an uncultivatable symbiont, identify a cultivatable free-living relative. E.g.<br />
the Riftia symbiont – how to choose the model?<br />
C. Phylogenetic perspective even on large organisms is quite recent -- ~150 years -- and on microbes only ca. 30<br />
years. Many microbiology texts don’t have it and many (even most) general biology texts get it wrong! (Although<br />
it’s getting better.)<br />
A Bit on the Evolution of Evolutionary Thought<br />
2. Prior to the late 19th century, the concept of evolution was based on the “evolutionary ladder”:<br />
Man<br />
↑<br />
Apes<br />
↑<br />
Marsupials<br />
↑<br />
Reptiles<br />
↑<br />
Amphibia<br />
↑<br />
Fish<br />
↑<br />
Invertebrates<br />
↑<br />
Plants<br />
↑<br />
Fungi<br />
↑<br />
Leeuwenhoek’s “animalcules”<br />
Indeed, the lexicon of biology still deals in “higher and lower” eucaryotes (I try not to use these terms -- they are<br />
dumb), “missing links” (no such thing exists), and “primitive” organisms (no such thing today).<br />
A. In its milieu, E. coli is as highly evolved as are we. E. coli is simple (~5 × 10^6 bp genome), we are complex<br />
(~3 × 10^9 bp); complexity has nothing to do with “evolutionary advancement”.<br />
B. Lineages evolve by diversification, “radiation”, not “progression”. (!!)<br />
C. There is no such thing as a “primitive” organism alive today. Simple, yes, but still a finely honed product of 4<br />
billion years under the selective hammer of the niches that it and its progenitors have occupied.<br />
3. By the late 1800s the concept of “evolutionary trees” was on the<br />
table -- e.g. Ernst Haeckel, 1866 (Note that Darwin’s theory was first presented in 1858; “On the Origin of<br />
Species” followed in 1859).<br />
Note “kingdoms” of Plants, Animals, Protists (neither plants nor<br />
animals, mostly microbial), and “monera” (“procaryotes”) at the<br />
base.<br />
4. The conceptual basis for biological diversity was pretty much stalled<br />
at the Haeckel stage for the next century -- and still is in many/most<br />
general texts of biology. One current articulation is the “five<br />
kingdoms of life”, here taken from the 1969 Science lead article that canonized 5-kingdoms; there are many<br />
popular versions of the five-kingdoms notion, which dominates textbooks currently.<br />
A. Still, “monera” (mostly termed “procaryotes” by this time) at the origin and progression up a ladder of sorts. But<br />
it still is the Haeckel tree in essence.<br />
B. Note much other subjectivity -- e.g. why do “fungi” get to be a “kingdom”, and not e.g. Alveolates (inc. ciliates,<br />
dinoflagellates) or Stramenopiles (inc. diatoms, brown algae, oomycetes)? [I guess because mushrooms are<br />
large, sometimes?]<br />
C. Note some subtleties by this time:<br />
1) Chloroplasts recognized as derived from “Blue-Green algae” (now “cyanobacteria” – one of ~100 bacterial<br />
“phyla” [“divisions”], the deepest clades in the bacterial domain).<br />
2) Mitochondria were thought probably derived from some sort of bacteria.<br />
Note that the “endosymbiotic” origins for the organelles had been in the air since the 19th century -- for a<br />
comprehensive overview of this history (and controversy) see J. Sapp, “Evolution by Association: A History of<br />
Symbiosis” (Oxford, 1994).<br />
D. Still, there are many problems with this Five-kingdoms story:<br />
1) Relationships among microbes, both “procaryote” and eucaryote, were speculative at best.<br />
2) There were no criteria to relate organisms between “kingdoms” (even between e.g. phyla of animals) -- a<br />
universal phylogeny was impossible.<br />
3) Implicit timeline remained -- “procaryotes”, “protists” “primitive.”<br />
4) Supposition that the eucaryotic nucleus was derived from a “procaryote” progenitor turned out to be<br />
fundamentally incorrect (the often-cited date in textbooks, and even some of the current scientific literature, of<br />
1.5 billion years ago for the origin of eucaryotes is utter B.S.).<br />
5) Studies in “molecular phylogeny” over the past two decades have changed the “paradigm” significantly. (à la<br />
Thomas Kuhn -- “The Structure of Scientific Revolutions”)<br />
E. By comparing macromolecular sequences, one can extract evolutionary relationships -- “evolutionary distances”<br />
-- between organisms. The basic notion is that sequence change = evolutionary distance.<br />
5. The goal of molecular phylogeny is to relate molecules (hence in principle organisms) quantitatively, so as to<br />
reconstruct their evolutionary histories, e.g. as a “phylogenetic tree.”<br />
A. There are many ways to “relate” molecules. Some subjective ways are:<br />
e.g. immunologically (fractional cross-reactivity)<br />
e.g. DNA-DNA “heterologous hybridization” (more below)<br />
But these are difficult (impossible) to quantify precisely in terms of relationships.<br />
B. The best way is by direct comparison of sequences of nucleic acids or proteins. This provides “precise”<br />
numbers for defining relationships between molecules -- and, ideally, organisms.<br />
6. For “orthologous” (of common ancestry and function) nucleic acid (or protein) sequences:<br />
A. Consider:<br />
--Organism X • • • AGCUGCCAGU • • •<br />
(X marks the differences, at positions 2, 4 and 5)<br />
--Organism Y • • • AACCCCCAGU • • •<br />
↑ DNA OR RNA?<br />
Sequence X is 70% identical to Sequence Y (7 of 10 positions match),<br />
Fractional identity is 0.7<br />
Fractional difference is 0.3 (1 - 0.7)<br />
1) Note the terms: “homologous” = of common ancestry; “orthologous” = of common ancestry and function.<br />
2) Note that the term “homology” is commonly used incorrectly when “identity” is meant. Note that<br />
nucleotide sequences are not “##% similar”, they are “##% identical”; protein seqs, on the other hand, can<br />
be “similar”. (How is that?)<br />
3) You cannot meaningfully compare sequences unless they are “homologous” -- of common ancestry.<br />
Homologous sequences are not necessarily identical – indeed probably aren’t in different organisms;<br />
identical sequences are not necessarily homologous, particularly if short (e.g. promoters, translation<br />
punctuation, etc.).<br />
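The identity count for the Organism X / Organism Y example can be sketched in a few lines of Python. This is only an illustration: the function name `fractional_identity` is made up here, and it assumes the two sequences are already aligned, homologous, and of equal length.

```python
def fractional_identity(seq_x, seq_y):
    """Fraction of aligned positions at which two sequences carry the same residue."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_x, seq_y))
    return matches / len(seq_x)

# The ten aligned positions from the Organism X / Organism Y example.
SEQ_X = "AGCUGCCAGU"
SEQ_Y = "AACCCCCAGU"

IDENTITY = fractional_identity(SEQ_X, SEQ_Y)   # 7 matches out of 10
DIFFERENCE = 1 - IDENTITY                      # the "evolutionary distance" before any correction

print(f"identity = {IDENTITY:.1f}, difference = {DIFFERENCE:.1f}")
```

The count says nothing about whether the sequences are homologous in the first place; that judgment has to be made before the comparison.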
B. Align sequences (big deal – more later) and do a difference (1 - identity) count for all pairs of organisms<br />
considered. This difference count is a measure of the extent of evolution - evolutionary distance - separating the<br />
pairs of organisms.<br />
e.g. with organisms A, B, C, D and E:<br />
C. To build relationships, construct a “difference matrix” for organisms A-E:<br />
      A     B     C     D     E<br />
A     -    0.1   0.2   0.2   0.4     (above the diagonal: Fractional Difference)<br />
B    0.9    -    0.2   0.2   0.4<br />
C    0.8   0.8    -    0.1   0.4<br />
D    0.8   0.8   0.9    -    0.4<br />
E    0.6   0.6   0.6   0.6    -      (below the diagonal: Fractional Identity)<br />
Can relate in a “tree”-like figure, a “dendrogram”:<br />
Note that organism-to-node “distance” is 1/2 of organism-to-organism “distance”.<br />
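One simple way to turn the A-E difference matrix into a dendrogram is average-linkage clustering (UPGMA). The notes do not say which method was used to draw the figure, so take the following as a minimal sketch under that assumption; all function names are illustrative. Each node is placed at half the distance between the clusters it joins, which is the "1/2" rule noted above.

```python
from itertools import combinations

# Fractional differences from the A-E matrix (upper triangle).
DIST = {
    ("A", "B"): 0.1, ("A", "C"): 0.2, ("A", "D"): 0.2, ("A", "E"): 0.4,
    ("B", "C"): 0.2, ("B", "D"): 0.2, ("B", "E"): 0.4,
    ("C", "D"): 0.1, ("C", "E"): 0.4,
    ("D", "E"): 0.4,
}

def leaf_distance(a, b):
    """Look up the pairwise difference regardless of argument order."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]

def cluster_distance(c1, c2):
    """Average pairwise difference between the leaves of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(leaf_distance(a, b) for a, b in pairs) / len(pairs)

def upgma(leaves):
    """Return the merge steps as (cluster1, cluster2, node_height) tuples."""
    clusters = [(leaf,) for leaf in leaves]
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters; the node sits at half their distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2) / 2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

MERGES = upgma("ABCDE")
for c1, c2, height in MERGES:
    print("".join(c1), "+", "".join(c2), f"joined at node height {height:.2f}")
```

UPGMA assumes a constant evolutionary clock across lineages, which, as the notes stress elsewhere, real lineages do not obey; it works here only because this toy matrix is perfectly clock-like.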
Note that this (or any other) tree is a single dimension -- you can rotate around any node and have the same<br />
topology in an evolutionary sense.<br />
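The rotation rule can be checked mechanically. The sketch below is an illustration only: it writes trees as nested Python tuples (a representation chosen for this example, not a standard file format) and compares them after discarding the left-to-right order of the branches at every node.

```python
def canonical(tree):
    """Return an order-independent form of a nested-tuple tree."""
    if isinstance(tree, str):          # a leaf: just the organism name
        return tree
    # An internal node: the set of its subtrees, so branch order is ignored.
    return frozenset(canonical(child) for child in tree)

TREE_1 = ((("A", "B"), ("C", "D")), "E")
TREE_2 = ("E", (("D", "C"), ("B", "A")))   # every node rotated
TREE_3 = ((("A", "C"), ("B", "D")), "E")   # different groupings

SAME_AFTER_ROTATION = canonical(TREE_1) == canonical(TREE_2)
DIFFERENT_TOPOLOGY = canonical(TREE_1) != canonical(TREE_3)

print(SAME_AFTER_ROTATION, DIFFERENT_TOPOLOGY)
```

Two drawings that look quite different on the page are the same tree if they agree after this kind of normalization; only the groupings (and line-segment lengths) carry information.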
1) It is common to see sequence-divergence presented in terms of time, but this is not legitimate unless you<br />
have a fossil record with which to calibrate the line segment-lengths. “Clock speeds” of organisms vary and the<br />
evolutionary clock (rate of change) is not constant, contrary to common supposition.<br />
2) The root of the tree, the line representing the common ancestral line, may or may not be the “deepest”<br />
branch point: Most properly the tree should be drawn:<br />
OR<br />
3) E, the “outgroup”, “roots” A/B/C/D<br />
Vignette – Tree-reading –<br />
Baum, D.A., S.D. Smith and S.S. Donovan, “The Tree Thinking Challenge,” Science 310:979-980 (2005) (with<br />
supporting info quiz).<br />
1. Remember – phylogenetic trees have only a single dimension – along line segments.<br />
A few trees from Baum et al.:<br />
End Vignette<br />
------------------------------------------------------------------------------------<br />
7. What molecule(s) to use for molecular phylogeny of organisms for mapping the Tree of Life?<br />
A. Doesn’t really matter so long as:<br />
1. “Homologous” gene occurs in all organisms considered<br />
a. More specifically, you need to know that the genes are “orthologs,” not “paralogs”<br />
b. “Orthologs” share ancestry and retain the same function in the different organisms<br />
c. “Paralogs” result from an ancestral duplication, with potentially different functions taken on subsequent<br />
to the duplication – duplications produce “gene families”.<br />
d. For instance, the α and β globins are a gene family; they have ancient common ancestry -- the α-type<br />
and β-type globins have evolved independently since the ancestral duplication:<br />
α-globins are orthologs; β-globins are orthologs.<br />
α and β globins are paralogs<br />
e. The tree of the gene family would look like:<br />
[Figure: gene-family tree with an α-globin branch (Human, Mouse, Frog α-globin) and a β-globin branch (Human, Mouse, Frog β-globin)]<br />
f. If you did not keep your orthologs and paralogs straight (sometimes a tough call) when you build the<br />
dataset, you might get some most unexpected (and incorrect) trees, e.g.: You miss the true topology with the<br />
restricted data set.<br />
[Figure: tree from the restricted data set: Human (α-globin), Frog (α-globin), Mouse (β-globin)]<br />
2. You need a sufficient number of nucleotides (or amino acids) to be statistically significant -- more is always<br />
better.<br />
3. Changes must span the evolutionary distance inspected -- i.e., compared sequences must not be<br />
randomized.<br />
4. No lateral transfer -- the evolution of the gene must reflect the evolution of the cells considered.<br />
a. Genes that are known to be derived from lateral transfer are called “xenologs.” If you are interested<br />
in metabolic genes, there is a good chance, at least among bacteria, that you are dealing with<br />
genes or pathways that potentially can move between different organisms.<br />
b. e.g. penicillinase and other commonly transferred antibiotic resistance genes (scary!).<br />
c. e.g. Rhizobium symbiotic (leguminous plants) N2-fixation.<br />
5. Note the evident impact of lateral transfers throughout evolution. Much of microbial physiological diversity<br />
(probably) is dependent on laterally transferred genes.<br />
a. Note also that portions of genes can transfer (e.g. with “two-component” systems) so that<br />
“homologous” blocks of sequence can show up in functionally unrelated genes. Formation of<br />
intralineage “gene families” also results in mixing up functional modules of macromolecules.<br />
b. How to detect xenologs?<br />
B. When “molecular phylogeny” first got started, considerable work was done with protein sequences, e.g.<br />
cytochrome C, hemoglobin.<br />
1. But proteins are hard to get and sequence; it is now easier to isolate/sequence genes.<br />
2. Most protein genes are “shallow” clocks, the result of relatively recent evolution; e.g. E. coli doesn’t have<br />
hemoglobin.<br />
8. Choice of molecules for comprehensive (all organisms) phylogeny -- ribosomal RNAs (rRNAs).<br />
A. Ribosome -- carries out protein synthesis<br />
Small subunit (S): “16S”-“18S” (SSU) rRNA: 1500-2000 nt; ca. 25 proteins<br />
Large subunit (L): “23S” (LSU) rRNA: 3000-5000 nt; “5S” rRNA: 120 nt; ca. 30-40 proteins<br />
B. rRNAs are present in all organisms and the major organelles (mitochondria and chloroplasts).<br />
C. Highly conserved throughout evolution; e.g., ca. 50% identity between E. coli and human SSU rRNAs over<br />
alignable nt<br />
1. Length variation between molecules may cause problems; must consider only “homologous” nt (the<br />
alignment problem - more later).<br />
2. Note that rRNA, while useful for distantly related organisms, is not good for establishing (resolving) close<br />
relationships: it is too conservative. Consequently, close relatives have too few differences to be reliable.<br />
e.g. human/apes/mice have essentially the same rRNA seqs.<br />
e.g. E. coli and Yersinia pestis (causes plague, the “Black Death”) have essentially the same rRNA seqs.<br />
D. The rRNA genes are large enough for reasonable statistics (more later on this).<br />
E. First used for all-life phylogenetic trees by Carl Woese (University of Illinois).<br />
For historical overview: Pace et al. “Phylogeny and beyond: scientific, historical and conceptual significance<br />
of the first tree of life.” Proc. Natl. Acad. Sci. 109:1011-1018, 2012.<br />
9. Given sequences of multiple organisms (>2 million now available, but spotty representation of diverse phyla –<br />
MMBR reference):<br />
- Align seqs., count number of differences -- this is some measure of evolutionary change/distance<br />
- “Correct” for multiple and back mutations – the number of changes you count is always less than the number of<br />
mutations that have occurred.<br />
- Computer-fit pairwise “evolutionary distances” to best-fit overall tree topology.<br />
(More detail on all this below)<br />
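To give a feel for the “correction” step, here is a minimal sketch using the Jukes-Cantor model, one of the simplest published corrections. The notes do not say which correction is used in the course, so this choice is an assumption made for illustration. The model assumes all substitutions are equally likely and estimates the number of substitutions per site as d = -(3/4) ln(1 - (4/3) p), where p is the observed fractional difference.

```python
import math

def jukes_cantor_distance(p):
    """Estimate substitutions per site from an observed fractional difference p."""
    if not 0 <= p < 0.75:
        # At p = 0.75 the sequences look random and the estimate diverges.
        raise ValueError("Jukes-Cantor is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)

OBSERVED = [0.1, 0.2, 0.4]
CORRECTED = [jukes_cantor_distance(p) for p in OBSERVED]

for p, d in zip(OBSERVED, CORRECTED):
    print(f"observed difference {p:.2f} -> estimated changes per site {d:.3f}")
```

The corrected value is always at least as large as the counted difference, and the gap widens as sequences become more divergent, which is the point made above about multiple and back mutations.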
10. The “Big Tree” emerges: Outlines first seen by Woese in 1977.<br />
A. Note that Woese did not have the tree-building methods now available. In fact, he did not have full sequences,<br />
only short “signature” oligonucleotides (more on “signatures” later).<br />
B. In essence, the tree is a quantitative map of evolutionary relatedness, a comprehensive map of biological<br />
diversity -- rRNA “sequence space”.<br />
C. Indeed, the tree is a quantitative estimate – a metric – for that slippery concept, amount of evolution. The<br />
“amount of biological diversity” (for any particular molecule) might be the summation of all unique line segment-lengths<br />
in a comprehensive tree.<br />
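The “summation of line segment-lengths” idea is easy to make concrete. The sketch below uses a small hypothetical five-taxon tree (the branch lengths are invented for illustration, as is the `(branch_length, children)` representation) and adds up every branch once.

```python
def total_branch_length(node):
    """Sum every branch length in a tree of (branch_length, children) nodes."""
    branch_length, children = node
    return branch_length + sum(total_branch_length(child) for child in children)

def leaf(length):
    """A terminal branch of the given length with nothing below it."""
    return (length, [])

# Hypothetical tree; the root has no branch above it, so its length is 0.
TREE = (0.0, [
    (0.1, [
        (0.05, [leaf(0.05), leaf(0.05)]),   # organisms A and B
        (0.05, [leaf(0.05), leaf(0.05)]),   # organisms C and D
    ]),
    leaf(0.2),                               # organism E
])

DIVERSITY = total_branch_length(TREE)
print(f"total line-segment length = {DIVERSITY:.2f}")
```

Adding a close relative of an organism already in the tree adds only a short segment, whereas adding a deeply divergent lineage adds a long one, so this sum weights novelty rather than the number of names.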
11. Some lessons from the Big Tree:<br />
A. There was a single origin for the terrestrial type of life -- all life forms are related.<br />
B. Three “primary lines of evolutionary descent” -- “Domains” -- “ur-kingdoms”: Eucarya (eucaryotes),<br />
Bacteria, Archaea (originally called “archaebacteria”, but the name was changed when it became clear that<br />
the things aren’t bacteria.)<br />
1. Sometimes you see these referred to as “kingdoms,” but usage in this context is probably not a good idea -- too<br />
historically loaded.<br />
2. You can inject “time” into the tree, but sequence change is not necessarily linear with time - indeed, probably it<br />
usually isn't.<br />
C. The eucaryote nuclear line of descent is as old as the archaeal line – eucaryotes have been around since<br />
the beginning.<br />
[Figure: three-domain tree (Bacteria, Archaea, Eucarya) with a “Time” axis]<br />
2) The term “procaryote” is inappropriate in the light of the relationships – there isn’t any such group as<br />
“procaryote”. More on this issue later.<br />
D. Are there still more domain-level divergences to be discovered??<br />
E. Note that lines connecting organisms to nodes are not all the same length -- the evolutionary clock is not<br />
constant between different lineages (e.g. Haloferax vs. Methanopyrus, Aquifex vs. Bacillus, Eucarya in general vs.<br />
any representative of Archaea or Bacteria)<br />
1) The rate of evolution is not even necessarily the same for a particular lineage at all stages in the evolution<br />
of the line, e.g. Agrobacterium vs. mitochondrion<br />
2) Note domain-level tendencies:<br />
Eucarya -- fast clocks<br />
Archaea -- slow clocks<br />
Bacteria -- intermediate rates of evolution<br />
3) Because of variable rates, estimating time from sequence change is chancy--even fatuous--without some<br />
sort of calibration, a correlation between the Tree and some datable geological event.<br />
F. Note that the phylogenetic space occupied by multicellular eucaryotes is shallow and limited, but enormously<br />
diverse in morphological (less biochemical) phenotype. Is this a consequence of large, highly plastic<br />
genomes?<br />
1) Note, however, that the “typical” eucaryote is microbial and has a small genome, e.g. Saccharomyces<br />
cerevisiae at 13.5 × 10^6 bp (vs. E. coli at ~4.2 × 10^6 bp; Calothrix [a cyanobacterium] at ~12.5 × 10^6 bp;<br />
human at ~3.2 × 10^9 bp (mostly garbage?); Methanococcus jannaschii at ~1.7 × 10^6 bp).<br />
G. The rRNA (and other molecular) data prove that mitochondria and chloroplasts were of bacterial origin (the<br />
bacterial phyla [“divisions”] “proteobacteria” and “cyanobacteria,” respectively).<br />
H. Note how deeply divergent are Giardia, Trichomonas and Vairimorpha in the eucaryotic line. These<br />
organisms lack mitochondria, so may have diverged from the main eucaryal line of descent before the mitochondria<br />
came in.<br />
1) It turns out they, like all eucaryotes, have some bacterial (and archaeal) genes, but it is not clear where<br />
or when they got them. Some folks think this resulted from mitochondrial import but this is questionable.<br />
12. The three-domain Big Tree is an “unrooted” tree -- you don’t know where the ancestral node is. You need<br />
an “outgroup” to “root” the tree, and a universal tree has no outgroup.<br />
A. However, the Tree can be rooted using “paralogs” that arose from duplication before the last common<br />
ancestor:<br />
e.g. translation factors EF-Tu and EF-G<br />
ATP synthase subunits α and β<br />
tRNAs met-initiator and met-elongator<br />
These paralogs occur in all three domains, so presumably arose before the last common ancestor. Each<br />
yields the 3-domain tree upon analysis, so you can use the tree with one paralog to root the tree with the other.<br />
(You will do this in the Mol Phy Workshop.) All concur that the relationships are:<br />
[Figure: paired paralog trees, each with the branches Archaea, Eucarya and Bacteria]<br />
This indicates that the root of the Big Tree is (presumably deep) on the bacterial line of descent.<br />
B. This also means that Eucarya and Archaea shared common history after divergence from Bacteria.<br />
1) This explains many similarities between archaeal and eucaryal basal machineries.<br />
e.g. similar transcription machineries; Archaea and Eucarya use TATA-binding proteins whereas Bacteria<br />
use σ factors for specification of transcription initiation.<br />
e.g. Archaeal and eucaryal DNA-synthetic machineries are far more like one another than either is to<br />
that of bacteria (for overview, two recent review books are Garrett and Klenk, “Archaea: Evolution, Physiology,<br />
and Molecular Biology”, 2007; Cavicchioli, “Archaea: Molecular and Cellular Biology”, 2007).<br />
13. Note that the Big Tree shown is a limited set of specific organisms: > 2 million SSU sequences are now<br />
available.<br />
A. Several databases of curated rRNA sequences are available, e.g. RDP II, GreenGenes, SILVA, and of course GenBank (raw<br />
sequence - not aligned).<br />
You can download trees, carry out functions, get programs, etc.<br />
14. A few domain-level trees for reference.<br />
15. Note recent expansion of known bacterial diversity (also next page)<br />
A. The lines indicate relatedness groups of bacteria, the phylogenetic “phyla” or “divisions” of bacteria (referred<br />
to as “kingdoms” by Woese). There is no formal taxonomic status of these phyla at this time:<br />
B. ca. 100 phyla identified so far by rRNA sequence. Only ca. 25 contain cultivated representatives (bold lines; non-bold<br />
have no cultivated representatives). Only ~8-10 have significant cultured representation.<br />
1) Uncultivated organisms in the environment can be identified by obtaining rRNA genes without cultivation:<br />
environmental sample → isolate total DNA → clone rRNA genes → sequence<br />
(more later)<br />
You know that the organism is there, and get some idea of abundance. How to get more information?<br />
2) Most of the sequences that are abundant in the environment are poorly represented by<br />
cultivars, or not represented at all, e.g.:<br />
The Acidobacterium group contains only a few cultivars, but is very abundant in many environments.<br />
The “OP11” group is very abundant in anoxic environments at low and high temperatures, but has no<br />
cultivated representatives.<br />
A tree with some names of the better-known groups is the following:<br />
C. Most of what we know about bacteria is based on studies of organisms representing only a few phyla:<br />
- Proteobacteria (E. coli, Pseudomonas spp., “purple photosynthetic bacteria”) - the classic “Gram<br />
negative” group.<br />
- Firmicutes (aka “Low G + C Gram Positive bacteria” [Bacillus, Clostridium,<br />
Streptococcus, Staphylococcus, Lactobacillus])<br />
- Actinobacteria (aka “High G + C Gram Positive bacteria” [Streptomyces,<br />
Mycobacterium])<br />
- Cyanobacteria<br />
D. Note the expansion in known bacterial diversity over the past few years!<br />
16. Archaea:<br />
A. Classically, two groups (Crenarchaeota and Euryarchaeota) have cultivated representatives and recognition.<br />
1) Crenarchaeota: Most cultivated types are high-temperature, but uncultivated low-temp. types are<br />
abundant in the environment (detected by cloning rRNA and other genes – “metagenomics”)<br />
a) Name “cren-” from the Greek for spring or fount, referring to the ostensible similarity of such organisms<br />
to the earliest life (high temperature, using geothermal compounds for energy, e.g. H2 / S0 -- more later)<br />
2) Euryarchaeota: methanogens, extreme halophiles, many heterotrophs (more later):<br />
a) Name from Greek “eury-” meaning “broad” or “wide”, referring to the varied<br />
phenotypes, compared to cultivated crenarchaeota.<br />
B. Environmental sequences have swamped cultured sequences in the archaea, and the structure of the<br />
archaeal tree currently is in a state of flux.<br />
17. Eucarya (Eucaryotes):<br />
A. Note that the microbial eucs are vastly more diverse than the popular three phyla (“kingdoms” --<br />
fungi, plants and animals).<br />
B. Note that the phylogeny of eucaryotes is a controversial (!!) place. rRNA always gives the same<br />
pic as above. Protein gene trees are sometimes taken to represent various topologies and there is<br />
little agreement in the field regarding the deepest branchings. For instance, a tree from Baldauf et al.<br />
is the following:<br />
18. The large-scale picture from the rRNA perspective is probably the least biased, but it is also difficult to resolve at the deepest branchings. What we see in the trees is biased by a lot of things, e.g. limited representation of known diversity, treeing methods and their nuances, “clockspeed” effects, and others. See Pace 2009 for discussion.<br />
19. The three-domain tree is seen for all molecules in the central information-processing machinery (DNA, RNA, protein synthesis), the “core” genes of genetic transfer.<br />
A. When you consider molecules outside these core functions, however (e.g. carbohydrate metabolism, amino acid metabolism, etc.), relationships can become weird. The susceptibility of a gene to transfer may depend on whether its product has to interact specifically with other cellular components: “stand-alone” gene products, which have no such requirement, can transfer more readily. Some genes seem to have moved around in the very deep past, e.g. those for the aminoacyl-tRNA synthetases. In phylogenetic analyses of these genes, some give “canonical” three-domain trees, while others show evidence of transfers.<br />
B. E.g. the canonical pattern, as seen in the Leu RS (leucyl-tRNA synthetase):<br />
C. Others are decidedly noncanonical, e.g. Ile RS (isoleucyl-tRNA synthetase):<br />
D. Even enzymes in the same biosynthetic pathway may have somewhat different evolutionary histories, e.g. in His biosynthesis:<br />
Archaea (in bold), EUCARYA (IN CAPS)<br />
1) Numbers at nodes are “bootstrap” values, the % of trees, among those built from resampled versions of the data, that give that particular node (more later on “bootstrap analysis”).<br />
E. These “incongruencies” with the Big Tree are generally considered to be the results of “lateral transfer” of genes between the vertical lines of descent. For lots more discussion and the meaning in the larger context of biology see:<br />
Woese, Olsen, Ibba, Söll. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Molec. Biol. Rev. 64:202-236 (2000)<br />
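The bootstrap values described under 19.D can be illustrated with a small sketch. It assumes the replicate trees have already been built from resampled data (the resampling and tree-building steps are not shown), represents trees as nested tuples of taxon names, and counts rooted clades rather than the splits that real packages use for unrooted trees. The taxon names, the toy trees, and the function names are all hypothetical.<br />

```python
from collections import Counter


def clades(tree):
    """Return every internal clade of a nested-tuple tree as a frozenset of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf: just report its own name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                 # record the clade under this internal node
        return members

    leaves(tree)
    return found


def bootstrap_support(reference, replicates):
    """Percent of replicate trees that contain each clade of the reference tree."""
    counts = Counter()
    for replicate in replicates:
        counts.update(clades(replicate))
    total = len(replicates)
    return {clade: 100.0 * counts[clade] / total for clade in clades(reference)}


# Hypothetical toy data: one reference tree and four replicate trees.
reference = ((("A", "B"), "C"), "D")
replicates = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    (("A", "B"), ("C", "D")),
]

support = bootstrap_support(reference, replicates)
for clade, percent in sorted(support.items(), key=lambda item: len(item[0])):
    print(sorted(clade), percent)
```

In this toy case the A+B grouping appears in three of the four replicates, so its node would be labeled 75; a node recovered in every replicate would be labeled 100.<br />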
20. What happened in evolution? It looks as though the “core genome,” reflected in e.g. rRNA, had genetic continuity throughout evolution. A lot of other things got scrambled – but not systematically.<br />
A. “Endosymbiosis” has involved more than organelles!<br />
B. Note that lateral transfer in general tends to be “phylogenetically local” and idiosyncratic. It can, however, have a powerful influence in evolution (e.g. cyanobacterial photosynthesis).<br />
21. Some scientists argue that the occurrence of lateral transfers screws up large-scale trees: e.g. Ford Doolittle:<br />
Gary Olsen edits the Doolittle tree to a more realistic view:<br />
What lateral transfer really means phylogenetically is that if you want to track organismic lineages you need to be wary of xenologs (genes acquired by lateral transfer), and of paralogs (genes that arose by duplication) too! Remember: Any molecular tree is just that, a tree of molecules, not organisms.<br />
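One simple way to flag a gene tree that may contain xenologs is to ask whether the sequences from each domain still group together. The sketch below is only an illustration under stated assumptions: trees are rooted nested tuples, the sequence names and both toy trees are hypothetical, and a failed check is a hint worth following up (it could also reflect paralogy or a poorly resolved tree), not proof of transfer.<br />

```python
def leaf_sets(tree):
    """Collect the leaf set under every node (leaves included) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            members = frozenset([node])
        else:
            members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def non_monophyletic_domains(tree, domain_of):
    """Return the domains whose sequences do NOT form a single clade in the tree."""
    groups = {}
    for leaf, domain in domain_of.items():
        groups.setdefault(domain, set()).add(leaf)
    present = leaf_sets(tree)
    return sorted(d for d, members in groups.items() if frozenset(members) not in present)


# Hypothetical sequence labels and their domains.
domain_of = {
    "Bac1": "Bacteria", "Bac2": "Bacteria",
    "Arc1": "Archaea", "Arc2": "Archaea",
    "Euc1": "Eucarya", "Euc2": "Eucarya",
}

# A "canonical" three-domain pattern, and one where an archaeal sequence sits inside Bacteria.
canonical = (("Bac1", "Bac2"), (("Arc1", "Arc2"), ("Euc1", "Euc2")))
noncanonical = (("Bac1", ("Bac2", "Arc1")), ("Arc2", ("Euc1", "Euc2")))

canonical_flags = non_monophyletic_domains(canonical, domain_of)
noncanonical_flags = non_monophyletic_domains(noncanonical, domain_of)
print(canonical_flags)
print(noncanonical_flags)
```

The canonical toy tree flags no domains, while the second tree flags both Archaea and Bacteria, because the misplaced archaeal sequence breaks up both groups.<br />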
22. Vignette: The prokaryote issue.<br />