30.07.2013 Views

Lecture 2: Describing Microbial Diversity: the ... - MCD Biology

Lecture 2: Describing Microbial Diversity: the Changing Paradigm<br />

----------------<br />

General reading: S&F 638-648 (rudimentary phylogeny);<br />

Pace, “Microbial Diversity and the Biosphere,” Science 276:734-740 (1997) (Website);<br />

Pace, “Mapping the Tree of Life: Progress and Prospects,” Microbiol. Mol. Biol. Rev. 73:565-576 (2009)<br />

(Website).<br />

Baum, D.A., S.D. Smith and S.S. Donovan, “The Tree Thinking Challenge,” Science 310:979-980 (2005)<br />

(with supporting info quiz, Website).<br />

1. Traditional taxonomy (classification) of microbes is in a mess -- more on this later -- but at this stage we are<br />

stuck with it for traditional reasons.<br />

The traditional (Linnaean) classifications allocate organisms into “taxa” (s. taxon):<br />

kingdom/phylum (division)/class/order/family/genus/species<br />

However, these boundaries fall apart in the complexity of microbial diversity (and most phyla outside of animals; “species” even fails in plants).<br />

A. A “natural” taxonomy would be based on evolutionary relatedness:


Thus, organisms in the same “genus” (a collection of “species”) would have similar properties in a fundamental sense; the representatives of a “species” are expected to share most properties with other representatives of that species, perhaps with specialized gimmicks (“subspecies”, serovars, etc.).<br />

1. A relatedness group at any level is a “clade”, a natural (phylogenetic) grouping. Representatives of a clade are more closely related to one another than to any “outgroup” (non-clade) organism. “Phylotype” would refer to the collection of organisms that make up the clade, the phylogenetic type.<br />

2. The question of what is a microbial “species” has recently become really complex with the revelation of the concept of “pangenome” (text, p. 651-2). More later on this important topic, one issue in the description of “microbial diversity”.<br />

B. A natural taxonomy of macrobes has long been possible:<br />

Large organisms have many easily distinguished features, e.g. morphology, body-plans and developmental<br />

processes, that can be used to describe hierarchies of relatedness.<br />

C. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy was mostly not possible until molecular comparisons (not only sequence) came along.<br />

2. Recent advances in molecular phylogeny have changed the picture a lot: we now have a relatively non-subjective, semi-quantitative way to view “biodiversity”, in the context of phylogenetic “maps” - evolutionary trees.<br />


A. Slowly evolving molecules (e.g. rRNA) used for large-scale structure; more rapidly evolving (“fast-clock”)<br />

molecules for fine-structure.<br />

B. But always remember: an organism is much more than one gene; rRNA tells you about the evolution of the genetic and basic cellular machinery, the equivalent of the “body plan” sought by the early evolutionists.<br />

3. A culmination is “The Big Tree” - a molecular coordinate system based on rRNA that relates all of life - below.<br />

A. BUT the literature, language (e.g. “species”) and formal nomenclature of biology remain solidly rooted in the tradition of Linnaeus at this time. (You have to call them something!)<br />

B. First, an overview of the phylogenetic perspective on microbial (indeed, biological) diversity, then a brief look at other methods used to characterize microbes.<br />

PHYLOGENETIC DIVERSITY<br />

1. Phylogenetic relationships provide a “natural classification” of organisms (Darwin’s dream – “Our classifications will come to be, as far as they can be so made, genealogies.”): Phylogenetic relationships are the only way to understand the evolutionary process. But you need a metric – some property that provides a quantitative measure of the extent of relatedness of different organisms.<br />

A. Note that phylogenetic relationships can be “predictive”, not only “descriptive”: We can predict (some) properties of organisms based on the properties of their relatives – the principle: representatives of a particular phylogenetic group are expected to have the properties that are common to the group.<br />

e.g. -- We know a lot about Escherichia coli, but relatively little about Chromatium vinosum:<br />

1) E. coli is a chemoheterotroph, whereas C. vinosum is a photoheterotroph. (What do these terms mean?) Classical physiological treatment would have considered these organisms wildly different. From gene-sequence comparisons we know they are fairly close relatives (γ-group of proteobacteria).<br />

2) Thus, even though one eats glucose and the other light, we can predict that the biochemical underpinnings are similar -- protein-synthesizing machinery (antibiotic-sensitivity), DNA replication machinery, amino acid-synthesizing machinery, nucleotide biosynthesis, etc. Because of the close phylogenetic relationship, we expect to be able to swap many genes between these two superficially disparate organisms.<br />

3) The predictive value of phylogeny is essential for interpreting environmental sequence surveys.<br />


B. Phylogenetic perspective allows rational selection of model systems -- sometimes you need a model organism<br />

to gain perspective on a more difficult system.<br />

1) e.g. use of a model non-pathogen can provide safe study of a pathogen -- if it is a close relative. (What does “pathogen” mean?)<br />

e.g. Bacillus cereus and Bacillus anthracis: same organism except that the latter contains a few genes that are pretty nasty for animals.<br />

2) Or can provide a simple system for a complex one -- e.g. for plants, what to use as a model? Chlorella? Chlamydomonas? Euglena?<br />

Molecular phylogenetic studies say Chlorella or Chlamydomonas, both of which are in the plant relatedness group (“clade”) -- Euglena is a relative of the trypanosomes!<br />

3) e.g. if you want to study the detail of an uncultivatable symbiont, identify a cultivatable free-living form. E.g. the Riftia symbiont – how to choose the model?<br />

C. Phylogenetic perspective even on large organisms is quite recent -- ~150 years -- and on microbes only ca. 30<br />

years. Many microbiology texts don’t have it and many (even most) general biology texts get it wrong! (Although<br />

it’s getting better.)


A Bit on the Evolution of Evolutionary Thought<br />

2. Prior to the late 19th century, the concept of evolution was based on the “evolutionary ladder”:<br />

Man<br />
↑<br />
Apes<br />
↑<br />
Marsupials<br />
↑<br />
Reptiles<br />
↑<br />
Amphibia<br />
↑<br />
Fish<br />
↑<br />
Invertebrates<br />
↑<br />
Plants<br />
↑<br />
Fungi<br />
↑<br />
Leeuwenhoek’s “animalcules”<br />


Indeed, the lexicon of biology still deals in “higher and lower” eucaryotes (I try not to use these terms -- they are dumb), “missing links” (no such thing exists), and “primitive” organisms (no such thing today).<br />

A. In its milieu, E. coli is as highly evolved as are we. E. coli is simple (~5 × 10^6 bp genome), we are complex (~3 × 10^9 bp); complexity has nothing to do with “evolutionary advancement”.<br />

B. Lineages evolve by diversification, “radiation”, not “progression”. (!!)<br />

C. There is no such thing as a “primitive” organism alive today. Simple, yes, but still a finely honed product of 4 billion years under the selective hammer of the niches that it and its progenitors have occupied.<br />


3. By the late 1800s the concept of “evolutionary trees” was on the table -- e.g. Ernst Haeckel, 1866 (Note that Darwin’s “Origin of Species” was first presented in 1858).<br />

Note “kingdoms” of Plants, Animals, Protists (neither plants nor animals, mostly microbial), and “monera” (“procaryotes”) at the base.<br />

4. The conceptual basis for biological diversity was pretty much stalled at the Haeckel stage for the next century -- and still is in many/most general texts of biology. One current articulation is the “five kingdoms of life”, here taken from the 1969 Science lead article that canonized 5-kingdoms; there are many popular versions of the five-kingdoms notion, which dominates textbooks currently.<br />


A. Still, “monera” (mostly termed “procaryotes” by this time) sit at the origin, with progression up a ladder of sorts. It still is the Haeckel tree in essence.<br />

B. Note much other subjectivity -- e.g. why do “fungi” get to be a “kingdom”, and not e.g. Alveolates (inc. ciliates, dinoflagellates) or Stramenopiles (inc. diatoms, brown algae, oomycetes)? [I guess because mushrooms are large, sometimes?]<br />

C. Note some subtleties by this time:


1) Chloroplasts recognized as derived from “Blue-Green algae” (now “cyanobacteria” – one of ~100 bacterial “phyla” (“divisions”), the deepest clades in the bacterial domain).<br />

2) Mitochondria were thought probably derived from some sort of bacteria.<br />

Note that the “endosymbiotic” origins for the organelles had been in the air since the 19th century -- for a comprehensive overview of this history (and controversy) see J. Sapp, “Evolution by Association: A History of Symbiosis” (Oxford, 1994).<br />

D. Still, there are many problems with this Five-kingdoms story:<br />

1) Relationships among microbes, both “procaryote” and eucaryote, were speculative at best.<br />

2) There were no criteria to relate organisms between “kingdoms” (even between e.g. phyla of animals) -- a universal phylogeny was impossible.<br />

3) Implicit timeline remained -- “procaryotes”, “protists” “primitive.”<br />


4) Supposition that the eucaryotic nucleus was derived from a “procaryote” progenitor turned out to be fundamentally incorrect (the often-cited date in textbooks, and even some of the current scientific literature, of 1.5 billion years ago for the origin of eucaryotes is utter B.S.).<br />

5) Studies in “molecular phylogeny” over the past two decades have changed the “paradigm” significantly. (à la Thomas Kuhn -- “The Structure of Scientific Revolutions”)<br />

E. By comparing macromolecular sequences, one can extract evolutionary relationships -- “evolutionary distances” -- between organisms. The basic notion is that sequence change = evolutionary distance.<br />

5. The goal of molecular phylogeny is to relate molecules (hence in principle organisms) quantitatively, so as to reconstruct their evolutionary histories, e.g. as a “phylogenetic tree.”<br />

A. There are many ways to “relate” molecules. Some subjective ways are:<br />

e.g. immunologically (fractional cross-reactivity)<br />

e.g. DNA-DNA “heterologous hybridization” (more below)<br />

But these are difficult (impossible) to quantify precisely in terms of relationships.<br />


B. The best way is by direct comparison of sequences of nucleic acids or proteins. This provides “precise”<br />

numbers for defining relationships between molecules -- and, ideally, organisms.<br />

6. For “orthologous” (of common ancestry and function) nucleic acid (or protein) sequences:<br />

A. Consider:<br />

--Organism X • • • AGCUGCCAGU • • •<br />

--Organism Y • • • AACCCCCAGU • • •<br />

(The two sequences differ at positions 2, 4 and 5 of the 10 shown. DNA or RNA?)<br />

Sequence X is 70% identical to Sequence Y.<br />

Fractional identity is 0.7<br />

Fractional difference is 0.3 (1 - 0.7)<br />

1) Note the terms: “homologous” = of common ancestry; “orthologous” = of common ancestry and function.<br />

2) Note that the term “homology” is commonly used incorrectly when “identity” is meant. Note that nucleotide sequences are not “##% similar”, they are “##% identical”; protein seqs, on the other hand, can be “similar”. (How is that?)<br />


3) You cannot meaningfully compare sequences unless <strong>the</strong>y are "homologous" -- of common ancestry.<br />

Homologous sequences are not necessarily identical – indeed probably aren’t in different organisms;<br />

identical sequences are not necessarily homologous, particularly if short (e.g. promoters, translation<br />

punctuation, etc.).<br />
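The identity count in the Organism X / Organism Y example can be sketched in a few lines of Python. The function name `fractional_identity` is an illustrative assumption, and the sketch assumes the two sequences are already aligned and of equal length.

```python
def fractional_identity(seq_x: str, seq_y: str) -> float:
    """Fraction of aligned positions at which two sequences carry the same residue."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_x:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_x, seq_y) if a == b)
    return matches / len(seq_x)


# The ten aligned positions from the Organism X / Organism Y example.
organism_x = "AGCUGCCAGU"
organism_y = "AACCCCCAGU"

identity = fractional_identity(organism_x, organism_y)
difference = 1 - identity
print(f"identity = {identity:.1f}, difference = {difference:.1f}")
```

The same function applies to protein sequences, but it only counts identical residues; scoring "similar" amino acids would need a substitution table, which is not sketched here.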

B. Align sequences (big deal – more later) and do a difference (1 - identity) count for all pairs of organisms considered. This difference count is a measure of the extent of evolution - evolutionary distance - separating the pairs of organisms.<br />

e.g. with organisms A,B,C,D and E:<br />

C. To build relationships, construct a “difference matrix” for organisms A-E:


Upper right of the matrix: fractional difference. Lower left: fractional identity.<br />

    A    B    C    D    E<br />
A   –    0.1  0.2  0.2  0.4<br />
B   0.9  –    0.2  0.2  0.4<br />
C   0.8  0.8  –    0.1  0.4<br />
D   0.8  0.8  0.9  –    0.4<br />
E   0.6  0.6  0.6  0.6  –<br />

Can relate in a “tree”- like figure, a “dendrogram:”


Note that organism-to-node “distance” is 1/2 of organism-to-organism “distance”.<br />

Note that this (or any other) tree has a single dimension -- you can rotate around any node and have the same topology in an evolutionary sense.<br />
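One simple way to turn the A-E difference matrix into a dendrogram is average-linkage clustering (UPGMA). The notes do not say which method produced the tree shown, so UPGMA is an assumption here, as are the names `upgma` and `DIFFERENCES`; the method also assumes a constant clock. The sketch merges the closest pair of groups at each step and places the new node at half the distance between them, which is the "organism-to-node distance is 1/2" rule.

```python
from itertools import combinations

# Fractional differences for organisms A-E (upper right of the matrix above).
DIFFERENCES = {
    ("A", "B"): 0.1, ("A", "C"): 0.2, ("A", "D"): 0.2, ("A", "E"): 0.4,
    ("B", "C"): 0.2, ("B", "D"): 0.2, ("B", "E"): 0.4,
    ("C", "D"): 0.1, ("C", "E"): 0.4,
    ("D", "E"): 0.4,
}


def upgma(pairwise):
    """Cluster by average linkage; return (newick_topology, node_heights)."""
    # Each cluster is keyed by its Newick label and stores its member organisms.
    clusters = {name: (name,) for pair in pairwise for name in pair}

    def leaf_distance(a, b):
        return pairwise[(a, b)] if (a, b) in pairwise else pairwise[(b, a)]

    def cluster_distance(x, y):
        # Average of all organism-to-organism distances between the two clusters.
        members_x, members_y = clusters[x], clusters[y]
        total = sum(leaf_distance(a, b) for a in members_x for b in members_y)
        return total / (len(members_x) * len(members_y))

    heights = {}
    while len(clusters) > 1:
        # Pick the closest pair of clusters (ties go to the first pair in label order).
        x, y = min(combinations(sorted(clusters), 2),
                   key=lambda pair: cluster_distance(*pair))
        label = f"({x},{y})"
        # The new node sits at half the distance between the merged clusters.
        heights[label] = cluster_distance(x, y) / 2
        clusters[label] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters)), heights


topology, node_heights = upgma(DIFFERENCES)
print(topology)
for node, height in node_heights.items():
    print(f"{node}: node height {height:.2f}")
```

Rotating the children at any node, for example writing (B,A) instead of (A,B), gives the same tree in the evolutionary sense.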

1) It is common to see sequence-divergence presented in terms of time, but this is not legitimate unless you have a fossil record with which to calibrate the line segment-lengths. “Clock speeds” of organisms vary and the evolutionary clock (rate of change) is not constant, contrary to common supposition.<br />

2) The root of the tree, the line representing the common ancestral line, may or may not be the “deepest” branch point: Most properly the tree should be drawn:<br />


OR<br />

3) E, the “outgroup”, “roots” A/B/C/D<br />


Vignette – Tree-reading –<br />

Baum, D.A., S.D. Smith and S.S. Donovan, “The Tree Thinking Challenge,” Science 310:979-980 (2005) (with<br />

supporting info quiz).<br />

1. Remember – phylogenetic trees have only a single dimension – along line segments.<br />

A few trees from Baum et al.:


End Vignette<br />

------------------------------------------------------------------------------------<br />

7. What molecule(s) to use for molecular phylogeny of organisms for mapping the Tree of Life?<br />

A. Doesn’t really matter so long as:<br />

1. “Homologous” gene occurs in all organisms considered<br />

a. More specifically, you need to know that the genes are “orthologs,” not “paralogs”<br />

b. “Orthologs” share ancestry and retain the same function in the different organisms<br />

c. “Paralogs” result from an ancestral duplication, with potentially different functions taken on subsequent to the duplication – duplications produce “gene families”.<br />

d. For instance, the α and β globins are a gene family; they have ancient common ancestry -- the α-type and β-type globins have evolved independently since the ancestral duplication:<br />

α-globins are orthologs; β-globins are orthologs.<br />

α and β globins are paralogs<br />

e. The tree of the gene family would look like:<br />


[Figure: gene-family tree with branches labeled Human α-globin, Mouse α-globin, Frog α-globin, Human β-globin, Mouse β-globin and Frog β-globin.]<br />

f. If you did not keep your orthologs and paralogs straight (sometimes a tough call) when you build the dataset, you might get some most unexpected (and incorrect) trees. You miss the true topology with the restricted data set, e.g.:<br />

[Figure: tree built from a restricted data set, with branches labeled Human (α-globin), Frog (α-globin) and Mouse (β-globin).]<br />

2. You need a sufficient number of nucleotides (or amino acids) to be statistically significant -- more is always better.<br />


3. Changes must span the evolutionary distance inspected -- i.e., compared sequences must not be randomized.<br />

4. No lateral transfer -- the evolution of the gene must reflect the evolution of the cells considered.<br />

a. Genes that are known to be derived from lateral transfer are called “xenologs.” If you are interested in metabolic genes, there is a good chance, at least among bacteria, that you are dealing with genes or pathways that potentially can move between different organisms.<br />

b. e.g. penicillinase and other commonly transferred antibiotic resistance genes (scary!).<br />

c. e.g. Rhizobium symbiotic (leguminous plants) N2-fixation.<br />

5. Note the evident impact of lateral transfers throughout evolution. Much of microbial physiological diversity (probably) is dependent on laterally transferred genes.<br />

a. Note also that portions of genes can transfer (e.g. with “two-component” systems) so that “homologous” blocks of sequence can show up in functionally unrelated genes. Formation of intralineage “gene families” also results in mixing up functional modules of macromolecules.<br />

b. How to detect xenologs?<br />


B. When “molecular phylogeny” first got started, considerable work was done with protein sequences, e.g.<br />

cytochrome C, hemoglobin.<br />

1. But proteins are hard to get and sequence; it is now easier to isolate/sequence genes.<br />

2. Most protein genes are “shallow” clocks, the result of relatively recent evolution; e.g. E. coli doesn’t have hemoglobin.<br />

8. Choice of molecules for comprehensive (all organisms) phylogeny -- ribosomal RNAs (rRNAs).<br />

A. Ribosome -- carries out protein synthesis<br />

Small subunit (S): “16S”-“18S” (SSU) rRNA, 1500-2000 nt; ca. 25 proteins<br />

Large subunit (L): “23S” rRNA (LSU), 3000-5000 nt; “5S” rRNA, 120 nt; ca. 30-40 proteins<br />

B. rRNAs are present in all organisms and the major organelles (mitochondria and chloroplasts).<br />


C. Highly conserved throughout evolution; e.g., ca. 50% identity between E. coli and human SSU rRNAs over<br />

alignable nt<br />

1. Length variation between molecules may cause problems; must consider only “homologous” nt (the alignment problem - more later).<br />

2. Note that rRNA, while useful for distantly related organisms, is not good for establishing (resolving) close relationships: it is too conservative. Consequently, close relatives have too few differences for reliable resolution.<br />

e.g. human/apes/mice have the same rRNA seqs.<br />

e.g. E. coli and Yersinia pestis (causes Black Plague) have essentially the same rRNA seqs.<br />

D. The rRNA genes are large enough for reasonable statistics (more later on this).<br />

E. First used for all-life phylogenetic trees by Carl Woese (University of Illinois).<br />

For historical overview: Pace et al. “Phylogeny and beyond: scientific, historical and conceptual significance of the first tree of life.” Proc. Natl. Acad. Sci. 109:1011-1018, 2012.<br />

9. Given sequences of multiple organisms (>2 million now available, but spotty representation of diverse phyla – MMBR reference):<br />

• Align seqs., count number of differences -- this is some measure of evolutionary change/distance<br />


• “Correct” for multiple and back mutations – the number of changes you count is always less than the number of mutations that have occurred.<br />

• Computer-fit pairwise “evolutionary distances” to best-fit overall tree topology.<br />

(More detail on all this below)<br />
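One common form of the “correction” step is the Jukes-Cantor model, which assumes that every nucleotide substitution is equally likely. The notes do not name a specific correction, so the choice of model and the function name `jukes_cantor_distance` are assumptions for illustration. For an observed fractional difference p, the estimated number of substitutions per site is d = -(3/4) ln(1 - (4/3) p).

```python
import math


def jukes_cantor_distance(observed_difference: float) -> float:
    """Estimate substitutions per site from the observed fractional difference."""
    if not 0 <= observed_difference < 0.75:
        # At 0.75 or more the sequences look randomized and the estimate is undefined.
        raise ValueError("observed difference must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - (4 / 3) * observed_difference)


for p in (0.1, 0.3, 0.6):
    d = jukes_cantor_distance(p)
    print(f"observed {p:.2f} -> corrected {d:.3f}")
```

The corrected value is larger than the counted difference, and the gap widens as the sequences approach randomization.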

10. The “Big Tree” emerges: Outlines first seen by Woese in 1977.<br />

A. Note that Woese did not have the tree-building methods now available. In fact, he did not have full sequences, only short “signature” oligonucleotides (more on “signatures” later).<br />

B. In essence, the tree is a quantitative map of evolutionary relatedness, a comprehensive map of biological diversity -- rRNA “sequence space”.<br />


C. Indeed, the tree is a quantitative estimate – a metric – for that slippery concept, amount of evolution. The “amount of biological diversity” (for any particular molecule) might be the summation of all unique line segment-lengths in a comprehensive tree.<br />
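That summation can be sketched for a small tree. The tree below is a hypothetical encoding of the A-E example, with branch lengths chosen so that leaf-to-leaf path lengths match the fractional differences in the A-E matrix; the `(branch_length, children)` representation and the name `total_branch_length` are assumptions for illustration.

```python
# Each node is (branch_length_to_parent, list_of_child_nodes); a leaf has no children.
TREE = (0.0, [                               # root
    (0.1, [                                  # ancestor of A, B, C and D
        (0.05, [(0.05, []), (0.05, [])]),    # ancestor of A and B, then A and B
        (0.05, [(0.05, []), (0.05, [])]),    # ancestor of C and D, then C and D
    ]),
    (0.2, []),                               # E, the outgroup
])


def total_branch_length(node) -> float:
    """Sum this node's branch and every line segment in the subtree below it."""
    branch_length, children = node
    return branch_length + sum(total_branch_length(child) for child in children)


print(f"total branch length = {total_branch_length(TREE):.2f}")
```

Each line segment is counted once, so adding a close relative of an organism already in the tree adds little to the total, while adding a deeply divergent lineage adds a lot.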

11. Some lessons from the Big Tree:<br />

A. There was a single origin for terrestrial type of life -- all life forms are related.<br />

B. Three “primary lines of evolutionary descent” -- “Domains” -- “ur-kingdoms”: Eucarya (eucaryotes), Bacteria, Archaea (originally called “archaebacteria”, but the name was changed when it became clear that the things aren’t bacteria.)<br />

1. Sometimes you see these referred to as “kingdoms,” but usage in this context is probably not a good idea -- too historically loaded.<br />

2. You can inject “time” into the tree, but sequence change is not necessarily linear with time - indeed, probably it usually isn't.<br />


C. The eucaryote nuclear line of descent is as old as the archaeal line – eucaryotes have been around since the beginning.<br />

[Figure: the Bacteria, Archaea and Eucarya lines of descent drawn against “Time”.]<br />

2) The term “procaryote” is inappropriate in the light of the relationships – there isn’t any such group as “procaryote”. More on this issue later.<br />

D. Are there still more domain-level divergences to be discovered??<br />

E. Note that lines connecting organisms to nodes are not all the same length -- the evolutionary clock is not constant between different lineages (e.g. Haloferax vs. Methanopyrus, Aquifex vs. Bacillus, Eucarya in general vs. any representative of Archaea or Bacteria)<br />


1) The rate of evolution is not even necessarily the same for a particular lineage at all stages in the evolution of the line, e.g. Agrobacterium vs. mitochondrion<br />

2) Note domain-level tendencies:<br />

Eucarya -- fast clocks<br />

Archaea -- slow clocks<br />

Bacteria -- intermediate rates of evolution<br />

3) Because of variable rates, estimating time from sequence change is chancy -- even fatuous -- without some sort of calibration, a correlation between the Tree and some datable geological event.<br />

F. Note that the phylogenetic space occupied by multicellular eucaryotes is shallow and limited, but enormously diverse in morphological (less so biochemical) phenotype. Is this a consequence of large, highly plastic genomes?<br />

1) Note, however, that the “typical” eucaryote is microbial and has a small genome, e.g. Saccharomyces cerevisiae at 13.5 × 10^6 bp (vs. E. coli at ~4.2 × 10^6 bp; Calothrix [a cyanobacterium] at ~12.5 × 10^6 bp; human at ~3.2 × 10^9 (mostly garbage?) bp; Methanococcus jannaschii at ~1.7 × 10^6 bp).<br />

G. The rRNA (and other molecular) data prove that mitochondria and chloroplasts were of bacterial origin (the bacterial phyla [“divisions”] “proteobacteria” and “cyanobacteria,” respectively).<br />


H. Note how deeply divergent are Giardia, Trichomonas and Vairimorpha in the eucaryotic line. These organisms lack mitochondria, so may have diverged from the main eucaryal line of descent before the mitos came in.<br />

1) It turns out they, like all eucaryotes, have some bacterial (and archaeal) genes, but it is not clear where or when they got them. Some folks think this resulted from mitochondrial import but this is questionable.<br />

12. The three-domain Big Tree is an “unrooted” tree -- you don’t know where the ancestral node is. You need an “outgroup” to “root” the tree, and a universal tree has no outgroup.<br />

A. However, the Tree can be rooted using “paralogs” that arose from duplication before the last common ancestor:<br />

e.g. translation factors EF-Tu and EF-G<br />

ATP synthase subunits α and β<br />

tRNAs met-initiator and met-elongator<br />

These paralogs occur in all three domains, so presumably arose before the last common ancestor. Each yields the 3-domain tree upon analysis, so you can use the tree with one paralog to root the tree with the other. (You will do this in the Mol Phy Workshop.) All concur that the relationships are:<br />


[Figure: two trees, each with branches labeled Archaea, Eucarya and Bacteria.]<br />

This indicates that the root of the Big Tree is (presumably deep) on the bacterial line of descent.<br />

B. This means also that Eucarya and Archaea shared common history after divergence from Bacteria.<br />

1) This explains many similarities between archaeal and eucaryal basal machineries.<br />

e.g. similar transcription machineries; Archaea and Eucarya use TATA-binding proteins whereas Bacteria use σ factors for specification of transcription initiation.<br />

e.g. Archaeal and eucaryal DNA-synthetic machineries are far more like one another than either is to those of bacteria (for overview, two recent review books are Garrett and Klenk, “Archaea: Evolution, Physiology, and Molecular Biology”, 2007, and Cavicchioli, “Archaea: Molecular and Cellular Biology”, 2007).<br />

13. Note that the Big Tree shown is a limited set of specific organisms: > 2 million SSU sequences are now available.<br />

A. Several databases of curated rRNA sequences exist, e.g. RDP II, GreenGenes, SILVA, and of course GenBank (raw sequence - not aligned).<br />


You can download trees, carry out functions, get programs, etc.<br />

14. A few domain-level trees for reference.<br />

15. Note recent expansion of known bacterial diversity (also next page)<br />

A. The lines indicate relatedness groups of bacteria, the phylogenetic “phyla” or “divisions” of bacteria (referred to as “kingdoms” by Woese). There is no formal taxonomic status of these phyla at this time:<br />


B. ca. 100 phyla identified so far by rRNA sequence. Only ca. 25 contain cultivated representatives (bold lines; non-bold have no cultivated representatives). Only ~8-10 have significant cultured representation.<br />

1) Uncultivated organisms in the environment can be identified by obtaining rRNA genes without cultivation:<br />

environmental sample → isolate total DNA → clone rRNA genes → sequence<br />

(more later)<br />

You know that the organism is there, and get some idea of abundance. How to get more information?<br />

2) Most of the environmental sequences that are abundant in the environment are poorly represented by cultivars, or not represented at all, e.g.:<br />

The Acidobacterium group contains only a few cultivars, but is very abundant in many environments.<br />

The “OP11” group is very abundant in anoxic environments at low and high temperatures, but has no cultivated representatives.<br />

A tree with some names of the better-known groups is the following:<br />


C. Most of what we know about bacteria is based on studies of organisms representing only a few phyla:<br />

• Proteobacteria (E. coli, Pseudomonas spp., “purple photosynthetic bacteria”) - the classic “Gram negative” group.<br />

• Firmicutes (aka “Low G + C Gram Positive bacteria” [Bacillus, Clostridium, Streptococcus, Staphylococcus, Lactobacillus])<br />

• Actinobacteria (aka “High G + C Gram Positive bacteria” [Streptomyces, Mycobacterium])<br />

• Cyanobacteria<br />

D. Note <strong>the</strong> expansion in known bacterial diversity over <strong>the</strong> past few years!<br />

16. Archaea:<br />

A. Classically two groups (Crenarchaeota and Euryarchaeota) have cultivated representatives and recognition.<br />

1) Crenarchaeota: Most cultivated types are high-temperature, but uncultivated low-temp. types are abundant in the environment (detected by cloning rRNA and other genes – “metagenomics”)<br />

a) Name “cren-” from the Greek for spring or fount, referring to the ostensible similarity of such organisms to the earliest life (high temperature, using geothermal compounds for energy, e.g. H2 / S^0 -- more later)<br />

2) Euryarchaeota: methanogens, extreme halophiles, many heterotrophs (more later):<br />

a) Name from Greek “eury-” meaning “varied”, referring to variable phenotypes, compared to cultivated crenarchaeota.<br />



B. Environmental sequences have swamped cultured sequences in the archaea, and the structure of the archaeal tree is currently in a state of flux.


17. Eucarya (Eucaryotes):<br />


A. Note that the microbial eucaryotes are vastly more diverse than the three popular phyla (“kingdoms” -- fungi, plants and animals).<br />

B. Note that the phylogeny of eucaryotes is controversial (!!). rRNA always gives the same picture as above. Protein gene trees have been taken to support various topologies, and there is little agreement in the field regarding the deepest branchings. For instance, a tree from Baldauf et al. is the following:


18. The large-scale picture from the rRNA perspective is probably the least biased, but it is also difficult to resolve at the deepest branchings. What we see in the trees is biased by a lot of things, e.g. limited representation of known diversity, treeing methods and nuances, “clockspeed” effects, and others. See Pace 2009 for discussion.<br />



19. The three-domain tree is seen for all molecules in the central information-processing machinery (DNA, RNA, protein synthesis), the “core” genes of genetic transfer.<br />


A. When you consider molecules outside these core functions, however, e.g. carbohydrate metabolism, amino acid metabolism, etc., relationships can become weird. The susceptibility of a gene to transfer may depend on whether or not the gene product has to interact specifically with other cellular components; “stand-alone” gene products, which need no such interactions, can transfer more readily. Some genes seem to have moved around in the very deep past, e.g. aminoacyl-tRNA synthetases. In phylogenetic analysis of these genes, some give “canonical” three-domain trees, while others show evidence of transfers.<br />

B. E.g. the canonical pattern, as seen in the Leu RS:


C. Others are decidedly noncanonical, e.g. Ile RS:<br />


D. Even enzymes in the same biosynthetic pathway may have somewhat different evolutionary histories, e.g. in His biosynthesis:



Archaea (in bold), EUCARYA (IN CAPS)<br />

1) Numbers at nodes are “bootstrap” values, the percentage of trees among multiple replicate solutions that contain that particular node (more later on “bootstrap analysis”).<br />

E. These “incongruencies” with the Big Tree are generally considered to be the results of “lateral transfer” of genes between the vertical lines of descent. For lots more discussion, and the meaning in the larger context of biology, see:<br />

Woese, Olsen, Ibba, Söll, “Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process,” Microbiol. Mol. Biol. Rev. 64:202-236 (2000)<br />
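The bootstrap values mentioned under item D can be illustrated with a small counting sketch. It assumes the replicate trees have already been built (normally by resampling alignment columns and re-running the treeing method, which is not shown), that trees are rooted and written as nested tuples of taxon names, and that support is scored per clade. The function names and the four-taxon toy trees are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return every clade in a rooted nested-tuple tree as a frozenset of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf: just its own name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                 # record the clade under this internal node
        return members

    leaves(tree)
    return found


def bootstrap_support(reference, replicates):
    """Percentage of replicate trees that contain each clade of the reference tree."""
    counts = Counter()
    for rep in replicates:
        counts.update(clades(rep))
    n = len(replicates)
    return {clade: 100.0 * counts[clade] / n for clade in clades(reference)}


# Toy data: taxa A-D and four made-up replicate trees.
reference = (("A", "B"), ("C", "D"))
replicates = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
support = bootstrap_support(reference, replicates)
for clade, pct in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), pct)
```

Here the A+B grouping appears in three of the four replicates, so its node would be labeled 75, while C+D appears in two and would be labeled 50. Real programs usually score bipartitions of unrooted trees rather than rooted clades; the rooted version is used here only to keep the sketch short.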

20. What happened in evolution? It looks as though the “core genome,” reflected in e.g. rRNA, had genetic continuity throughout evolution. A lot of other things got scrambled – but not systematically.<br />

A. “Endosymbiosis” has involved more than organelles!<br />

B. Note that lateral transfer in general tends to be “phylogenetically local” and idiosyncratic. It can, however, have a powerful influence in evolution (e.g. cyanobacterial photosynthesis).<br />

21. Some scientists argue that the occurrence of lateral transfers screws up large-scale trees: e.g. Ford Doolittle:


Gary Olsen edits the Doolittle tree to a more realistic view:<br />

What lateral transfer really means phylogenetically is that if you want to track organismic lineages you need to be<br />

wary of xenologs (and paralogs!). Remember: Any molecular tree is just that, a tree of molecules, not organisms.<br />
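One simple way to see whether a gene tree tracks organismal lineages is to ask whether the members of a domain still form a single clade in that tree. The sketch below assumes rooted trees written as nested tuples. The taxon labels (`bac1`, `arc1`, `euk1`, ...) and both toy trees are hypothetical and are not taken from the figures in these notes.

```python
def leaf_set(node):
    """Return the leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def is_monophyletic(tree, group):
    """True if some node of the rooted tree contains exactly the taxa in `group`."""
    group = frozenset(group)

    def walk(node):
        if leaf_set(node) == group:
            return True
        if isinstance(node, str):          # a leaf that is not the whole group
            return False
        return any(walk(child) for child in node)

    return walk(tree)


# Toy "canonical" gene tree: each domain forms its own clade.
canonical = (("bac1", "bac2"), (("arc1", "arc2"), ("euk1", "euk2")))
# Toy gene tree in which an archaeal sequence groups inside the bacteria,
# the kind of pattern expected after a lateral transfer (a xenolog).
transferred = (("bac1", ("bac2", "arc1")), ("arc2", ("euk1", "euk2")))

archaea = {"arc1", "arc2"}
print(is_monophyletic(canonical, archaea))    # archaea form one clade
print(is_monophyletic(transferred, archaea))  # archaea are split
```

A failed check does not by itself prove lateral transfer; paralogs, poor resolution, or an incorrect root can break up a group in the same way, which is why the tree is best read as a tree of molecules.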

22. Vignette: The prokaryote issue.<br />
