Phylogeny and molecular evolution of green algae - Phycology ...

Tree building methods

Maximum likelihood and Bayesian Inference

Using the most accurate tree building methods and evolutionary models available is a basic necessity to obtain accurate phylogenies (Fig. 9) (Delsuc et al. 2005). Likelihood-based methods (maximum likelihood and Bayesian Inference) generally outperform methods based on distance or parsimony criteria because they explicitly incorporate the processes of character evolution into probabilistic models, which are used to calculate the likelihood of the data given the model and tree. Maximum likelihood (ML) selects the tree that maximizes the probability of observing the data under a given model of sequence evolution. Bayesian methods derive the distribution of trees according to their posterior probability, using Bayes' formula to combine the likelihood function (including tree and model parameters) with prior probabilities on trees. Since prior knowledge is mostly lacking, and bias towards one tree or another is generally undesirable, flat priors are usually chosen, i.e. all trees receive the same prior probability. Consequently, posterior tree probabilities depend primarily on the tree likelihood. ML optimizes model parameters to find the highest peak in parameter space and obtains confidence values by non-parametric bootstrapping. Bayesian approaches instead integrate over the model parameters, measuring the volume under the posterior probability surface rather than its maximum height, and simultaneously estimate trees and measures of uncertainty for every branch (Holder and Lewis 2003, Delsuc et al. 2005, Verbruggen and Theriot 2008). During Bayesian analysis, Markov chain Monte Carlo (MCMC) simulation is used to approximate the posterior probability distribution because the complexity of the phylogenetic likelihood function prevents its analytical calculation.
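The combination of likelihood and priors described above can be written out in symbols. The notation is an assumption made for illustration and does not come from the source: D is the alignment, \tau a tree topology with its branch lengths, \theta the substitution model parameters, and the priors on \tau and \theta are taken to be independent.

```latex
% Joint posterior of tree and model parameters (Bayes' formula)
P(\tau, \theta \mid D) =
  \frac{P(D \mid \tau, \theta)\, P(\tau)\, P(\theta)}{P(D)},
\qquad
P(D) = \sum_{\tau'} \int P(D \mid \tau', \theta)\, P(\tau')\, P(\theta)\, d\theta

% Posterior probability of a tree: integrate the model parameters out
P(\tau \mid D) = \int P(\tau, \theta \mid D)\, d\theta

% With a flat prior on trees, P(\tau) is constant and drops out
P(\tau \mid D) \propto \int P(D \mid \tau, \theta)\, P(\theta)\, d\theta
```

The normalising constant P(D) sums over every possible tree and integrates over all parameter values, which is the quantity that cannot be calculated analytically. MCMC sidesteps it because only ratios of posterior probabilities are needed, and P(D) cancels in those ratios.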
During each generation, a change to one of the parameters (topology, branch lengths or model parameters) is proposed and accepted if it increases the posterior probability. If the posterior probability decreases, the change is accepted or rejected with a probability that depends on the size of the decrease: small decreases are often accepted, whereas large decreases are usually rejected. Because parameters are usually not near their optimal values during the initial generations, these first generations, called the burn-in, need to be removed before a consensus tree of all post-burn-in samples can be made. To search tree space even more thoroughly, Metropolis-coupled MCMC can be applied, in which several chains are run in parallel. Metropolis-coupled MCMC is implemented in the commonly used BI program MrBayes (Ronquist and Huelsenbeck 2003). The first chain is called the cold chain and samples the posterior distribution itself. The other chains are incrementally heated: they sample a flattened version of the posterior surface, so they accept larger downhill moves more readily and can reach distant regions with high posterior probability. After each generation, a swap between chains can be attempted, i.e. a heated chain that sits in a region of higher posterior probability than the current cold chain can become the cold chain and explore that region in detail. Only the output from the cold chain is used to summarize the posterior distribution and, due to chain swapping, this chain will contain a more complete picture of the high posterior probability regions of tree space than a BI analysis based on a single MCMC chain. The downside of Metropolis-coupled MCMC is a considerably higher computational cost because several chains have to be run in parallel.
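The acceptance rule and the burn-in step described above can be sketched in a few lines of Python. This is a minimal single-chain illustration under stated assumptions, not the MrBayes implementation: the posterior is a toy one-dimensional stand-in (a standard normal) instead of a tree posterior, the proposal is symmetric, and all function names are hypothetical.

```python
import math
import random


def acceptance_probability(log_post_current, log_post_proposed):
    """Metropolis rule for a symmetric proposal.

    An increase in posterior probability is always accepted; a decrease is
    accepted with probability equal to the posterior ratio, so small
    decreases pass often and large ones rarely.
    """
    if log_post_proposed >= log_post_current:
        return 1.0
    return math.exp(log_post_proposed - log_post_current)


def log_posterior(x):
    # Assumption: toy stand-in for a phylogenetic posterior
    # (unnormalised log density of a standard normal).
    return -0.5 * x * x


def metropolis(n_generations, start, step, seed=0):
    """Run one chain and record its state after every generation."""
    rng = random.Random(seed)
    current = start
    samples = []
    for _ in range(n_generations):
        proposed = current + rng.uniform(-step, step)  # symmetric proposal
        p_accept = acceptance_probability(
            log_posterior(current), log_posterior(proposed)
        )
        if rng.random() < p_accept:
            current = proposed
        samples.append(current)
    return samples


def discard_burn_in(samples, fraction=0.25):
    """Drop the first generations, sampled before the chain settled."""
    return samples[int(len(samples) * fraction):]


if __name__ == "__main__":
    chain = metropolis(20000, start=10.0, step=1.0, seed=1)
    kept = discard_burn_in(chain)
    print(len(kept), sum(kept) / len(kept))
```

The chain is deliberately started far from the mode (at 10.0) so that the early samples are unrepresentative, which is the situation burn-in removal is meant to handle. A heated chain in Metropolis-coupled MCMC would use the same rule on a flattened posterior, for example by multiplying the log posterior difference by a factor below one.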
Missing data

Deep phylogenies require the simultaneous analysis of many characters and many taxa (Delsuc et al. 2005). Individual, orthologous genes can be combined into a supermatrix, which inevitably involves a certain amount of missing data. Many studies have examined the effects of missing data on phylogenetic reconstruction. A simulation study suggests that the placement of individual taxa in a tree is robust to large amounts of missing data in the sequences of the taxa in question (up to 50% under the simulated conditions) and that model-based methods can deal with even greater amounts of missing data (Wiens 2005). Another simulation study demonstrates that Bayesian analyses are even more robust to missing data, i.e. the phylogenetic position of taxa with 95% missing data in their sequence is still accurate, as long as the total number of characters in the dataset is large (Wiens and Moen 2008). Studies of empirical datasets have shown that datasets with up to 92% missing data are still able to provide insights into various parts of the tree of life (Driskell et al. 2004, Philippe et al. 2004, Delsuc et al. 2005).

Models of sequence evolution

The General Time Reversible (GTR) model and its simpler variants include one or more parameters to describe the substitution rate between the different bases. The GTR model uses a set of parameters to describe the relative substitution rate between all combinations of bases (AC, AG, AT, CG, CT, and GT). The simpler models only distinguish transitions from transversions, or attribute an equal substitution rate to all possible changes. A second important component of a model is the set of base frequencies. They can be calculated directly from the dataset ('empirical' base frequencies) or optimized along with the other parameters of the model. A third common element of the model allows for variation of evolutionary rate across sites (e.g. different codon positions in protein coding genes, loops and stems in ribosomal DNA). Such among-site rate variation is commonly accounted for by assuming that the site rates follow a gamma distribution and/or by incorporating a proportion of invariable sites.

Partitioning strategies

A supermatrix, a dataset composed of different genes, often demands data partitioning to account for across-site heterogeneity in evolutionary rate (Delsuc et al. 2005). Therefore, careful attention has to be paid to the selection of suitable partitioning strategies (Brown and Lemmon 2007, Li et al. 2008, Verbruggen and Theriot 2008). Protein coding genes usually benefit from partitioning by codon position. Empirical studies showed that codon position models perform better than models which do not take codon position into account (Shapiro et al. 2006). To accommodate differences in evolutionary rate among partitions, rate multipliers can be used.
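The GTR parameterisation described under "Models of sequence evolution" can be made concrete with a small sketch. It builds the instantaneous rate matrix from the six relative rates (AC, AG, AT, CG, CT, GT) and the base frequencies, using the standard construction in which the rate from base i to base j is the relative rate of the pair multiplied by the frequency of j. The function name, the dictionary layout and the numerical values are assumptions made for illustration; they are not taken from the source or from any particular program.

```python
BASES = "ACGT"


def gtr_rate_matrix(relative_rates, base_freqs):
    """Build a GTR rate matrix as a nested dict q[from_base][to_base].

    relative_rates: dict with keys 'AC', 'AG', 'AT', 'CG', 'CT', 'GT'.
    base_freqs:     dict mapping each base to its frequency (summing to 1).
    The matrix is scaled so that the mean substitution rate equals 1.
    """
    q = {a: {b: 0.0 for b in BASES} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                pair = "".join(sorted(a + b))  # 'CA' and 'AC' share one rate
                q[a][b] = relative_rates[pair] * base_freqs[b]
        # Diagonal entries make every row sum to zero.
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    mean_rate = -sum(base_freqs[a] * q[a][a] for a in BASES)
    for a in BASES:
        for b in BASES:
            q[a][b] /= mean_rate
    return q


# Hypothetical parameter values, chosen only to show the full GTR case.
rates = {"AC": 1.0, "AG": 4.0, "AT": 0.8, "CG": 1.2, "CT": 5.0, "GT": 1.0}
freqs = {"A": 0.30, "C": 0.20, "G": 0.25, "T": 0.25}
Q = gtr_rate_matrix(rates, freqs)

# Simpler variant: a single transition/transversion ratio (here 2.0).
ts_tv = {"AC": 1.0, "AG": 2.0, "AT": 1.0, "CG": 1.0, "CT": 2.0, "GT": 1.0}
Q_ts_tv = gtr_rate_matrix(ts_tv, freqs)

# Simplest variant: equal rates for all changes and equal base frequencies.
equal_rates = {pair: 1.0 for pair in rates}
equal_freqs = {base: 0.25 for base in BASES}
Q_equal = gtr_rate_matrix(equal_rates, equal_freqs)
```

Time reversibility shows up as the identity freqs[i] * Q[i][j] == freqs[j] * Q[j][i] for every pair of bases. Among-site rate variation and partition rate multipliers would enter as scalar factors applied to this matrix for a given site or partition; they are left out here to keep the sketch short.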
