22.12.2012 Views

Phylogeny and molecular evolution of green algae - Phycology ...



CHAPTER 1

Missing data

Deep phylogenies require the simultaneous analysis of many characters and many taxa (Delsuc et al. 2005). Individual orthologous genes can be combined into a supermatrix, which inevitably involves a certain amount of missing data. Many studies have examined the effects of missing data on phylogenetic reconstruction. A simulation study suggests that the placement of individual taxa in a tree is robust to large amounts of missing data in the sequences of the taxa in question (up to 50% under the simulated conditions) and that model-based methods can deal with even greater amounts of missing data (Wiens 2005). Another simulation study demonstrates that Bayesian analyses are even more robust to missing data, i.e. the phylogenetic position of taxa with 95% missing data in their sequence is still accurate, as long as the total number of characters in the dataset is large (Wiens and Moen 2008). Studies of empirical datasets have shown that datasets with up to 92% missing data are still able to provide insights into various parts of the tree of life (Driskell et al. 2004, Philippe et al. 2004, Delsuc et al. 2005).
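
To make the notion of missing data in a supermatrix concrete, the following minimal sketch concatenates two toy gene alignments and reports the fraction of missing characters per taxon. The taxon names, sequences and helper functions are hypothetical, and counting gaps (`-`) and ambiguous `N` as missing alongside `?` is an assumption of this example rather than a convention taken from the studies cited above.

```python
MISSING_CHARS = "?-Nn"  # assumption: gaps and N count as missing, like '?'


def concatenate(gene_alignments, gene_lengths):
    """Build a supermatrix; taxa absent from a gene are padded with '?'."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    supermatrix = {}
    for taxon in taxa:
        parts = [
            aln.get(taxon, "?" * length)
            for aln, length in zip(gene_alignments, gene_lengths)
        ]
        supermatrix[taxon] = "".join(parts)
    return supermatrix


def missing_fraction(seq):
    """Fraction of characters in one row of the matrix that are missing."""
    if not seq:
        return 0.0
    return sum(ch in MISSING_CHARS for ch in seq) / len(seq)


# Toy data (hypothetical): taxon2 was not sequenced for the second gene.
gene_a = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "ACG-AC"}
gene_b = {"taxon1": "GGCC", "taxon3": "GGCT"}

matrix = concatenate([gene_a, gene_b], [6, 4])
per_taxon = {taxon: missing_fraction(seq) for taxon, seq in matrix.items()}
# All rows have equal length, so the mean of the per-taxon fractions
# equals the overall fraction of missing cells in the matrix.
overall = sum(per_taxon.values()) / len(per_taxon)

for taxon, fraction in per_taxon.items():
    print(f"{taxon}: {fraction:.0%} missing")
print(f"overall: {overall:.1%} missing")
```

In this toy matrix, taxon2 lacks the second gene entirely and therefore carries most of the missing data, which is the typical pattern in supermatrices assembled from unevenly sampled genes.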

Models of sequence evolution

The General Time Reversible (GTR) model and its simpler variants include one or more parameters to describe the substitution rate between the different bases. The GTR model uses a set of parameters to describe the relative substitution rate between all combinations of bases (AC, AG, AT, CG, CT, and GT). The simpler models only distinguish transitions from transversions or attribute an equal substitution rate to all possible changes. A second important component of a model is the set of base frequencies. They can be calculated directly from the dataset (‘empirical’ base frequencies) or optimized along with the other parameters of the model. A third common element of the model allows for variation in evolutionary rate across sites (e.g. different codon positions in protein-coding genes, loops and stems in ribosomal DNA). Such among-site rate variation is commonly accounted for by assuming that the site rates follow a gamma distribution and/or by incorporating a proportion of invariable sites.
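
The relationship between the six exchange rates and the base frequencies can be illustrated with a small sketch that assembles an instantaneous rate matrix. It assumes the widely used parameterisation in which the rate from base i to base j is the exchange rate of the pair multiplied by the frequency of j, with the matrix scaled to a mean rate of one substitution per unit time. The function name and all numerical values are hypothetical, and among-site rate variation is not covered here.

```python
BASES = "ACGT"
PAIRS = ("AC", "AG", "AT", "CG", "CT", "GT")


def gtr_rate_matrix(rates, freqs):
    """Return a GTR-style rate matrix as a nested dict q[from_base][to_base].

    rates: relative exchange rate for each unordered base pair (keys as in PAIRS)
    freqs: equilibrium frequency of each base (should sum to 1)
    """
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                pair = "".join(sorted(i + j))  # "CA" and "AC" share one rate
                q[i][j] = rates[pair] * freqs[j]
        # Each row sums to zero: the diagonal is minus the total rate of leaving i.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    # Scale so that the expected substitution rate at equilibrium equals 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in BASES)
    return {i: {j: q[i][j] / mean_rate for j in BASES} for i in BASES}


# Simplest variant: one rate for every change and equal base frequencies.
equal_freqs = {base: 0.25 for base in BASES}
jc_like = gtr_rate_matrix({pair: 1.0 for pair in PAIRS}, equal_freqs)

# Transitions (A<->G, C<->T) twice as fast as transversions, unequal
# base frequencies. All values are made up for illustration.
ts_tv_rates = {"AC": 1.0, "AG": 2.0, "AT": 1.0, "CG": 1.0, "CT": 2.0, "GT": 1.0}
skewed_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
hky_like = gtr_rate_matrix(ts_tv_rates, skewed_freqs)

for i in BASES:
    print(i, " ".join(f"{hky_like[i][j]:7.3f}" for j in BASES))
```

Setting all six exchange rates to the same value, or giving transitions one rate and transversions another, recovers the simpler variants mentioned above as special cases of the same construction.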

Partitioning strategies<br />

A supermatrix, a dataset composed of different genes, often demands data partitioning to account for across-site heterogeneity in evolutionary rate (Delsuc et al. 2005). Therefore, careful attention has to be paid to the selection of suitable partitioning strategies (Brown and Lemmon 2007, Li et al. 2008, Verbruggen and Theriot 2008). Protein-coding genes usually benefit from partitioning by codon position. Empirical studies showed that codon position models perform better than models which do not take codon position into account (Shapiro et al. 2006). In order to accommodate differences in evolutionary rate among partitions, rate multipliers can be used.
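
As a minimal sketch of codon-position partitioning, the code below splits an in-frame coding alignment into three partitions and rescales a set of per-partition rates into multipliers. The sequences, the raw rates and both function names are hypothetical. The sketch assumes that the alignment starts at a first codon position and that multipliers are normalised so that their site-weighted mean equals one, which is a common convention but not one stated in the text above.

```python
def partition_by_codon_position(alignment):
    """Split an in-frame coding alignment into three per-position alignments.

    alignment: dict mapping taxon name to sequence; column 0 is assumed to be
    a first codon position.
    """
    return [
        {taxon: seq[offset::3] for taxon, seq in alignment.items()}
        for offset in range(3)
    ]


def normalise_rate_multipliers(raw_rates, n_sites):
    """Rescale partition rates so that their site-weighted mean equals 1."""
    total_sites = sum(n_sites)
    mean = sum(rate * n for rate, n in zip(raw_rates, n_sites)) / total_sites
    return [rate / mean for rate in raw_rates]


# Toy alignment of three codons (hypothetical data).
coding = {"taxon1": "ATGGCCAAA", "taxon2": "ATGGCTAAG"}
pos1, pos2, pos3 = partition_by_codon_position(coding)

# Made-up raw rates in which third positions evolve fastest.
sites_per_partition = [len(pos1["taxon1"]), len(pos2["taxon1"]), len(pos3["taxon1"])]
multipliers = normalise_rate_multipliers([1.0, 0.5, 4.5], sites_per_partition)

print(pos1, pos2, pos3)
print(multipliers)
```

In the toy alignment the two taxa differ only at third codon positions, so a separate, faster rate for that partition describes the data better than a single rate shared by all sites.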
