Phylogeny and molecular evolution of green algae - Phycology ...
CHAPTER 1
Treefinder). In the second step, the log likelihoods of the guide tree under the different partitioning strategies and models are calculated. Subsequently, the corresponding AIC or BIC scores are calculated and compared, and the condition with the lowest AIC and/or BIC score is chosen for the phylogenetic analysis.
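The comparison step can be sketched as follows, using the standard definitions AIC = 2k − 2 lnL and BIC = k ln(n) − 2 lnL, where k is the number of free model parameters and n is the sample size. Taking n as the number of alignment sites is a common but not universal choice and is an assumption here. The candidate names, log likelihoods and parameter counts are hypothetical, and branch-length parameters are ignored for simplicity. The numbers were chosen so that AIC and BIC disagree, which is why the text speaks of "AIC and/or BIC".

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str       # hypothetical label for a partitioning strategy + model combination
    log_lik: float  # log likelihood of the fixed guide tree under this condition
    n_params: int   # free substitution-model parameters (branch lengths ignored here)


def aic(c: Candidate) -> float:
    return 2 * c.n_params - 2 * c.log_lik


def bic(c: Candidate, n_sites: int) -> float:
    # Assumes the number of alignment sites is used as the sample size n.
    return c.n_params * math.log(n_sites) - 2 * c.log_lik


def best_conditions(candidates, n_sites):
    """Return the names of the candidates with the lowest AIC and the lowest BIC."""
    best_aic = min(candidates, key=aic)
    best_bic = min(candidates, key=lambda c: bic(c, n_sites))
    return best_aic.name, best_bic.name


# Hypothetical values: GTR+G has 9 free parameters per partition in this toy count.
candidates = [
    Candidate("unpartitioned", -10500.0, 9),
    Candidate("by_codon_position", -10320.0, 27),
    Candidate("by_gene_and_codon_position", -10290.0, 54),
]
N_SITES = 2000

for c in candidates:
    print(f"{c.name:28s} AIC={aic(c):9.1f}  BIC={bic(c, N_SITES):9.1f}")
print(best_conditions(candidates, N_SITES))
```

BIC penalizes extra parameters more heavily than AIC once n exceeds about 7 sites, so it tends to favour simpler partitioning schemes.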
Alternatively, Bayes factors (Nylander et al. 2004) can be used to compare different partitioning strategies and models. This requires a separate Bayesian analysis for each tested condition, which is computationally expensive and makes it impractical to compare many partitioning strategies and models in a Bayesian framework.
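A Bayes factor compares two conditions through their marginal likelihoods, usually reported as 2 ln B10 = 2(ln p(D | M1) − ln p(D | M0)). The sketch below shows why one full Bayesian run per condition is needed. Each marginal likelihood is estimated from that run's sampled log likelihoods, here with the harmonic mean estimator. This estimator is simple but notoriously unstable, and stepping-stone sampling is generally preferred today. The choice of estimator, the function names and the sample values are all illustrative assumptions, not the procedure used in this study.

```python
import math


def logsumexp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def harmonic_mean_log_marginal(sampled_log_liks):
    """Harmonic-mean estimate of ln p(D | M) from MCMC log-likelihood samples.

    1 / p(D | M) is approximated by the mean of 1 / L_s over the posterior samples,
    computed in log space: ln p(D | M) ~ ln N - ln sum_s exp(-lnL_s).
    """
    n = len(sampled_log_liks)
    return math.log(n) - logsumexp([-ll for ll in sampled_log_liks])


def two_ln_bayes_factor(log_marginal_1, log_marginal_0):
    """2 ln B10; positive values favour condition 1 over condition 0."""
    return 2.0 * (log_marginal_1 - log_marginal_0)


# Hypothetical post-burn-in log likelihoods from two separate Bayesian runs.
run_unpartitioned = [-10512.3, -10508.9, -10510.4, -10509.7]
run_by_codon_position = [-10331.8, -10329.5, -10330.2, -10333.1]

m0 = harmonic_mean_log_marginal(run_unpartitioned)
m1 = harmonic_mean_log_marginal(run_by_codon_position)
print(f"2 ln BF = {two_ln_bayes_factor(m1, m0):.1f}")
```

Real runs contain thousands of samples per condition. Every additional partitioning strategy or model therefore adds another complete MCMC analysis.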
Complex models of sequence evolution
The secondary structure of ribosomal RNA consists of loops and stems. The nucleotides in the stem regions form base pairs and are therefore interdependent: a change on one side of a stem has to be compensated by a change on the other side to avoid malfunction of the molecule. Because models of sequence evolution should approximate real evolution as closely as possible, it is recommended to incorporate this site interdependence in the model. This can be done by partitioning the ribosomal RNA into loops and stems and using a doublet model for the stem regions (Schöniger and Von Haeseler 1994). However, doublet models are computationally demanding, because each base pair is treated as a single character with 16 possible states instead of 4.
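The recoding step behind a stem/loop partition can be sketched as follows. The sketch assumes that a consensus secondary structure is available in dot-bracket notation, which is an assumption because the text does not say how the structure is obtained. Paired columns are merged into one 16-state doublet character and unpaired columns stay as ordinary 4-state characters. The function names and the toy hairpin are hypothetical. Only the data recoding is shown, not a doublet substitution model itself.

```python
def stem_pairs(dot_bracket):
    """Return sorted (i, j) column pairs for '(' ... ')' in a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at column(s) {stack}")
    return sorted(pairs)


def split_stems_and_loops(seq, dot_bracket):
    """Recode one aligned sequence into stem doublets and loop singletons."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    pairs = stem_pairs(dot_bracket)
    paired = {i for pair in pairs for i in pair}
    doublets = [seq[i] + seq[j] for i, j in pairs]                    # one 16-state character per pair
    singles = [seq[k] for k in range(len(seq)) if k not in paired]  # ordinary 4-state characters
    return doublets, singles


# The doublet state space: every ordered pair of the four RNA bases.
DOUBLET_STATES = [a + b for a in "ACGU" for b in "ACGU"]

structure = "((..((...))..))"  # hypothetical hairpin with two nested stems
sequence = "GCAAGCUUAGCAAGC"
doublets, singles = split_stems_and_loops(sequence, structure)
print(doublets, singles, len(DOUBLET_STATES))
```

A 16-state rate matrix has far more entries than a 4-state one, and the per-character likelihood calculation grows roughly with the square of the number of states. This is the source of the extra computational cost.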
Instead of partitioning protein-coding genes by codon position, a codon substitution model can be applied. In this model, nucleotide triplets are treated as single characters, and substitutions from one triplet to another are modelled taking into account that some changes are more likely than others (e.g. synonymous versus non-synonymous substitutions). Although codon substitution models approximate the evolution of protein-coding sequences more realistically than codon-position models, they come at a very high computational cost, which hinders their use for large datasets (Shapiro et al. 2006).
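The following sketch shows how a codon model "takes into account that some changes are more likely than others". It builds an unnormalized rate matrix over the 61 sense codons of the standard genetic code, using a simplified version of the common parameterization. Only single-nucleotide changes are allowed. Transitions are scaled by κ, non-synonymous changes by ω, and each rate is multiplied by the target codon's frequency. This parameterization, the default equal codon frequencies, and the function names are assumptions for illustration, not the models compared by Shapiro et al. (2006).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE_CODONS = [c for c, aa in GENETIC_CODE.items() if aa != "*"]  # stop codons excluded
PURINES = set("AG")


def is_transition(a, b):
    """For a != b: True if both bases are purines or both are pyrimidines."""
    return (a in PURINES) == (b in PURINES)


def codon_rate_matrix(kappa, omega, freqs=None):
    """Unnormalized codon rate matrix (rows sum to zero), simplified parameterization."""
    n = len(SENSE_CODONS)
    pi = freqs if freqs is not None else [1.0 / n] * n  # assumption: equal codon frequencies
    q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE_CODONS):
        for j, cj in enumerate(SENSE_CODONS):
            diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
            if len(diffs) != 1:
                continue  # same codon, or more than one nucleotide differs: rate stays 0
            a, b = diffs[0]
            rate = pi[j]
            if is_transition(a, b):
                rate *= kappa
            if GENETIC_CODE[ci] != GENETIC_CODE[cj]:
                rate *= omega  # non-synonymous change
            q[i][j] = rate
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonal rates
    return q


q = codon_rate_matrix(kappa=2.0, omega=0.2)
idx = SENSE_CODONS.index
print(len(SENSE_CODONS), q[idx("TTT")][idx("TTC")], q[idx("TTT")][idx("TTA")])
```

The state space grows from 4 nucleotides to 61 codons. Because per-character likelihood cost scales roughly with the square of the number of states, this illustrates why codon models are so expensive for large datasets.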
Mixture models<br />
Mixture models offer an attractive alternative to partitioning the data and applying different models to the partitions. Whereas a partitioned analysis assumes that all sites within a partition arose from a single evolutionary process, mixture models relax this assumption: they do not require an a priori partitioning, but instead apply a set of different models to every site in the alignment. The likelihood of each site is calculated as a weighted sum of the likelihoods of that site under each model, and the model weights correspond to the prior probability that a site has evolved under the model in question. Mixture models can thus apply different rate matrices to different parts of the dataset without explicitly partitioning it (Pagel and Meade 2004, Venditti et al. 2008). This is an elegant way to incorporate across-site heterogeneity of the evolutionary process, because it requires no prior knowledge about how the evolutionary process differs between parts of the dataset, and it avoids the problems caused by heterogeneity of the process within partitions that are defined a priori. Although analyses using mixture models outperform analyses of partitioned datasets, they are prohibitively time-consuming for large datasets.
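The mixture likelihood can be written as ln L = Σ_i ln( Σ_k w_k L_i^(k) ), where L_i^(k) is the likelihood of site i under model k and the weights w_k sum to 1. The sketch below evaluates this sum from precomputed per-site, per-model log likelihoods, using log-sum-exp to avoid numerical underflow. It also computes each site's posterior probability of having evolved under each model. Computing L_i^(k) on a tree is not shown, and the function names and toy values are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_likelihood(site_log_liks, weights):
    """site_log_liks[i][k] = ln L_i^(k); weights[k] = w_k (positive, summing to 1)."""
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    log_w = [math.log(w) for w in weights]
    # The site likelihoods (not their logs) are averaged over models, then logged and summed.
    return sum(logsumexp([lw + ll for lw, ll in zip(log_w, per_model)])
               for per_model in site_log_liks)


def site_posteriors(per_model, weights):
    """Posterior probability that one site evolved under each model: w_k L^(k) / sum."""
    terms = [math.log(w) + ll for w, ll in zip(weights, per_model)]
    norm = logsumexp(terms)
    return [math.exp(t - norm) for t in terms]


# Two hypothetical sites, two models with equal weights.
site_log_liks = [
    [math.log(0.02), math.log(0.01)],   # site 1 fits model 1 better
    [math.log(0.001), math.log(0.004)],  # site 2 fits model 2 better
]
weights = [0.5, 0.5]
print(mixture_log_likelihood(site_log_liks, weights))
print(site_posteriors(site_log_liks[0], weights))
```

No site is assigned to a single model. Each contributes a weighted mixture of all models, which is what removes the need for an a priori partitioning. The price is that every site must be evaluated under every model.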