22.12.2012 Views

Phylogeny and molecular evolution of green algae - Phycology ...

CHAPTER 1

Treefinder). In the second step, log likelihoods of the guide tree under different partitioning strategies and models are calculated. Subsequently, the corresponding AIC or BIC scores are calculated and compared. The condition with the lowest AIC and/or BIC score is chosen for phylogenetic analysis. Alternatively, Bayes factors (Nylander et al. 2004) can be used to compare different partitioning strategies and models. A separate Bayesian analysis has to be run for each tested condition, which implies long computation times. This makes it unrealistic to compare many partitioning strategies and models in a Bayesian framework.
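The score comparison described above can be sketched as follows. This is a minimal illustration and not the procedure implemented in any particular program: the strategy names, log likelihoods, parameter counts and alignment length are invented, and the number of alignment sites is taken as the sample size for BIC, which is a common convention but an assumption here.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def score_strategies(candidates, n_sites):
    """Return {name: (AIC, BIC)} for each candidate strategy.

    candidates maps a strategy name to (log likelihood, free parameters).
    """
    return {
        name: (aic(lnl, k), bic(lnl, k, n_sites))
        for name, (lnl, k) in candidates.items()
    }


# Hypothetical log likelihoods of the guide tree under three strategies.
candidates = {
    "unpartitioned": (-12500.0, 10),
    "by_gene": (-12450.0, 30),
    "by_gene_and_codon": (-12410.0, 90),
}
n_sites = 3000  # hypothetical alignment length

scores = score_strategies(candidates, n_sites)
best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])

for name, (a, b) in scores.items():
    print(f"{name:20s} AIC={a:10.2f} BIC={b:10.2f}")
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

With these invented numbers the two criteria disagree: AIC favours the by-gene strategy, whereas BIC, which penalises extra parameters more heavily for an alignment of this length, favours the unpartitioned model.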

Complex models of sequence evolution

The secondary structure of ribosomal RNA consists of loops and stems. The nucleotides in the stem regions form base pairs and are interdependent, because a change on one side of the stem has to be compensated on the other side of the stem to avoid malfunction of the molecule. Since models of sequence evolution should approximate real evolution as closely as possible, it is recommended to incorporate this site interdependence in the model. This can be done by partitioning the ribosomal RNA into loops and stems and using a doublet model for the stem regions (Schöniger and Von Haeseler 1994). However, the use of a doublet model is computationally demanding.
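To make the loop and stem partitioning concrete, the sketch below splits a toy RNA fragment according to a secondary structure written in dot-bracket notation and recodes the paired stem positions as doublets. The sequence, the structure string and the function names are hypothetical and serve only as an illustration. With four nucleotides, a doublet character has 4 × 4 = 16 possible states instead of 4, which is one reason for the higher computational cost.

```python
def split_stems_and_loops(structure):
    """Parse a dot-bracket string into stem pairs and loop positions.

    '(' and ')' mark the two sides of a base pair, '.' marks an unpaired
    (loop) position. Returns (pairs, loops) with 0-based indices.
    """
    open_positions = []
    pairs = []
    loops = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: unexpected ')'")
            pairs.append((open_positions.pop(), index))
        elif symbol == ".":
            loops.append(index)
        else:
            raise ValueError(f"unknown structure symbol: {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure: unclosed '('")
    return sorted(pairs), loops


def recode_sequence(sequence, structure):
    """Return (stem doublets, loop nucleotides) for one aligned sequence."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    pairs, loops = split_stems_and_loops(structure)
    doublets = [sequence[i] + sequence[j] for i, j in pairs]
    loop_sites = [sequence[i] for i in loops]
    return doublets, loop_sites


# All 16 possible doublet states for an RNA alphabet.
DOUBLET_STATES = [a + b for a in "ACGU" for b in "ACGU"]

# Hypothetical hairpin: three base pairs closing a five-nucleotide loop.
sequence = "GGCAUUCGGCC"
structure = "(((.....)))"
doublets, loop_sites = recode_sequence(sequence, structure)
print("stem doublets:", doublets)
print("loop sites:   ", loop_sites)
```

The stem doublets would then be analysed under a doublet model and the loop sites under a standard nucleotide model.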

Instead of partitioning protein-coding genes into codon positions, a codon substitution model can be applied. In this model, nucleotide triplets are treated as single characters, and changes from one triplet to another are modelled taking into account that some changes are more likely than others (e.g. synonymous versus non-synonymous substitutions). Although codon substitution models are a more realistic approximation of protein-coding sequence evolution than codon position models, they come with a very high computational cost, hindering their use for large datasets (Shapiro et al. 2006).
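The distinction between synonymous and non-synonymous changes can be illustrated with a short sketch. It assumes the standard genetic code (organellar genes may use a different translation table) and only classifies single-nucleotide changes between sense codons; it does not implement a substitution model. The function names are hypothetical. Under the standard code there are 61 sense codons, so a codon model works with 61 character states rather than 4, which contributes to the computational cost.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SENSE_CODONS = [c for c, aa in GENETIC_CODE.items() if aa != "*"]


def classify_change(codon_from, codon_to):
    """Classify a single-nucleotide change between two sense codons."""
    for codon in (codon_from, codon_to):
        if GENETIC_CODE.get(codon, "*") == "*":
            raise ValueError(f"not a sense codon: {codon!r}")
    differences = sum(a != b for a, b in zip(codon_from, codon_to))
    if differences != 1:
        raise ValueError("codons must differ at exactly one position")
    if GENETIC_CODE[codon_from] == GENETIC_CODE[codon_to]:
        return "synonymous"
    return "nonsynonymous"


def count_single_step_changes():
    """Count directed one-nucleotide changes between sense codons."""
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for codon in SENSE_CODONS:
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                neighbour = codon[:position] + base + codon[position + 1:]
                if GENETIC_CODE[neighbour] == "*":
                    continue  # changes to stop codons are excluded
                counts[classify_change(codon, neighbour)] += 1
    return counts


counts = count_single_step_changes()
print("sense codons:", len(SENSE_CODONS))
print("single-step changes:", counts)
```

A codon substitution model would assign different rates to the two classes, typically through a ratio of non-synonymous to synonymous rates.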

Mixture models<br />

Mixture models offer an attractive alternative to partitioning the data and applying different models to the partitions. Whereas a partitioned analysis assumes that all sites within a partition arose from a single evolutionary process, mixture models relax this assumption: they require no prior partitioning and apply a set of different models to each site in the alignment. The likelihood of each site is calculated as a weighted sum of the likelihoods of that site under each model. The model weights correspond to the probability that the site has evolved under the model in question. Mixture models can thus apply different rate matrices to different parts of the dataset without explicitly partitioning it (Pagel and Meade 2004, Venditti et al. 2008). This is an elegant way to incorporate across-site heterogeneity in the evolutionary process because it does not require prior knowledge about differences in evolutionary processes between different parts of the dataset, and it avoids problems associated with differences in the evolutionary process within partitions that are defined a priori. Although analyses using mixture models outperform analyses based on partitioned datasets, they are prohibitively time-consuming for large datasets.
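The weighting scheme can be sketched numerically. The example follows the usual mixture formulation, in which the weighted sum is taken over the per-model site likelihoods and the logarithm is taken afterwards; the sum is evaluated in log space for numerical stability. The per-site log likelihoods and the weights are invented, and in a real analysis they would come from likelihood calculations on the tree and from estimation, respectively. The function names are hypothetical.

```python
import math


def _check_weights(weights):
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")


def site_log_likelihood(model_log_likelihoods, weights):
    """log( sum_k w_k * L_k ) for one site, computed via log-sum-exp."""
    _check_weights(weights)
    terms = [
        math.log(w) + lnl
        for w, lnl in zip(weights, model_log_likelihoods)
        if w > 0
    ]
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


def model_posteriors(model_log_likelihoods, weights):
    """Probability that the site evolved under each model, given the data."""
    total = site_log_likelihood(model_log_likelihoods, weights)
    return [
        math.exp(math.log(w) + lnl - total) if w > 0 else 0.0
        for w, lnl in zip(weights, model_log_likelihoods)
    ]


def alignment_log_likelihood(per_site_log_likelihoods, weights):
    """Sum of the site log likelihoods over the whole alignment."""
    return sum(
        site_log_likelihood(site, weights)
        for site in per_site_log_likelihoods
    )


# Hypothetical log likelihoods of three sites under two rate matrices.
per_site = [
    [-2.0, -5.0],
    [-6.0, -3.0],
    [-4.0, -4.1],
]
weights = [0.7, 0.3]  # hypothetical model weights

total_log_likelihood = alignment_log_likelihood(per_site, weights)
print("alignment log likelihood:", round(total_log_likelihood, 4))
for site in per_site:
    print([round(p, 3) for p in model_posteriors(site, weights)])
```

Every site contributes under every model, so no site has to be assigned to a partition beforehand; the per-site posterior probabilities show which model each site supports.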
