22.12.2012 Views

Phylogeny and molecular evolution of green algae - Phycology ...


CHAPTER 2

Model selection procedure

A suitable partitioning strategy and suitable models of sequence evolution were selected with a three-step procedure based on the Bayesian Information Criterion (BIC). The guide tree used throughout the procedure was obtained by ML analysis of the unpartitioned concatenated alignment with PhyML, using a JC+Γ8 model (Guindon and Gascuel 2003). All subsequent likelihood optimizations and BIC calculations were carried out with Treefinder (Jobb et al. 2004). The first step consisted of optimizing the likelihood for eight potential partitioning strategies, assuming a HKY+Γ8 model for each partition. The three partitioning strategies with the best fit to the data (lowest BIC scores) were retained for further evaluation. The second step involved model selection for individual partitions. The likelihood of each partition present in the three retained partitioning strategies was optimized for three variants of the general time reversible model (F81, HKY and GTR), with and without among-site rate heterogeneity (+Γ8). Because not all genes were sampled for all taxa, the guide tree was pruned to the taxa present in the partition in question before each optimization. The partition-specific models obtaining the lowest BIC score were passed on to the third step, which consisted of a re-evaluation of the three partitioning strategies retained from the first step using the models selected for these partitions in the second step. The partitioning strategy plus model combination that received the lowest BIC score in the third step was used in the phylogenetic analyses. The model selection procedure proposed 8 partitions: SSU nrDNA was partitioned into loops and stems (2 partitions), and nuclear and plastid genes were partitioned according to codon position (2 × 3 partitions). GTR+Γ8 was the optimal model for all partitions.
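The BIC ranking used in the first step can be sketched as follows. This is a minimal illustration, not the Treefinder implementation: the strategy names, log-likelihoods, parameter counts and alignment length are hypothetical placeholders, and taking the number of alignment sites as the sample size is an assumption.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def retain_best(candidates, n_sites, keep=3):
    """Return the names of the `keep` candidates with the lowest BIC."""
    scores = {
        name: bic(log_l, k, n_sites) for name, (log_l, k) in candidates.items()
    }
    return sorted(scores, key=scores.get)[:keep]


# Hypothetical values for illustration only: (log-likelihood, free parameters).
strategies = {
    "unpartitioned": (-60000.0, 5),
    "by_gene": (-59000.0, 35),
    "by_codon_position": (-58500.0, 15),
    "gene_by_codon_position": (-58300.0, 105),
}
N_SITES = 5000  # assumed alignment length, used as the sample size n

retained = retain_best(strategies, N_SITES)
print(retained)
```

The same ranking would apply in the second step, where the candidates are the substitution models for a single partition (with `keep=1`), and in the third step, where the retained strategies are re-scored with their selected models.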

Phylogenetic analysis

Maximum likelihood analysis was performed with Treefinder, which allows likelihood searches under partitioned models (Jobb et al. 2004). Because Treefinder covers relatively little tree space compared to other ML programs, an analysis pipeline was created to increase tree space coverage by running analyses from many starting trees. A first set of starting trees was created by randomly modifying the PhyML guide tree by 100 and 200 nearest neighbor interchange (NNI) steps (50 replicates each). ML searches were run from these 100 starting trees, and the three trees yielding the highest likelihood were used as the starting point for another set of NNI modifications of 20 and 50 steps (50 replicates each). A second set of ML searches was run from each of the resulting 300 starting trees. The tree with the highest likelihood resulting from this set of analyses was retained as the global ML solution. The bootstrap resampling method was used to assess statistical support (1000 pseudo-replicates).
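The two rounds of starting-tree generation can be sketched as below. This is an illustration under stated assumptions, not the pipeline used in the study: trees are toy rooted binary trees stored as nested lists, all function names are hypothetical, the guide tree is a placeholder, and the ML searches that would rank the first-round trees are omitted (the first three trees simply stand in for the three best).

```python
import copy
import random


def leaf_labels(tree):
    """Collect the leaf labels of a nested-list binary tree."""
    if isinstance(tree, str):
        return [tree]
    return leaf_labels(tree[0]) + leaf_labels(tree[1])


def internal_edges(tree):
    """List (parent, child_index) pairs whose child is an internal node."""
    edges = []
    stack = [tree]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node):
            if isinstance(child, list):
                edges.append((node, i))
                stack.append(child)
    return edges


def perturb_by_nni(tree, n_steps, rng):
    """Return a copy of `tree` modified by `n_steps` random NNI moves."""
    tree = copy.deepcopy(tree)
    for _ in range(n_steps):
        edges = internal_edges(tree)
        if not edges:
            break  # no internal edge, so no NNI is possible
        parent, i = rng.choice(edges)
        child = parent[i]
        j = rng.randrange(2)
        # Swap the sibling of `child` with one of the subtrees below `child`.
        parent[1 - i], child[j] = child[j], parent[1 - i]
    return tree


def starting_trees(base_trees, step_sizes, replicates, seed=0):
    """Perturb every base tree `replicates` times for each NNI step size."""
    rng = random.Random(seed)
    return [
        perturb_by_nni(tree, steps, rng)
        for tree in base_trees
        for steps in step_sizes
        for _ in range(replicates)
    ]


# Placeholder guide tree; the real one would come from the PhyML analysis.
guide = [[["A", "B"], ["C", "D"]], [["E", "F"], "G"]]

round1 = starting_trees([guide], (100, 200), 50)    # 1 x 2 x 50 trees
best_three = round1[:3]  # stand-in for the three highest-likelihood trees
round2 = starting_trees(best_three, (20, 50), 50)   # 3 x 2 x 50 trees
print(len(round1), len(round2))
```

Each NNI move swaps two subtrees across one internal edge, so the taxon set is preserved while the topology drifts further from the base tree as the number of steps grows.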

Bayesian phylogenetic inference was carried out with MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003). Two parallel runs, each consisting of four incrementally heated chains, were run for 25 million generations, sampling every thousand generations. Convergence of log-likelihoods and parameter values was assessed in Tracer v1.4 (Rambaut and Drummond 2007). A burn-in of 2.5 million generations was discarded before constructing the majority rule consensus tree.
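The sampling settings translate into tree counts as follows. This is a small sketch, assuming the burn-in is measured in generations (2.5 million generations, i.e. 10% of each run) and ignoring the sample taken at generation 0; the function name is hypothetical.

```python
def mcmc_sample_counts(n_generations, sample_every, burnin_generations, n_runs):
    """Count sampled, discarded and retained trees for parallel MCMC runs.

    The sample of the initial state (generation 0) is ignored for simplicity.
    """
    sampled = n_generations // sample_every
    discarded = burnin_generations // sample_every
    retained = sampled - discarded
    return {
        "sampled_per_run": sampled,
        "discarded_per_run": discarded,
        "retained_per_run": retained,
        "retained_total": retained * n_runs,
    }


counts = mcmc_sample_counts(
    n_generations=25_000_000,
    sample_every=1_000,
    burnin_generations=2_500_000,
    n_runs=2,
)
print(counts)
```

Under these assumptions each run contributes 22,500 post-burn-in trees, so the consensus tree would summarize 45,000 trees across the two runs.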
