Phylogeny and molecular evolution of green algae - Phycology ...
CHAPTER 2
Model selection procedure
A suitable partitioning strategy and suitable models of sequence evolution were selected with a
three-step procedure based on the Bayesian Information Criterion (BIC). The guide tree used during
the entire procedure was obtained by ML analysis of the unpartitioned concatenated alignment with
PhyML, using a JC+Γ8 model (Guindon and Gascuel 2003). All subsequent likelihood optimizations
and BIC calculations were carried out with Treefinder (Jobb et al. 2004). The first step consisted of
optimizing the likelihood for eight potential partitioning strategies, assuming an HKY+Γ8 model for
each partition. The three partitioning strategies with the best fit to the data (lowest BIC scores) were
retained for further evaluation. The second step involved model selection for individual partitions.
The likelihood of each partition present in the three retained partitioning strategies was optimized
for three variants of the general time reversible model (F81, HKY and GTR), with and without
among-site rate heterogeneity (+Γ8). Because not all genes were sampled for all taxa, the guide tree was
pruned to the taxa present in the partition in question before each optimization. The partition-specific
models obtaining the lowest BIC score were passed on to the third step, which consisted of a
re-evaluation of the three partitioning strategies retained from the first step using the models
selected for these partitions in the second step. The partitioning strategy plus model combination
that received the lowest BIC score in the third step was used in the phylogenetic analyses. The model
selection procedure proposed 8 partitions: SSU nrDNA was partitioned into loops and stems (2
partitions), and nuclear and plastid genes were partitioned according to codon position (2 × 3 partitions).
GTR+Γ8 was the optimal model for all partitions.
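The BIC comparison used in each step above can be sketched as follows. This is a minimal illustration, not the Treefinder implementation: the log-likelihoods, the alignment length, and the model labels (`F81+G`, `HKY+G`, `GTR+G`) are hypothetical placeholders, and the parameter counts cover only the substitution model and the rate heterogeneity shape parameter (branch lengths are left out for simplicity).

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


# Hypothetical optimized log-likelihoods and free-parameter counts for one
# partition (illustrative values only, not results from the study).
# F81: 3 base frequencies; HKY: +1 ts/tv ratio; GTR: +5 exchange rates;
# each model carries 1 extra parameter for among-site rate heterogeneity.
candidates = {
    "F81+G": (-10500.0, 4),
    "HKY+G": (-10320.0, 5),
    "GTR+G": (-10250.0, 9),
}
n_sites = 1200  # hypothetical partition length, used as the sample size

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.2f}")
print("Selected:", best_model)
```

With these placeholder values the richer model wins because its gain in likelihood outweighs the penalty of k ln n for the extra parameters; with a smaller likelihood gain the simpler model would be selected instead.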
Phylogenetic analysis
Maximum likelihood analysis was performed with Treefinder, which allows likelihood searches under
partitioned models (Jobb et al. 2004). Due to the relatively low tree space coverage in Treefinder
compared to other ML programs, an analysis pipeline was created to increase tree space coverage by
running analyses from many starting trees. A first set of starting trees was created by randomly
modifying the PhyML guide tree by 100 and 200 nearest neighbor interchange (NNI) steps (50
replicates each). ML searches were run from these 100 starting trees and the three trees yielding the
highest likelihood were used as the starting point for another set of NNI modifications of 20 and 50
steps (50 replicates each). A second set of ML searches was run from each of the resulting 300
starting trees. The tree with the highest likelihood resulting from this set of analyses was retained as
the global ML solution. The bootstrap resampling method was used to assess statistical support
(1000 pseudo-replicates).
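The two rounds of starting trees described above can be laid out as a simple job schedule. The sketch below only enumerates the jobs and checks the counts (100 and 300); the tree labels are hypothetical placeholders, and the NNI perturbation and the ML search themselves are not implemented here.

```python
from itertools import product


def starting_tree_jobs(source_trees, nni_steps, replicates):
    """Return one (source tree, NNI step count, replicate index) job per starting tree."""
    return list(product(source_trees, nni_steps, range(replicates)))


# Round 1: perturb the guide tree by 100 and 200 random NNI steps, 50 replicates each.
round1 = starting_tree_jobs(["guide_tree"], nni_steps=(100, 200), replicates=50)

# Round 2: perturb the three best trees from round 1 by 20 and 50 NNI steps,
# 50 replicates each. The labels below are placeholders for those trees.
round2 = starting_tree_jobs(
    ["best_1", "best_2", "best_3"], nni_steps=(20, 50), replicates=50
)

print(len(round1), len(round2))  # 100 300
```

The second round uses fewer NNI steps than the first, so it explores the neighborhood of the best trees found so far rather than jumping to distant parts of tree space.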
Bayesian phylogenetic inference was carried out with MrBayes 3.1.2 (Ronquist and Huelsenbeck
2003). Two parallel runs, each consisting of four incrementally heated chains, were run for 25 million
generations, sampling every thousand generations. Convergence of log-likelihoods and parameter
values was assessed in Tracer v1.4 (Rambaut and Drummond 2007). A burn-in of 2.5 million
generations was removed before constructing the majority rule consensus tree.
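The sampling scheme above implies the following tree counts. This is a back-of-the-envelope sketch that assumes the 2.5 million burn-in is counted in generations (with sampling every thousand generations, each run yields only about 25,000 trees) and that ignores the initial state sampled at generation 0; the function name is hypothetical.

```python
def mcmc_sample_counts(generations, sample_every, burnin_generations, n_runs):
    """Return (samples per run, burn-in samples per run, samples kept over all runs)."""
    per_run = generations // sample_every
    burnin = burnin_generations // sample_every
    kept_total = (per_run - burnin) * n_runs
    return per_run, burnin, kept_total


per_run, burnin, kept_total = mcmc_sample_counts(
    generations=25_000_000,
    sample_every=1_000,
    burnin_generations=2_500_000,  # assumption: burn-in given in generations
    n_runs=2,
)
print(per_run, burnin, kept_total)  # 25000 2500 45000
```

Under these assumptions the burn-in discards the first 10% of each run, and the consensus tree would be built from 45,000 sampled trees pooled over the two runs.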