13.07.2015

Introduction to MrBayes Bayes' theorem - Molecular Evolution

Majority-rule consensus tree
- Frequencies represent the posterior probability of the clades
- Posterior probability: the probability of the clade being true given data, model and prior

Summarizing Variables
- Mean, median, variance: common summaries
- 95 % credible interval: discard the lowest 2.5 % and highest 2.5 % of sampled values
- 95 % region of highest posterior density (HPD): find the smallest region containing 95 % of the probability

Credible intervals and HPDs
[Figure: posterior densities with the 95 % credible interval and the HPD regions marked; mean and 95 % credibility interval for model parameters]

Assessing Convergence
- Plateau in the trace plot
- Look at sampling behavior within the run (autocorrelation time or effective sample size (ESS))
- Compare independent runs with different, randomly chosen starting points

Convergence among Runs
Tree topology:
- Compare clade probabilities (split frequencies)
- Average standard deviation of split frequencies above some cut-off (min. 10 % in at least one run). Should go to 0 as runs converge.
Continuous variables:
- Potential scale reduction factor (PSRF). Compares variance within and between runs. Should approach 1 as runs converge. Assumes overdispersed starting points.
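The two interval summaries can be computed directly from sorted MCMC samples. This is a minimal sketch, not MrBayes's own code. The shortest-interval search only matches the HPD region when the posterior is unimodal, because a multimodal posterior can have a disjoint HPD region. The exponential "samples" are a stand-in chosen to make the two intervals differ.

```python
import random
from typing import List, Tuple


def credible_interval(samples: List[float], mass: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed interval: drop the lowest and highest (1 - mass) / 2 of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = int(round((1.0 - mass) / 2.0 * n))  # number of samples discarded in each tail
    return xs[k], xs[n - 1 - k]


def hpd_interval(samples: List[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest single interval containing `mass` of the sorted samples.

    Equals the HPD region only for a unimodal posterior.
    """
    xs = sorted(samples)
    n = len(xs)
    m = max(1, int(round(mass * n)))  # number of samples the interval must contain
    start = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[start], xs[start + m - 1]


if __name__ == "__main__":
    rng = random.Random(1)
    skewed = [rng.expovariate(1.0) for _ in range(20000)]  # stand-in for parameter samples
    print("95 % credible interval:", credible_interval(skewed))
    print("95 % HPD interval:     ", hpd_interval(skewed))
```

For a right-skewed posterior the HPD interval starts closer to zero and is shorter. For a symmetric posterior the two intervals nearly coincide.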
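The two among-run diagnostics can be sketched as follows.
- The average standard deviation of split frequencies uses the 10 % cut-off described above.
- The PSRF uses the textbook Gelman-Rubin form.

Assumptions: split labels such as "AB|CDE" are a hypothetical encoding, the standard deviation is the sample (n − 1) version, and MrBayes's exact formulas may differ in detail.

```python
import random
import statistics
from typing import Dict, List, Sequence


def asdsf(split_freqs: List[Dict[str, float]], min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across independent runs.

    `split_freqs` holds one dict per run, mapping a split label (hypothetical
    format) to its frequency among that run's sampled trees. Splits below
    `min_freq` in every run are ignored.
    """
    splits = set().union(*split_freqs)
    sds = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in split_freqs]  # absent split = frequency 0
        if max(freqs) >= min_freq:
            sds.append(statistics.stdev(freqs))  # sample SD across runs
    return sum(sds) / len(sds) if sds else 0.0


def psrf(chains: Sequence[Sequence[float]]) -> float:
    """Gelman-Rubin potential scale reduction factor for one continuous parameter."""
    n = len(chains[0])  # samples per run (assumed equal for all runs)
    within = statistics.fmean(statistics.variance(c) for c in chains)  # W
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    runs = [{"AB": 1.0, "ABC": 0.6, "DE": 0.05}, {"AB": 1.0, "ABC": 0.4, "DE": 0.02}]
    print("ASDSF:", asdsf(runs))  # DE never reaches 10 %, so only AB and ABC count
```

Runs that sample the same distribution give an ASDSF near 0 and a PSRF near 1. Runs stuck in different regions push the PSRF well above 1.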


[Figure: clade probabilities in analysis 1 plotted against clade probabilities in analysis 2]

Improving Convergence
- Change proposal tuning parameters
- Change proposal probabilities
- Use different proposals
- Use Metropolis coupling (heated chains)

[Figure: sampled values plotted against the target distribution for three kinds of proposals]
- Too modest proposals: acceptance rate too high, poor mixing
- Too bold proposals: acceptance rate too low, poor mixing
- Moderately bold proposals: acceptance rate intermediate, good mixing
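The link between proposal boldness and acceptance rate can be reproduced with a toy sampler. This sketch assumes a standard normal target and a simple sliding-window proposal of a given width. The three window widths are illustrative choices, not MrBayes defaults.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalized log density of a standard normal, standing in for the target distribution."""
    return -0.5 * x * x


def acceptance_rate(window: float, n_steps: int = 20000, seed: int = 1) -> float:
    """Run a sliding-window Metropolis sampler and return the fraction of accepted proposals."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        proposal = x + rng.uniform(-window / 2, window / 2)  # symmetric proposal
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
    return accepted / n_steps


if __name__ == "__main__":
    for label, window in (("too modest", 0.05), ("moderately bold", 6.0), ("too bold", 50.0)):
        print(f"{label:>16}: window {window:>5}, acceptance rate {acceptance_rate(window):.2f}")
```

Tiny steps are almost always accepted but barely move. Huge steps are mostly rejected. Both extremes mix poorly.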


Tuning Proposals
Manually by changing tuning parameters:
- Increase the boldness of a proposal if the acceptance rate is too high
- Decrease the boldness of a proposal if the acceptance rate is too low
Adaptive tuning:
- Tuning parameters are adjusted automatically by the MCMC procedure to reach a target acceptance rate

Metropolis-coupled Markov chain Monte Carlo (a.k.a. MCMCMC, a.k.a. (MC)³)
[Figure: a cold chain and a heated chain]

Incremental Heating
T is temperature, λ is the heating coefficient:
$$T_i = \frac{1}{1 + \lambda i}, \qquad i \in \{0, 1, \dots, n-1\}$$
Chain i samples from $f(\theta \mid X)^{T_i}$.
Example for λ = 0.2:
i    T      Distr.
0    1.00   $f(\theta \mid X)^{1.00}$   cold chain
1    0.83   $f(\theta \mid X)^{0.83}$   heated chains
2    0.71   $f(\theta \mid X)^{0.71}$
3    0.62   $f(\theta \mid X)^{0.62}$

Running MrBayes
- Use execute to bring data in a Nexus file into MrBayes
- Set the model and priors using lset and prset
- Run the chain using mcmc; this results in a set of .p and .t files
- The .t files contain tree samples in Nexus format
- The .p files contain tab-delimited samples of the model parameters
- Summarize the parameter samples using sump
- Summarize the tree samples using sumt

Convergence Diagnostics
- By default, MrBayes performs two independent analyses starting from different random trees (mcmc nruns=2)
- The average standard deviation of clade frequencies is calculated and presented during the run (mcmc mcmcdiagn=yes diagnfreq=5000) and written to file (.mcmc). Suggested target < 0.05.
- The standard deviation of each clade frequency and the potential scale reduction for branch lengths are calculated with sumt
- The potential scale reduction is calculated for all substitution model parameters with sump
- Other tools: AWTY and Tracer

Bayes' Rule
$$f(\theta \mid D) = \frac{f(\theta)\, f(D \mid \theta)}{\int f(\theta)\, f(D \mid \theta)\, d\theta} = \frac{f(\theta)\, f(D \mid \theta)}{f(D)}$$
where $f(D)$ is the marginal likelihood (of the data).
We have implicitly conditioned on a model:
$$f(\theta \mid D, M) = \frac{f(\theta \mid M)\, f(D \mid \theta, M)}{f(D \mid M)}$$
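Adaptive tuning can be sketched as a batch-wise rule: widen the proposal when too many moves are accepted and narrow it when too few are. This is a generic scheme, not MrBayes's actual update rule. The standard normal target, the sliding-window proposal and the 0.3 target acceptance rate are all assumptions. In practice adaptation is usually confined to burn-in or made to diminish, as here, so the chain still samples the right distribution.

```python
import math
import random
from typing import Tuple


def metropolis_batch(x: float, window: float, n_steps: int,
                     rng: random.Random) -> Tuple[float, float]:
    """Run n_steps of sliding-window Metropolis on a standard normal target.

    Returns the final state and the acceptance rate within this batch.
    """
    accepted = 0
    for _ in range(n_steps):
        y = x + rng.uniform(-window / 2, window / 2)
        log_ratio = 0.5 * (x * x - y * y)  # log f(y) - log f(x) for N(0, 1)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
            accepted += 1
    return x, accepted / n_steps


def adapt_window(window: float, target_rate: float = 0.3, batches: int = 200,
                 batch_size: int = 100, seed: int = 3) -> float:
    """Batch-wise adaptive tuning of the window width toward a target acceptance rate."""
    rng = random.Random(seed)
    x = 0.0
    for b in range(1, batches + 1):
        x, rate = metropolis_batch(x, window, batch_size, rng)
        # Accepting too often -> bolder window; too rarely -> more modest window.
        # Dividing by sqrt(b) makes the adjustments shrink as tuning proceeds.
        window *= math.exp((rate - target_rate) / math.sqrt(b))
    return window


if __name__ == "__main__":
    for start in (0.01, 500.0):  # deliberately too modest and too bold starting windows
        w = adapt_window(start)
        _, rate = metropolis_batch(0.0, w, 20000, random.Random(7))
        print(f"start {start:>6}: tuned window {w:.2f}, acceptance rate {rate:.2f}")
```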
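The incremental heating scheme is easy to check numerically. This minimal sketch computes $T_i = 1/(1 + \lambda i)$ and shows that heating multiplies the log posterior by $T$. Multiplying by $T$ shrinks log-posterior differences, so a heated chain crosses valleys more easily. The function names are hypothetical.

```python
from typing import List


def heating_temperatures(n_chains: int, heating_coefficient: float) -> List[float]:
    """Incremental heating: T_i = 1 / (1 + lambda * i) for i = 0, ..., n_chains - 1."""
    return [1.0 / (1.0 + heating_coefficient * i) for i in range(n_chains)]


def heated_log_density(log_posterior: float, temperature: float) -> float:
    """Log of f(theta | X)^T; T = 1 is the cold chain, T < 1 flattens the surface."""
    return temperature * log_posterior


if __name__ == "__main__":
    temps = heating_temperatures(4, 0.2)
    print([round(t, 2) for t in temps])  # [1.0, 0.83, 0.71, 0.62], as in the table
    # A drop of 10 log units for the cold chain is a drop of 6.25 for the hottest chain (T = 0.625).
    print([heated_log_density(-10.0, t) for t in temps])
```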
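A rough, sump-like summary of a .p file might look like this. The assumed layout is an optional bracketed comment line, a tab-delimited header, then one row per sample. The first column is taken to be the generation number (assumed name "Gen"). Check the layout against your own output files. The real sump reports considerably more, such as credible intervals, ESS and PSRF. The toy file contents below are made up.

```python
import csv
import statistics
from typing import Dict, List, Tuple


def read_p_file(text: str) -> Dict[str, List[float]]:
    """Parse .p-style text into one list of floats per column (layout is an assumption)."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("[")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    columns: Dict[str, List[float]] = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns


def summarize(columns: Dict[str, List[float]],
              burnin_frac: float = 0.25) -> Dict[str, Tuple[float, float]]:
    """Mean and median of each parameter after discarding the first burnin_frac of samples."""
    summary = {}
    for name, values in columns.items():
        if name == "Gen":  # generation counter, not a model parameter
            continue
        kept = values[int(len(values) * burnin_frac):]
        summary[name] = (statistics.fmean(kept), statistics.median(kept))
    return summary


if __name__ == "__main__":
    toy = "[ID: 123]\nGen\tLnL\tTL\n1\t-100.0\t1.0\n2\t-50.0\t2.0\n3\t-49.0\t2.2\n4\t-51.0\t1.8\n"
    print(summarize(read_p_file(toy)))
```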
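Bayes' rule can be evaluated numerically when θ is one-dimensional. The marginal likelihood $f(D)$ is then just the integral in the denominator. Purely as an illustration, this sketch assumes a binomial likelihood (k successes in n trials) and a uniform prior on (0, 1). For that case the exact answers are $f(D) = 1/(n+1)$ and posterior mean $(k+1)/(n+2)$, which lets the grid result be checked. MCMC exists precisely because this integral is intractable for phylogenetic models.

```python
import math
from typing import List, Tuple


def grid_posterior(k: int, n: int, n_grid: int = 10000) -> Tuple[List[float], List[float], float]:
    """Posterior f(theta | D) on a midpoint grid; returns (thetas, posterior, marginal f(D))."""
    h = 1.0 / n_grid
    thetas = [(i + 0.5) * h for i in range(n_grid)]  # midpoints of (0, 1)
    prior = [1.0 for _ in thetas]                      # f(theta) = 1 on (0, 1)
    like = [math.comb(n, k) * t**k * (1 - t) ** (n - k) for t in thetas]  # f(D | theta)
    unnorm = [p * l for p, l in zip(prior, like)]
    marginal = sum(unnorm) * h                         # f(D) = integral of f(theta) f(D | theta)
    posterior = [u / marginal for u in unnorm]         # Bayes' rule
    return thetas, posterior, marginal


if __name__ == "__main__":
    thetas, post, marg = grid_posterior(7, 10)
    h = 1.0 / len(thetas)
    print("f(D) =", marg)  # close to 1/11
    print("posterior mean =", sum(t * p for t, p in zip(thetas, post)) * h)  # close to 8/12
```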


MrBayes 3.2 – cont'd
- Run diagnostics, including acceptance rates and tuning parameters, output to .mcmc file
- Max or average standard deviation of split frequencies, ESS and PSRF for continuous parameters; choice between HPD and credible interval
- More extensive output from sumt; full support of FigTree
- Check-point and append to previous results across all models (default checkfreq=100000)
- New likelihood calculators (default SSE code, full Beagle support including CPU and GPU code, up to 50 % reduction in memory requirement, 2 to 100 times faster)

Tree Proposals
- All topology moves now always propose topology changes and make minimal changes to branch lengths
- Specialized branch length or node height moves used more often
- Richer mixtures of tree proposals, particularly for clock and relaxed clock trees
- Parsimony-biased topology proposals (ParsSPR, ParsSPRClock); convergence can be up to an order of magnitude faster

[Figure: convergence over time for 357 taxa, ~3 kb (atpB + rbcL), 4 × 1 chains (no heating): traditional topology proposals (ExtTBR, ExtSPR, NNI) compared with the parsimony-biased proposal (ParsSPR)]

Clock and Non-clock Trees
[Figure: six taxa A–F drawn as a clock tree with node times t1–t5 and as a non-clock tree with branch lengths v1–v9 measuring evolutionary change]
- Clock tree: n − 1 node times
- Non-clock tree: 2n − 3 branch lengths

Relaxed clocks and dating
[Figure: six-taxon trees A–F with branch rates r1–r10 and with rate multipliers m1–m5]
- Branch rate models: r_i follow Brownian motion, or r_i are drawn iid; in both cases one variance parameter
- Compound Poisson Process (CPP): rate multipliers m drawn iid and generated according to a Poisson process; variance and rate parameters

Relaxed clocks and dating
MrBayes implements three relaxed clock models:
- The Compound Poisson Process (CPP) relaxed clock (discrete autocorrelated model)
- The TK02 (bm) model (continuous autocorrelated model)
- The "White Noise" (ibr) model (continuous truly uncorrelated model)
Date using tip and/or node calibrations
Dates can be fixed or associated with uncertainty
Rich summaries from sumt, including effective branch lengths, rates and ages
Summary trees guaranteed to be clock trees and to have positive branch lengths
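The parameter counts for clock and non-clock trees follow from how binary trees are built.
- Adding a leaf to an unrooted binary tree splits one edge, which is a net gain of two edges. Starting from 3 edges for 3 taxa gives 3 + 2(n − 3) = 2n − 3 branch lengths.
- An unrooted binary tree has n − 2 internal nodes, and placing a root adds one more. That gives n − 1 node times, assuming all tips are contemporaneous.

This sketch verifies both counts on random trees. Its tree representation is hypothetical: integer node labels and an edge list.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def random_unrooted_tree(n_taxa: int, seed: int = 0) -> List[Edge]:
    """Unrooted binary tree built by stepwise addition.

    Leaves are labelled 0..n_taxa-1; internal nodes are n_taxa, n_taxa+1, ...
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    rng = random.Random(seed)
    edges = [(0, n_taxa), (1, n_taxa), (2, n_taxa)]  # star tree on three leaves
    next_internal = n_taxa + 1
    for leaf in range(3, n_taxa):
        a, b = edges.pop(rng.randrange(len(edges)))  # attach the new leaf to a random edge
        mid = next_internal
        next_internal += 1
        edges += [(a, mid), (mid, b), (leaf, mid)]  # one edge removed, three added
    return edges


def internal_nodes(edges: List[Edge], n_taxa: int) -> Set[int]:
    """All non-leaf node labels appearing in the edge list."""
    return {v for edge in edges for v in edge if v >= n_taxa}


def clock_node_times(edges: List[Edge], n_taxa: int) -> int:
    """Node times of the clock tree obtained by placing a root on one edge."""
    return len(internal_nodes(edges, n_taxa)) + 1  # the root adds one internal node


if __name__ == "__main__":
    tree = random_unrooted_tree(6)
    print(len(tree), clock_node_times(tree, 6))  # 9 branch lengths, 5 node times
```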
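The CPP relaxed clock can be illustrated along a single lineage. Rate-change events arrive as a Poisson process, and at each event the current rate is multiplied by an iid multiplier. The effective branch length is the rate integrated over the lineage's duration. This is a sketch under stated assumptions: the lognormal multiplier distribution and the function and parameter names are hypothetical, and MrBayes's own CPP parameterization and priors may differ.

```python
import math
import random
from typing import List, Tuple


def simulate_cpp_lineage(duration: float, event_rate: float, sigma: float,
                         start_rate: float = 1.0,
                         seed: int = 0) -> Tuple[List[Tuple[float, float]], float]:
    """Simulate rate shifts along one lineage under a compound Poisson process.

    Returns the (time, new rate) pair at each event and the effective branch
    length, i.e. the rate integrated over the lineage's duration.
    """
    rng = random.Random(seed)
    t, rate, effective = 0.0, start_rate, 0.0
    events: List[Tuple[float, float]] = []
    while True:
        wait = rng.expovariate(event_rate) if event_rate > 0 else math.inf
        if t + wait >= duration:
            effective += rate * (duration - t)  # final segment up to the end of the branch
            return events, effective
        effective += rate * wait
        t += wait
        rate *= rng.lognormvariate(0.0, sigma)  # iid multiplier; lognormal is an assumption
        events.append((t, rate))


if __name__ == "__main__":
    events, eff = simulate_cpp_lineage(duration=2.0, event_rate=3.0, sigma=0.5, seed=1)
    print(f"{len(events)} rate shifts, effective branch length {eff:.3f}")
```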


Clock and relaxed clock analyses do not use an outgroup.
Rooting helps in relaxed-clock analyses.

Morphological Models
[Figure: molecular model (JC, states A, C, G, T) compared with morphological model (Mk, states 0, 1, 2, 3)]
- Arbitrary state labels give constraints

Dating based on simultaneous analysis of extant forms and fossils
[Figure: tree of extant taxa A–H with fossils x, y and z placed along a time axis]
- Uncertainty in phylogeny of extant forms
- Uncertainty in placement of fossils
- Model distance of fossils to ancestors
- Model preservation probability

Gene and Species Trees
- Gene trees evolve inside species trees
- Gene tree – species tree mismatch occurs frequently among closely related species
- Model gene trees as coalescing within species trees, potentially with each species (branch) having a unique population size
- Two implementations:
  - Edwards, Liu and Pearl, PNAS, 2007 (BEST) and subsequent papers
  - Heled and Drummond, MBE, 2010 (StarBEAST)
- No introgression or hybridization
[Figure: workflow diagram with boxes labeled MrBayes, Gene tree samples, BEST, Complete model probabilities and MrBayes 3.2]
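The Mk model is the k-state generalization of the Jukes-Cantor (JC) model, with equal rates between all states. Its transition probabilities depend on a state pair only through whether the two states are the same. The labels 0, 1, 2, 3 therefore carry no meaning beyond telling states apart. This sketch assumes branch length v is measured in expected changes per character. For k = 4 it reproduces the JC formula $1/4 + (3/4)e^{-4v/3}$.

```python
import math


def mk_prob(k: int, v: float, same_state: bool) -> float:
    """Transition probability under the equal-rates Mk model with k states.

    v is the branch length in expected number of changes per character.
    """
    if k < 2:
        raise ValueError("Mk needs at least two states")
    decay = math.exp(-k * v / (k - 1))
    if same_state:
        return 1.0 / k + (k - 1) / k * decay
    return (1.0 - decay) / k  # probability of each particular different state


if __name__ == "__main__":
    for k in (2, 4):
        print(k, mk_prob(k, 0.5, True), mk_prob(k, 0.5, False))
```

As v grows, every probability approaches 1/k. Each row (one "same" entry plus k − 1 "different" entries) sums to 1.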
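The point that gene tree – species tree mismatch is common among closely related species can be made concrete with the three-species coalescent. Take species tree ((A,B),C) and one gene copy per species. The A and B lineages fail to coalesce in the internal branch of length T (in coalescent units) with probability $e^{-T}$. If they fail, the three lineages that reach the common ancestral population coalesce in a random order. Each mismatching topology therefore has probability $e^{-T}/3$. This sketch assumes a neutral coalescent without gene flow, consistent with "no introgression or hybridization".

```python
import math
from typing import Dict


def three_taxon_gene_tree_probs(internal_branch: float) -> Dict[str, float]:
    """Rooted gene-tree topology probabilities for species tree ((A,B),C).

    internal_branch: length T of the branch ancestral to A and B, in coalescent units.
    """
    p_each_mismatch = math.exp(-internal_branch) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_each_mismatch,  # matches the species tree
        "((A,C),B)": p_each_mismatch,
        "((B,C),A)": p_each_mismatch,
    }


if __name__ == "__main__":
    for t in (0.1, 1.0, 3.0):
        print(t, three_taxon_gene_tree_probs(t))
```

With T = 0.1, a short internal branch typical of closely related species, about 60 % of gene trees disagree with the species tree.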


Some Advice
- If you use ModelTest or MrModelTest: do not fix parameters in MrBayes
- Run at least 1,000,000 generations
- Don't worry if the average standard deviation of split frequencies fluctuates, especially in the beginning of the run
- Save time by running the analysis without heating; this works well for many analyses
- Prefer the Unix version, use the MPI version on multiprocessor and multicore machines, and use SSE code on CPU or Beagle on GPU

If you have difficulties with convergence:
- Change relative proposal probabilities or tuning parameters
- If you use heated chains and there are few swaps between chains, try to lower the temperature coefficient
- Increase the number of heated chains
- Run the analysis longer
- Make the model more realistic
- Start with randomly perturbed good trees

What's next? RevBayes
- R-like and BUGS-like computing environment for Bayesian phylogenetics
- Object-oriented programming language
- Flexible model specification: you program your own model
- Language related to graphical model representation
- Graphical model builder: John and Edna Huelsenbeck
- Output files adapted for analysis in R

Why Bayesian?
- Priors: accumulate scientific knowledge
- Easy to deal with complex models, and current models need to be improved
- Computational efficiency: today hundreds of taxa, probably thousands in a few years
- Convergence diagnostics to detect problems with convergence
- Model testing with Bayes factors and MCMC sampling across model space

Acknowledgments
- Maxim Teslenko
- Seraina Klopfstein
- John Huelsenbeck, Sebastian Hoehna, Tom Britton, Bret Larget, Tanja Stadler, Paul van der Mark
- The Beagle group: Daniel Ayres, Marc Suchard, Aaron Darling, Andrew Rambaut
- Liu Liang and the BEST team
- Arne Mooers, Klaas Hartmann and colleagues
- Many users…
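The advice to lower the temperature coefficient when swaps are rare follows from the standard swap acceptance ratio. Chains i and j sample $f(\theta \mid X)^{T_i}$ and $f(\theta \mid X)^{T_j}$, and exchanging their states $\theta_i$ and $\theta_j$ is accepted with probability $\min(1, r)$, where

$$r = \frac{f(\theta_j \mid X)^{T_i}\, f(\theta_i \mid X)^{T_j}}{f(\theta_i \mid X)^{T_i}\, f(\theta_j \mid X)^{T_j}} = \left(\frac{f(\theta_j \mid X)}{f(\theta_i \mid X)}\right)^{T_i - T_j}$$

A smaller λ brings the temperatures closer together, which shrinks the exponent and raises the swap rate. The trade-off is that the heated chains become less heated. This is a sketch with hypothetical names, and the log posteriors in the usage example are made up.

```python
import math


def swap_probability(log_post_i: float, log_post_j: float, temp_i: float, temp_j: float) -> float:
    """Acceptance probability for exchanging the states of chains i and j.

    log_post_* are unnormalized log posteriors of the chains' current states;
    the normalizing constant cancels in the ratio.
    """
    log_r = (temp_i - temp_j) * (log_post_j - log_post_i)
    return 1.0 if log_r >= 0 else math.exp(log_r)


def cold_to_first_heated(log_post_cold: float, log_post_heated: float,
                         heating_coefficient: float) -> float:
    """Swap probability between the cold chain (T = 1) and the chain with T = 1 / (1 + lambda)."""
    return swap_probability(log_post_cold, log_post_heated, 1.0, 1.0 / (1.0 + heating_coefficient))


if __name__ == "__main__":
    for lam in (0.2, 0.1, 0.05):
        print(lam, round(cold_to_first_heated(-1000.0, -1020.0, lam), 3))
```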
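Model testing with Bayes factors compares models through the marginal likelihoods $f(D \mid M)$ that appear in Bayes' rule: $BF_{10} = f(D \mid M_1) / f(D \mid M_0)$. The hard part is estimating those marginal likelihoods, which is not shown here. This sketch assumes log marginal likelihoods are already available and applies the Kass and Raftery (1995) scale for $2 \ln BF_{10}$, one common convention among several. The numbers in the usage line are made up.

```python
def two_ln_bayes_factor(log_ml_1: float, log_ml_0: float) -> float:
    """2 ln BF_10 from the natural-log marginal likelihoods ln f(D | M1) and ln f(D | M0)."""
    return 2.0 * (log_ml_1 - log_ml_0)


def kass_raftery_category(two_ln_bf: float) -> str:
    """Verbal strength of evidence for M1 on the Kass and Raftery scale."""
    if two_ln_bf < 0:
        return "favors M0 (evaluate -2 ln BF_10 on the same scale)"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    value = two_ln_bayes_factor(-1000.0, -1004.0)  # hypothetical log marginal likelihoods
    print(value, kass_raftery_category(value))
```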
