
PAML 

Phylogenetic Analysis by Maximum Likelihood 

Yang Z. Computational Molecular Evolution 

Yang Z. 2007 

PAML 4: Phylogenetic Analysis by Maximum Likelihood. 

Mol. Biol. Evol. 24, 1586-1591. 

Yang Z. and Bielawski J.P. 2000 

Statistical methods for detecting molecular adaptation. 

TREE 15, 496-503. 

http://abacus.gene.ucl.ac.uk/ 

http://abacus.gene.ucl.ac.uk/ziheng/ziheng.html 

http://abacus.gene.ucl.ac.uk/software/paml.html 

Ziheng Yang

Adaptive Evolution 

• Most variation within or between species results from the random fixation of selectively neutral mutations. 

• Selectively deleterious mutations are not tolerated: they are removed by purifying selection. 

• Occasionally, mutations that confer a selective advantage are fixed in the population at a much higher rate by positive selection (adaptive evolution). 

Why? 

• It can indicate which amino acid sites or domains of a molecule are functionally important. 

• It addresses whether most mutations are deleterious, advantageous or neutral. 

• Identifying selected loci provides insight into the events that have shaped a species' evolution and can indicate which genes have been particularly important in that evolution. 

• By screening for selective signatures associated with immunity or disease susceptibility, we may be able to identify the genes that have been of critical importance in the development of disease resistance.

Amino Acid Sites Subject to Positive Selection in Mammalian α-Defensins 

Red: sites predicted to be under positive selection. 

Blue: sites that are 100% conserved across all OTUs. 

Lynn et al., MBE 2004.

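• A widely used method to detect adaptive evolution looks for an accelerated rate of nonsynonymous substitution relative to synonymous substitution (dN/dS): 

– dN = nonsynonymous (protein-changing) substitution rate, i.e. substitutions per nonsynonymous site 

– dS = synonymous (silent) substitution rate, i.e. substitutions per synonymous site 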

$$\omega = \frac{d_N}{d_S}$$

ω < 1 → Nonsynonymous mutations are mostly deleterious and are removed by purifying selection 

ω = 1 → Amino acid changes are selectively neutral 

ω > 1 → Amino acid changes are selectively advantageous (positive selection) 
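A minimal Python sketch of the ratio and of the three regimes listed above. The dN and dS values in the example are hypothetical, and a point estimate of ω on its own is not evidence of selection; that requires one of the statistical tests that follow.

```python
def omega(dn: float, ds: float) -> float:
    """omega = dN / dS, the nonsynonymous/synonymous rate ratio."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    return dn / ds


def regime(w: float) -> str:
    """Map a value of omega to the selective regime it suggests."""
    if w < 1.0:
        return "purifying selection"
    if w > 1.0:
        return "positive selection"
    return "neutral evolution"


# Hypothetical estimates for one gene
w = omega(0.02, 0.15)
print(round(w, 3), regime(w))  # 0.133 purifying selection
```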

Statistical Methods to Detect Positive Selection 

Test whether dN is significantly higher than dS. 

Approximate methods 

→ Normal approximation applied to dN − dS. 

ML method 

→ Likelihood-ratio test. 

3a: Z test 

3b: comparison of models with a likelihood ratio test 

(null model: ω = 1)
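The normal approximation (3a) can be sketched as a one-sided Z test of dN > dS. This sketch assumes the sampling variances of dN and dS are already available (for example from a counting method) and ignores their covariance; the numbers in the example are hypothetical.

```python
import math


def z_test_dn_gt_ds(dn: float, ds: float, var_dn: float, var_ds: float):
    """One-sided Z test of H1: dN > dS (normal approximation).

    Assumes independent estimates, so Var(dN - dS) = Var(dN) + Var(dS).
    """
    z = (dn - ds) / math.sqrt(var_dn + var_ds)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z) under N(0, 1)
    return z, p_value


# Hypothetical estimates: Z is about 1.414, P about 0.079 (not significant at 5%)
z, p = z_test_dn_gt_ds(dn=0.30, ds=0.10, var_dn=0.01, var_ds=0.01)
print(f"Z = {z:.3f}, one-sided P = {p:.4f}")
```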

Maximum Likelihood Methods 

• In PAML, codeml explores the parameter space by numerical optimization of the likelihood function (MCMC sampling is used by other PAML programs, such as MCMCtree). 

• The parameter space is continuous and therefore contains infinitely many candidate values; since the parameters are unknown, no value is favored in advance. 

• The likelihood function is used to find the parameter value that maximizes the likelihood of the observed data. 

X → data 

θ → parameter to be estimated 

The probability of observing the data X, viewed as a function of the unknown parameter θ with the data held fixed, is the likelihood: 

$$L(\theta; X) = f(X \mid \theta)$$

The value of θ that maximizes the likelihood is called the Maximum Likelihood Estimate (MLE).
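To make the MLE concrete, here is a small Python sketch that is deliberately not codeml's codon model: it estimates the distance d between two sequences under the Jukes-Cantor (JC69) model from x differing sites out of n, by maximizing the log-likelihood numerically with golden-section search, and compares the result with the known closed-form MLE. The counts are hypothetical.

```python
import math


def jc69_p(d: float) -> float:
    """Probability that a site differs between two sequences at distance d (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def log_likelihood(d: float, x: int, n: int) -> float:
    """Binomial log-likelihood of x differing sites out of n, given d."""
    p = jc69_p(d)
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


def mle_distance(x: int, n: int, lo: float = 1e-6, hi: float = 5.0,
                 tol: float = 1e-9) -> float:
    """Maximize the (unimodal) log-likelihood over d by golden-section search."""
    if not 0 < x < 0.75 * n:
        raise ValueError("need 0 < x < 0.75 * n for a finite JC69 distance")
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, e = b - g * (b - a), a + g * (b - a)
        if log_likelihood(c, x, n) > log_likelihood(e, x, n):
            b = e  # the maximum lies in [a, e]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


x, n = 90, 500  # hypothetical: 90 differences in 500 aligned sites
d_hat = mle_distance(x, n)
closed_form = -0.75 * math.log(1.0 - 4.0 * x / (3.0 * n))
print(round(d_hat, 5), round(closed_form, 5))
```

The numerical search and the analytical formula should agree, which is the point: the MLE is simply the θ (here d) at the top of the likelihood curve.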

PAML → CODEML 

Models to detect positive selection acting on: 

• Particular branches/lineages of a phylogeny (branch models). 

• Particular codon (amino acid) sites (site-specific models). 

Test for adaptive evolution in the VHL (Von Hippel-Lindau) gene 

• Dataset: 16 sequences from different species 

• Objective: test for sites evolving under positive selection, and identify those sites by empirical Bayes

Site-specific models: allow ω to vary among sites. 

H0: uniform selective pressure among sites (M0) 

H1: variable selective pressure among sites (M3) 

[Figure: proportion of sites (p) against ω under H0 and H1] 

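Likelihood ratio test (LRT): the statistic below is compared with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters between the two models. 

$$2\Delta\ell = 2(\ell_1 - \ell_0)$$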

THIS TEST CHECKS WHETHER ω VARIES AMONG SITES. IT IS NOT CONSIDERED A TEST FOR THE PRESENCE/ABSENCE OF POSITIVE SELECTION.
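A sketch of the LRT in Python using only the standard library. For an even number of degrees of freedom the χ² upper tail has the closed form exp(−x/2)·Σ_{i<df/2} (x/2)^i / i!, which covers the df = 2 and df = 4 tests in this analysis; odd df would need a general routine (for example from SciPy). The example plugs in the lnL values of M0 (−2617.14) and M3 (−2538.98) from the results table.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df); valid only for positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def lrt(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*delta_l, P-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)


# M0 (null, 1 omega parameter) vs M3 (alternative, 5 parameters): df = 4
stat, p = lrt(-2617.14, -2538.98, df=4)
print(f"2*delta_l = {stat:.2f}, P = {p:.3g}")
```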

H0: variable selective pressure but NO positive selection (M1a) 

H1: variable selective pressure with positive selection (M2a) 

IF: 

• MODEL M2a FITS THE DATA SIGNIFICANTLY BETTER (LRT), AND 

• THE ESTIMATED ω FOR SITE CLASS p2 IS > 1, 

THEN A PROPORTION p2 OF SITES IS UNDER POSITIVE SELECTION.

H0: Beta-distributed variable selective pressure (M7) 

H1: Beta plus positive selection (M8) 

IF: 

• MODEL M8 FITS THE DATA SIGNIFICANTLY BETTER (LRT), AND 

• THE ESTIMATED ωs FOR SITE CLASS p1 IS > 1, 

THEN A PROPORTION p1 OF SITES IS UNDER POSITIVE SELECTION. 
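The two conditions above (a significant LRT and an estimated ω > 1 for the extra site class) can be written as a small Python check. The 5% significance level is an assumed convention; the inputs in the example are the M1 vs M2 and M7 vs M8 results reported for the VHL dataset.

```python
from typing import Optional


def proportion_under_positive_selection(p_value: float, omega_extra: float,
                                        proportion_extra: float,
                                        alpha: float = 0.05) -> Optional[float]:
    """Return the proportion of sites under positive selection, or None.

    Both conditions must hold: the alternative model (M2a or M8) fits
    significantly better than its null (M1a or M7), AND the omega estimated
    for the extra site class is > 1.
    """
    if p_value < alpha and omega_extra > 1.0:
        return proportion_extra
    return None


# M1 vs M2: omega2 > 1 but the LRT is not significant -> None
print(proportion_under_positive_selection(0.6545, 2.70033, 0.00705))
# M7 vs M8: significant LRT and omega_s > 1 -> about 1.6% of sites
print(proportion_under_positive_selection(0.0336, 2.21316, 0.01552))
```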

When the tests suggest the presence of positive selection, Bayes empirical Bayes (BEB) methods are used to calculate the posterior probability that each codon comes from the class of sites under positive selection.
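At the core of this calculation is Bayes' theorem applied to each codon h: the posterior probability of class k is p_k f(x_h | ω_k) / Σ_j p_j f(x_h | ω_j). The Python sketch below implements only this naive empirical Bayes (NEB) step with point estimates plugged in; BEB additionally accounts for the uncertainty in the estimated p and ω values, which this sketch does not do. The per-class site likelihoods in the example are hypothetical; the class proportions are the M2 estimates from the results table.

```python
def posterior_class_probs(proportions, site_likelihoods):
    """Posterior probability that one codon belongs to each omega class.

    proportions: prior class proportions p_k (summing to 1)
    site_likelihoods: f(x_h | omega_k), the codon's likelihood under each class
    """
    joint = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# M2 class proportions (p0, p1, p2) with hypothetical likelihoods for one codon
post = posterior_class_probs([0.85768, 0.13527, 0.00705], [1e-5, 2e-4, 5e-3])
print([round(x, 3) for x in post])  # last entry: P(codon in the omega2 > 1 class)
```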

Branch-site models 

Allow ω to vary both among sites and among branches. Likelihood ratio test (LRT) 

• An LRT based on the χ² distribution can be powerful 

Alternative model (ω2 estimated, ω2 > 1) 

Null model (ω2 fixed at 1) 

• Power is affected by (i) sequence divergence, (ii) number of lineages, and (iii) strength of positive selection 

• The most efficient way to increase power is to add lineages 

Anisimova, Bielawski, and Yang, 2001, Mol. Biol. Evol. 18:1585-1592.
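Because the null model fixes ω2 = 1 at the boundary of the alternative model's parameter space (ω2 ≥ 1), the branch-site LRT statistic is commonly compared with a 50:50 mixture of a point mass at 0 and χ²₁, while using χ²₁ alone gives a more conservative test. A Python sketch of both P-values; the lnL values in the example are hypothetical.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """P(chi-square_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_lrt(lnl_null: float, lnl_alt: float):
    """Return (2*delta_l, P under chi2_1, P under the 50:50 mixture of 0 and chi2_1)."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_chi2_1 = chi2_sf_df1(stat)
    p_mixture = 0.5 * p_chi2_1 if stat > 0.0 else 1.0
    return stat, p_chi2_1, p_mixture


# Hypothetical lnL values: null (omega2 fixed = 1) vs alternative (omega2 estimated)
print(branch_site_lrt(-2545.30, -2543.10))
```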

Requirements for PAML Analysis 

• A coding DNA sequence alignment in PAML format. 

• A treefile in newick-like format. 

• codeml.ctl parameter file. 

• PAML installed on your machine! 

Searching DNA sequences 

Download coding sequences from different species by querying databases such as: 

• UniProt (http://www.uniprot.org/) 

• NCBI Entrez Gene (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene) 

• Genome browser Ensembl (http://www.ensembl.org/).

1. Translate the sequences into proteins 

2. Align the protein sequences 

3. Export a *.meg file and a *.fas file
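These steps are usually done in MEGA. As an illustration only, the Python sketch below translates a coding sequence with the standard genetic code (NCBI table 1) and then threads a gapped protein alignment row back onto its codons, which is how a codon alignment is obtained from a protein alignment. The protein alignment itself is assumed to come from an external aligner, and the sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence; unknown codons become 'X', stops '*'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Replace each aligned residue by its codon and each gap by '---'."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    out, n = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
            continue
        if n >= len(codons) or translate(codons[n]) != aa.upper():
            raise ValueError(f"residue {aa!r} does not match codon {n + 1}")
        out.append(codons[n])
        n += 1
    return "".join(out)


print(translate("ATGTGGTGA"))           # MW*
print(thread_codons("M-W", "ATGTGG"))  # ATG---TGG
```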

Coding DNA sequence alignment in PAML format 

The first line gives the number of sequences and the length of the alignment (in nucleotides). 

N.B. → Remove the terminal stop codon (e.g. TGA) at the end of each sequence.
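A Python sketch of the conversion from an aligned FASTA file to this sequential layout: a header line with the number of sequences and the alignment length, then each name followed by two spaces (which PAML reads as the end of the name) and its sequence. It drops the final codon column when that column holds only stop codons or gaps. The example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # universal genetic code


def parse_fasta(text: str) -> dict:
    """Return {name: sequence}, using the first word of each header as the name."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


def to_paml(seqs: dict) -> str:
    """Format aligned coding sequences as a PAML sequential alignment."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences have different lengths: not aligned")
    # Drop the last codon column if it holds only stop codons or gaps
    if all(s[-3:] in STOP_CODONS or s[-3:] == "---" for s in seqs.values()):
        seqs = {k: s[:-3] for k, s in seqs.items()}
    length = len(next(iter(seqs.values())))
    if length % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3")
    lines = [f"{len(seqs)}  {length}"]
    lines += [f"{name}  {s}" for name, s in seqs.items()]
    return "\n".join(lines) + "\n"


fasta = ">HOM\nATGTGGTGA\n>GGO\nATGTTGTAA\n"
print(to_paml(parse_fasta(fasta)), end="")
```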

Treefile 

• The tree must be unrooted (trifurcating at the base), NOT rooted. 

• The tree should reflect the most likely true relationships among the species. 

• Source of the topology: NCBI Taxonomy (http://www.ncbi.nlm.nih.gov/guide/taxonomy/). 

• Tree exported in *.phy format. 

• Converted from rooted to unrooted with the Retree tool of Phylip, as suggested by the author. 
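Retree performs this step interactively. For a topology-only Newick string (no branch lengths or internal node labels, an assumption of this sketch), the rooted-to-unrooted conversion can be sketched in Python: if the root has two children, one internal child is dissolved so that the root becomes a trifurcation. The species codes in the example follow the abbreviations used in this analysis; the topology is illustrative.

```python
def split_top_level(s: str) -> list:
    """Split a Newick fragment at commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def unroot(newick: str) -> str:
    """Turn a bifurcating root into a basal trifurcation (topology-only Newick)."""
    tree = newick.strip().rstrip(";").strip()
    if not (tree.startswith("(") and tree.endswith(")")):
        raise ValueError("expected a parenthesised Newick tree")
    children = split_top_level(tree[1:-1])
    if len(children) == 2:
        for i, child in enumerate(children):
            if child.startswith("(") and child.endswith(")"):
                other = children[1 - i]
                children = split_top_level(child[1:-1]) + [other]
                break
    return "(" + ",".join(children) + ");"


print(unroot("(((HOM,GGO),PAB),(MUS,RNO));"))  # ((HOM,GGO),PAB,(MUS,RNO));
```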

MEGA4 - http://www.megasoftware.net/ 

Phylip - http://evolution.genetics.washington.edu/phylip.html 

Abbreviations: 

PCA: Procavia capensis 

PVA: Pteropus vampyrus 

MLU: Myotis lucifugus 

BTA: Bos taurus 

SSC: Sus scrofa 

TTR: Tursiops truncatus 

FCA: Felis catus 

CFA: Canis familiaris 

STO: Spermophilus tridecemlineatus 

RNO: Rattus norvegicus 

MUS: Mus musculus 

OPR: Ochotona princeps 

PAB: Pongo abelii 

HOM: Homo sapiens 

GGO: Gorilla gorilla 

CJA: Callithrix jacchus

codeml.ctl parameter file 

• Ziheng Yang: Computational Molecular Evolution 

• PAML User Guide 

κ → transition/transversion rate ratio 

π → codon frequencies 

• Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary 

models. Mol Biol Evol 2000; 17: 32-43 

• Yang Z. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol Biol Evol 2007; 24: 1586-1591 

• Fequal: each codon has the same frequency (1/61 under the universal code). 

• F1X4: codon frequencies are calculated from the frequencies of the four nucleotides. 

• F3X4: codon frequencies are calculated from three sets of nucleotide frequencies, one for each codon position. 

• F61: all codon frequencies are estimated as free parameters.
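As an illustration of how F3X4 builds codon frequencies (F1X4 is the special case with a single set of nucleotide frequencies shared by all three positions): each sense codon gets the product of its position-specific nucleotide frequencies, renormalized over the 61 sense codons of the universal code. The nucleotide frequencies here are hypothetical, and codeml's internal implementation may differ in detail.

```python
from itertools import product

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # universal code: 64 - 3 = 61 sense codons


def f3x4(pos_freqs):
    """Codon frequencies from three {base: freq} dicts (codon positions 1, 2, 3)."""
    raw = {}
    for a, b, c in product(BASES, repeat=3):
        codon = a + b + c
        if codon in STOP_CODONS:
            continue
        raw[codon] = pos_freqs[0][a] * pos_freqs[1][b] * pos_freqs[2][c]
    total = sum(raw.values())  # renormalize because stop codons are excluded
    return {codon: f / total for codon, f in raw.items()}


uniform = {base: 0.25 for base in BASES}
freqs = f3x4([uniform, uniform, uniform])  # equal frequencies reproduce Fequal
print(len(freqs), round(freqs["ATG"], 5))  # 61 0.01639
```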

1. Create a directory in which to run the analysis. 

2. Copy the codeml.ctl file, the alignment “file.fas” and the treefile “file.nwk” into this directory. 

3. Open a command prompt (Windows XP: Start → Run, type “cmd”). 

4. Move to the folder: type “cd” followed by the folder path. 

5. Run CODEML: type “Codeml.exe” 

Running process…
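The same steps can be scripted. The Python sketch below writes a codeml.ctl for the site models used here (NSsites = 0 1 2 3 7 8 runs M0, M1a, M2a, M3, M7 and M8 in one batch; CodonFreq = 2 selects F3X4) and then calls codeml with the control file as its argument. The option values are illustrative starting values, not necessarily the settings used in this analysis, and the executable is assumed to be on the PATH as `codeml` (on Windows, `shutil.which` also finds `codeml.exe`). The file names follow the steps above.

```python
import shutil
import subprocess
from pathlib import Path

CTL_TEMPLATE = """\
      seqfile = {seqfile}   * alignment
     treefile = {treefile}   * unrooted tree
      outfile = mlc        * main output file
        noisy = 3
      verbose = 1
      runmode = 0          * 0: user tree
      seqtype = 1          * 1: codons
    CodonFreq = 2          * 0: Fequal, 1: F1X4, 2: F3X4, 3: F61
        model = 0          * one omega for all branches
      NSsites = 0 1 2 3 7 8
        icode = 0          * universal genetic code
    fix_kappa = 0          * estimate kappa
        kappa = 2          * initial kappa
    fix_omega = 0          * estimate omega
        omega = 0.4        * initial omega
    cleandata = 1          * remove sites with ambiguity data
"""


def render_ctl(seqfile: str = "file.fas", treefile: str = "file.nwk") -> str:
    """Fill the control-file template with the data file names."""
    return CTL_TEMPLATE.format(seqfile=seqfile, treefile=treefile)


def run_codeml(workdir: str, seqfile: str = "file.fas",
               treefile: str = "file.nwk") -> None:
    """Write codeml.ctl into workdir (which must hold the data files) and run codeml."""
    wd = Path(workdir)
    wd.mkdir(parents=True, exist_ok=True)
    (wd / "codeml.ctl").write_text(render_ctl(seqfile, treefile))
    exe = shutil.which("codeml")
    if exe is None:
        raise FileNotFoundError("codeml was not found on the PATH")
    subprocess.run([exe, "codeml.ctl"], cwd=wd, check=True)
```

Usage, with a hypothetical directory name: `run_codeml("vhl_analysis")` after copying file.fas and file.nwk into that directory.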

Output Files 

Several output files are produced: 

• rst 

• rst1 

• rub 

• lnf 

• 2NG.dS 

• 2NG.dN 

• 2NG.t 

• mlc → Main output file
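To collect the log-likelihoods for the LRTs, the lnL lines of mlc can be extracted with a regular expression. This sketch assumes the usual layout of that line, `lnL(ntime: 29  np: 31):  -2617.140000  +0.000000`, which should be checked against your own mlc file. The sample text is hypothetical apart from the lnL values, which are those reported for M0 and M3. The difference in np between two nested models gives the degrees of freedom of the LRT.

```python
import re

# Assumed layout of the lnL line that codeml writes to mlc
LNL_LINE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+(?:\.\d+)?)")


def parse_lnl(mlc_text: str):
    """Return a list of (ntime, np, lnL) tuples, one per model in the file."""
    return [(int(nt), int(n_par), float(lnl))
            for nt, n_par, lnl in LNL_LINE.findall(mlc_text)]


sample = """Model 0: one-ratio
lnL(ntime: 29  np: 31):  -2617.140000      +0.000000
Model 3: discrete (3 categories)
lnL(ntime: 29  np: 35):  -2538.980000      +0.000000
"""
(_, np0, l0), (_, np1, l1) = parse_lnl(sample)
print(f"2*delta_l = {2 * (l1 - l0):.2f}, df = {np1 - np0}")  # 2*delta_l = 156.32, df = 4
```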

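| Model | Free parameters | κ | lnL | Estimated parameters |
|---|---|---|---|---|
| M0 | 1 | 2.86312 | −2617.14 | ω = 0.12877 |
| M1 | 2 | 2.99216 | −2547.89 | p0 = 0.85808, ω0 = 0.04165; (p1 = 0.14192), (ω1 = 1) |
| M2 | 4 | 3.02129 | −2547.47 | p0 = 0.85768, ω0 = 0.04237; p1 = 0.13527, (ω1 = 1); (p2 = 0.00705), ω2 = 2.70033 |
| M3 | 5 | 2.91362 | −2538.98 | p0 = 0.78851, ω0 = 0.02312; p1 = 0.19543, ω1 = 0.47991; (p2 = 0.01606), ω2 = 2.20475 |
| M7 | 2 | 2.89921 | −2543.01 | p = 0.14942, q = 0.87349 |
| M8 | 4 | 2.9211 | −2539.62 | p = 0.18576, q = 1.38392; p0 = 0.98448, (p1 = 0.01552), ωs = 2.21316 |

Values in parentheses are not free parameters: they are fixed (ω1 = 1) or follow from the others (the proportions sum to 1).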

| Models compared | 2Δℓ | df | P-value |
|---|---|---|---|
| M0 vs M3 | 156.3233 | 4 | 8.98 × 10⁻³³ |
| M1 vs M2 | 0.848 | 2 | 0.6545 |
| M7 vs M8 | 6.782778 | 2 | 0.0336 |
