11.07.2015 Views

Non-equilibrium theory of the allele frequency spectrum ... - ICMS


Non-equilibrium theory of the allele frequency spectrum

Steven N. Evans
Departments of Statistics and Mathematics and Graduate Group in Computational and Genomic Biology
University of California at Berkeley
evans@stat.berkeley.edu
http://www.stat.berkeley.edu/users/evans

(with Yelena Shvets and Montgomery Slatkin)

Research supported in part by NSF and NIH grants


Contemporary traces of past human population growth?

It is suggested in Reich and Lander (2001) and Williamson et al. (2005) that the effective size of the modern human population was stable at around $10^4$ until $1.5 \times 10^5$ years ago.

The current effective size is $1.6 \times 10^9$.

If the population grew according to a given schedule over this period (e.g. exponential growth at a constant rate), how would this manifest itself in the genetic composition of contemporary humans?


Equilibria under stable conditions

If mutations are not reversible at a polymorphic locus, then one allele will eventually fix there (i.e. no other alleles will be present in the population).

However, the distribution of allele frequencies across polymorphic loci (the frequency spectrum) reaches a non-trivial equilibrium if both population size and selection intensities are constant.

The theory for the equilibrium frequency spectrum under irreversible mutation was developed in Fisher (1930), Wright (1938), Kimura (1964), Kimura (1969), Sawyer and Hartl (1992), Bustamante et al. (2001), Williamson et al. (2004), ...


What if population size or selection intensities change?

Nei et al. (1975) showed that rapid growth resulted in more low-frequency alleles than expected under neutrality.

Tajima (1989) confirmed that conclusion and examined the effect of past population growth on other aspects of the frequency spectrum.

Griffiths and Tavaré (1998) developed the coalescent theory for the frequency spectrum of neutral alleles in a population that has experienced arbitrary changes in population size (related work by Nielsen (2000), Wooding and Rogers (2002) and Polanski and Kimmel (2003)).


What if population size or selection intensities change? (continued)

Griffiths (2003): the frequency spectrum in a neutral population of variable size could be derived from the spectrum for a population of constant size when a transformation of the time scale reduces a suitable backwards equation to one for a population of constant size.

We present a forward equation approach that computes the frequency spectrum for arbitrary population growth and changes in selection intensity.


Discrete time finite population models

Monoecious, randomly-mating, diploid population containing $N(t)$ individuals in generation $t \in \{0, 1, 2, \ldots\}$.

Independent loci.

At each locus there are only two alleles: $A$, the derived allele, and $a$, the ancestral allele. Mutation only occurs from ancestral to derived, at loci that haven't seen mutations before (infinite sites assumption).

Large number of loci: effectively an infinite pool.


Discrete time finite population models (continued)

Put $f_j(t)$ := expected number of loci at which $A$ is found on $j$ chromosomes, $1 \le j \le 2N(t)$.

Put $p_{ij}(t)$ := conditional probability that a locus with $i$ copies of $A$ in generation $t$ will have $j$ copies in generation $t + 1$.

The change in $f_j(t)$ because of genetic drift and mutation at rate $\mu$ is described by the set of "forward" difference equations
\[
f_j(t+1) = \sum_{i=1}^{2N(t)} f_i(t)\, p_{ij}(t) + 2N(t)\mu\,\delta_{1,j}, \qquad 1 \le j \le 2N(t+1).
\]
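A minimal sketch of one step of the forward difference equation may help fix the indexing. The slides do not specify $p_{ij}(t)$ at this point, so the binomial (neutral Wright-Fisher) sampling used here is an assumption, and the names `wf_row` and `forward_step` are hypothetical.

```python
from math import comb


def wf_row(i, two_n_now, two_n_next):
    """Row i of an ASSUMED neutral Wright-Fisher transition matrix.

    Entry j is the probability that a locus with i derived copies among
    two_n_now chromosomes has j copies among two_n_next chromosomes.
    """
    x = i / two_n_now
    return [comb(two_n_next, j) * x**j * (1 - x) ** (two_n_next - j)
            for j in range(two_n_next + 1)]


def forward_step(f, two_n_now, two_n_next, mu):
    """One generation of f_j(t+1) = sum_i f_i(t) p_ij(t) + 2N(t) mu delta_{1,j}.

    f is a list indexed by copy number j = 0..two_n_now; index 0 is unused
    and stays at zero because lost alleles are not tracked.
    """
    new_f = [0.0] * (two_n_next + 1)
    for i in range(1, two_n_now + 1):
        if f[i] == 0.0:
            continue
        row = wf_row(i, two_n_now, two_n_next)
        for j in range(1, two_n_next + 1):
            new_f[j] += f[i] * row[j]
    # New mutations enter as singletons at rate 2N(t) * mu per generation.
    new_f[1] += two_n_now * mu
    return new_f


# Usage: start with no segregating loci and let the population grow 10 -> 12.
f0 = [0.0] * 11
f1 = forward_step(f0, two_n_now=10, two_n_next=12, mu=0.01)
```

Because the matrix is rebuilt row by row, this is only practical for small $2N(t)$; it is meant to illustrate the recursion, not to replace the diffusion approach developed in the following slides.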


An analogous forward equation

Recall
\[
f_j(t+1) = \sum_{i=1}^{2N(t)} f_i(t)\, p_{ij}(t) + 2N(t)\mu\,\delta_{1,j}, \qquad 1 \le j \le 2N(t+1).
\]

If $\pi_j(t)$ := probability of a given locus having $j$ copies of $A$ in generation $t$, then
\[
\pi_j(t+1) = \sum_{i=1}^{2N(t)} \pi_i(t)\, p_{ij}(t).
\]


The diffusion limit for a single locus

Assume for now that $p_{ij}(t) = p_{ij}$ (time-homogeneous case).

Assume for now that the number of chromosomes is constant at $2N\rho$.

Suppose that if we shrink space by a factor of $2N\rho$ and speed time up by a factor of $2N$, then the chain converges to a diffusion process on $[0, 1]$ (call it the 0-diffusion) with generator
\[
G = a(x)\frac{d}{dx} + \frac{1}{2} b(x) \frac{d^2}{dx^2}.
\]

This is the scaling regime that is appropriate for models such as Wright-Fisher with or without selection.
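As an illustration of this scaling, here is a sketch for the neutral Wright-Fisher chain (an assumed example; the slide does not single out a specific chain here). If the current derived-allele frequency is $x$ and the next generation's $2N\rho$ chromosomes are drawn binomially, the one-generation mean and variance of the frequency change $\Delta X$, multiplied by the time speed-up $2N$, give the coefficients:

```latex
\begin{align*}
\mathbb{E}[\Delta X \mid X = x] &= 0, &
\operatorname{Var}[\Delta X \mid X = x] &= \frac{x(1-x)}{2N\rho}, \\
a(x) &= 2N \cdot 0 = 0, &
b(x) &= 2N \cdot \frac{x(1-x)}{2N\rho} = \frac{x(1-x)}{\rho}.
\end{align*}
```

This agrees with the form $b(x, t) = x(1-x)/\rho(t)$ used in the example later in the talk.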


The single locus diffusion forward equation

Write $\pi(y, t)$ for the probability density of the 0-diffusion at frequency $y \in (0, 1)$ and time $t > 0$.

The Kolmogorov forward equation
\[
\frac{\partial}{\partial t}\pi(y, t) = -\frac{\partial}{\partial y}\left[a(y)\pi(y, t)\right] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left[b(y)\pi(y, t)\right]
\]
holds with suitable boundary conditions.


The diffusion frequency spectrum without mutation

Suppose at time 0 that there are countably many loci at which derived alleles are present, with respective frequencies $x_1, x_2, \ldots$.

Assume for now there is no further mutation from the ancestral state.

After passage to the diffusion limit, the frequency spectrum at time $t$ is just the intensity measure of the point process that comes from starting independent copies of the 0-diffusion process at each of the $x_i$ and letting them run to time $t$.


The forward equation without mutation

The frequency spectrum is obtained by taking a sum of point masses at the $x_i$ and moving it forwards with the 0-diffusion semigroup.

Set $f^o(y, t)$ := frequency spectrum, then
\[
\frac{\partial}{\partial t} f^o(y, t) = -\frac{\partial}{\partial y}\left[a(y) f^o(y, t)\right] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left[b(y) f^o(y, t)\right]
\]
with suitable boundary conditions.


Introducing mutation from the ancestral type

In the Markov chain model, new mutants arise at rate $2N\rho\mu = \frac{\theta}{2}\rho$ per generation.

The initial number of mutants at a locus is 1.

Hence mutants appear at rate $2N\frac{\theta}{2}\rho$ per unit of rescaled time, with the initial proportion of mutants at a locus being $\frac{1}{2N\rho}$.


Introducing mutation from the ancestral type (continued)

Pass to the diffusion limit for the allele frequencies, but for now still work with a finite $N$ for the description of the appearance of new mutants.

The evolving point process has new points added at location $\frac{1}{2N\rho}$ at rate $\frac{\theta}{2}\, 2N\rho$.

The points then evolve as independent copies of the diffusion.


Introducing mutation from the ancestral type (continued)

Set $P_t(x, dy)$ := 0-diffusion semigroup.

The contribution to the frequency spectrum at $y \in (0, 1)$ from mutations that appear after time 0 is
\[
2N\frac{\theta}{2}\rho \int_0^t P_{t-s}\left(\frac{1}{2N\rho}, dy\right) ds.
\]


Entrance boundary theory

Choose a scale function $s$ for the 0-diffusion such that $s(0) = 0$ and $s'(0) = 1$.

The Doob h-transform
\[
P^\uparrow_u(x, dy) := \frac{1}{s(x)} P_u(x, dy)\, s(y), \qquad 0 < x, y \le 1,
\]
is the semigroup of a diffusion that never hits 0 (the ↑-diffusion).

The ↑-semigroup can be extended to allow starting at 0 by setting
\[
P^\uparrow_u(0, dy) = \lim_{x \downarrow 0} P^\uparrow_u(x, dy).
\]

The extended process can start at 0 but it never returns to 0.


Entrance boundary theory (continued)

Put
\[
\lim_{N \to \infty} 2N\rho\, P_u\left(\frac{1}{2N\rho}, dy\right) = \frac{P^\uparrow_u(0, dy)}{s(y)} =: \lambda_u(dy),
\]
then $\int \lambda_s(dx) P_t(x, dy) = \lambda_{s+t}(dy)$ and $(\lambda_u)_{u>0}$ has densities that satisfy the forward equation.

Hence
\[
\lim_{N \to \infty} 2N\frac{\theta}{2}\rho \int_0^t P_{t-s}\left(\frac{1}{2N\rho}, dy\right) ds = \frac{\theta}{2}\int_0^t \lambda_{t-s}(dy)\, ds
\]
also has densities $\varphi_t(x)$ that satisfy the forward equation.

What are the boundary conditions?


Entrance boundary theory (continued)

The ↓-diffusion is the 0-diffusion conditioned to hit 0 before 1. It has the Doob h-transform semigroup
\[
P^\downarrow_t(x, dy) = \left(1 - \frac{s(x)}{s(1)}\right)^{-1} P_t(x, dy) \left(1 - \frac{s(y)}{s(1)}\right).
\]

From Williams (1974), the ↑-diffusion started at 0 and killed at the last time it visits $y > 0$ is the time-reversal of the ↓-diffusion started at $y$ and killed when it first hits 0.

Write $(Q^\downarrow_t)_{t \ge 0}$ := semigroup of the killed ↓-diffusion.


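Finding the boundary condition

Since $s(y) \approx y$ for $y$ close to 0,
\begin{align*}
\lim_{y \downarrow 0} y\,\varphi_t(y) &= \lim_{y \downarrow 0} s(y)\,\varphi_t(y) \\
&= \lim_{y \downarrow 0} s(y) \int_0^t \frac{\theta}{2} \frac{\lambda_{t-s}(dy)}{dy}\, ds \\
&= \frac{\theta}{2} \lim_{y \downarrow 0} \int_0^t \frac{P^\uparrow_{t-s}(0, dy)}{dy}\, ds \\
&= \frac{\theta}{2} \lim_{y \downarrow 0} \int_0^t \frac{P^\uparrow_s(0, dy)}{dy}\, ds \\
&= \frac{\theta}{2} \lim_{y \downarrow 0} \int_0^\infty \frac{P^\uparrow_s(0, dy)}{dy}\, ds \\
&= \frac{\theta}{2} \lim_{y \downarrow 0} \int_0^\infty \frac{Q^\downarrow_s(y, dy)}{dy}\, ds.
\end{align*}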


Finding the boundary condition (continued)

If $(B_t)_{t \ge 0}$ is a standard Brownian motion and $T := \inf\{t \ge 0 : B_t = 0\}$, then for $y > 0$
\[
\int_0^\infty \frac{P^y\{B_s \in dy,\ T > s\}}{dy}\, ds = \int_0^\infty \left[\frac{1}{\sqrt{2\pi s}} - \frac{1}{\sqrt{2\pi s}}\, e^{-(2y)^2/2s}\right] ds = 2y.
\]

By Itô–McKean theory,
\[
\frac{\theta}{2} \lim_{y \downarrow 0} \int_0^\infty \frac{Q^\downarrow_s(y, dy)}{dy}\, ds = \theta \lim_{y \downarrow 0} \frac{y}{b(y)}.
\]
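The value $2y$ in the Brownian computation can be checked directly. The following is a sketch of one way to evaluate the integral for fixed $y > 0$; the substitution $u = 2y^2/s$ is a choice made for this sketch and is not taken from the slides.

```latex
\begin{align*}
\int_0^\infty \frac{1}{\sqrt{2\pi s}}\left(1 - e^{-(2y)^2/2s}\right) ds
&= \frac{y}{\sqrt{\pi}} \int_0^\infty u^{-3/2}\left(1 - e^{-u}\right) du
&& \left(u = 2y^2/s\right) \\
&= \frac{y}{\sqrt{\pi}} \left( \left[-2u^{-1/2}\left(1 - e^{-u}\right)\right]_0^\infty
   + 2\int_0^\infty u^{-1/2} e^{-u}\, du \right)
&& \text{(integration by parts)} \\
&= \frac{y}{\sqrt{\pi}} \cdot 2\,\Gamma\!\left(\tfrac{1}{2}\right) = 2y.
\end{align*}
```

The boundary term vanishes at both ends because $u^{-1/2}(1 - e^{-u})$ behaves like $u^{1/2}$ near 0 and like $u^{-1/2}$ at infinity, and $\Gamma(1/2) = \sqrt{\pi}$.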


Conclusion for the time-homogeneous case

Put $f(x, t)$ := the frequency spectrum of the model with mutation from the ancestral type, with everything time-homogeneous.

Note that $f(x, t) = f^o(x, t) + \varphi_t(x)$.

Hence
\[
\frac{\partial}{\partial t} f(x, t) = -\frac{\partial}{\partial x}\left[a(x) f(x, t)\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[b(x) f(x, t)\right]
\]
with $\lim_{x \downarrow 0} x f(x, t) = \theta \lim_{x \downarrow 0} \frac{x}{b(x)}$ and $\lim_{x \uparrow 1} f(x, t)$ finite.


Conclusion for the time-inhomogeneous case

Suppose that everything is allowed to depend on time.

Write $f(x, t)$ for the frequency spectrum.

Allowing $a$, $b$, $\theta$ and $\rho$ to be piecewise constant, using the above analysis, and then taking limits gives
\[
\frac{\partial}{\partial t} f(y, t) = -\frac{\partial}{\partial y}\left[a(y, t) f(y, t)\right] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\left[b(y, t) f(y, t)\right]
\]
with $\lim_{y \downarrow 0} y f(y, t) = \theta(t) \lim_{x \downarrow 0} \frac{x}{b(x, t)}$ and $\lim_{y \uparrow 1} f(y, t)$ finite.


Structured populations

The same approach works for systems of populations interacting via migration.

A single PDE with boundary conditions is replaced by a family of coupled PDEs with boundary conditions.


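Example

Take $\theta(t) = \theta$, $a(x, t) = S x(1 - x)$, and $b(x, t) = x(1 - x)/\rho(t)$, that is, the Wright-Fisher diffusion with constant mutation, additive selection, and varying population size ($2N\rho(t)$ in generation $2Nt$).

Set $g(x, t) := x(1 - x) f(x, t)$.

The forward equation is
\[
\frac{\partial}{\partial t} g(x, t) = -S x(1 - x) \frac{\partial}{\partial x}\left[g(x, t)\right] + \frac{x(1 - x)}{2\rho(t)} \frac{\partial^2}{\partial x^2}\left[g(x, t)\right]
\]
with $\lim_{x \downarrow 0} g(x, t) = \theta\rho(t)$.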
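For readers who want the intermediate steps, the equation for $g$ follows from the time-inhomogeneous forward equation for $f$ by substituting these choices of $a$ and $b$. The worked steps below are a sketch of that substitution and use only symbols already defined on the slides.

```latex
\begin{align*}
a(x,t)\, f(x,t) &= S\, g(x,t), \qquad
b(x,t)\, f(x,t) = \frac{g(x,t)}{\rho(t)}, \\
\frac{\partial}{\partial t} f(x,t)
  &= -S \frac{\partial}{\partial x} g(x,t)
     + \frac{1}{2\rho(t)} \frac{\partial^2}{\partial x^2} g(x,t), \\
\frac{\partial}{\partial t} g(x,t)
  &= x(1-x) \frac{\partial}{\partial t} f(x,t)
   = -S x(1-x) \frac{\partial}{\partial x} g(x,t)
     + \frac{x(1-x)}{2\rho(t)} \frac{\partial^2}{\partial x^2} g(x,t), \\
\lim_{x \downarrow 0} g(x,t)
  &= \lim_{x \downarrow 0} x f(x,t)
   = \theta \lim_{x \downarrow 0} \frac{x\, \rho(t)}{x(1-x)}
   = \theta \rho(t).
\end{align*}
```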


Example (continued)

Put $\mu_n(t) := \int_0^1 x^n g(x, t)\, dx = \int_0^1 x^n\, x(1 - x) f(x, t)\, dx$.

Integrating by parts gives the coupled system of ODEs
\[
\mu_0'(t) = \frac{\theta}{2} - \frac{1}{\rho(t)} \mu_0(t) + S\left(\mu_0(t) - 2\mu_1(t)\right)
\]
and
\[
\mu_n'(t) = \frac{1}{2\rho(t)}\left[(n + 1) n\, \mu_{n-1}(t) - (n + 2)(n + 1)\, \mu_n(t)\right] + S\left((n + 1)\mu_n(t) - (n + 2)\mu_{n+1}(t)\right), \qquad n \ge 1.
\]
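The moment system can be integrated numerically once it is truncated at a finite order. The sketch below is one possible way to do that, not the authors' implementation: the function names, the classical Runge-Kutta scheme, and in particular the closure $\mu_{K+1} \approx 0$ used when $S \ne 0$ are all assumptions made for illustration. For $S = 0$ the system is closed and no closure is needed.

```python
from math import exp


def moment_rhs(mu, rho, theta, S):
    """Right-hand side of the moment ODEs for mu[0..K].

    ASSUMPTION: the system is truncated by treating mu[K+1] as 0, which
    only matters when S != 0.
    """
    K = len(mu) - 1
    out = []
    for n in range(K + 1):
        mu_next = mu[n + 1] if n < K else 0.0
        selection = S * ((n + 1) * mu[n] - (n + 2) * mu_next)
        if n == 0:
            drift = theta / 2 - mu[0] / rho
        else:
            drift = ((n + 1) * n * mu[n - 1]
                     - (n + 2) * (n + 1) * mu[n]) / (2 * rho)
        out.append(drift + selection)
    return out


def growth_rho(t, rate=40.0):
    """Exponential growth rho(t) = exp(rate * t); rate 40 matches the talk."""
    return exp(rate * t)


def _shift(mu, k, h):
    """Return mu + h * k componentwise."""
    return [m + h * v for m, v in zip(mu, k)]


def integrate_moments(mu_start, t_end, steps, rho_of_t, theta=1.0, S=0.0):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    mu, t = list(mu_start), 0.0
    dt = t_end / steps
    for _ in range(steps):
        k1 = moment_rhs(mu, rho_of_t(t), theta, S)
        k2 = moment_rhs(_shift(mu, k1, dt / 2), rho_of_t(t + dt / 2), theta, S)
        k3 = moment_rhs(_shift(mu, k2, dt / 2), rho_of_t(t + dt / 2), theta, S)
        k4 = moment_rhs(_shift(mu, k3, dt), rho_of_t(t + dt), theta, S)
        mu = [m + dt * (a + 2 * b + 2 * c + d) / 6
              for m, a, b, c, d in zip(mu, k1, k2, k3, k4)]
        t += dt
    return mu


# Neutral equilibrium with theta = 1, rho = 1: f(x) = 1/x, so g(x) = 1 - x
# and mu_n = 1 / ((n + 1)(n + 2)).
equilibrium = [1.0 / ((n + 1) * (n + 2)) for n in range(11)]
grown = integrate_moments(equilibrium, 0.3, 3000, growth_rho)
```

A quick sanity check on the equations themselves: with $S = 0$, $\theta = 1$ and $\rho \equiv 1$, the neutral equilibrium moments $\mu_n = 1/((n+1)(n+2))$ make every right-hand side vanish.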


Frequency spectrum in a finite sample

In a sample of $n$ chromosomes the expected number of loci at which the derived allele is found on $i$ chromosomes is
\begin{align*}
f_i(t) &= \binom{n}{i} \int_0^1 x^i (1 - x)^{n-i} f(x, t)\, dx \\
&= \binom{n}{i} \sum_{j=0}^{n-i-1} (-1)^j \binom{n - i - 1}{j} \mu_{j+i-1}(t).
\end{align*}
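The alternating sum translates directly into code. The sketch below is illustrative only (the function name `sample_spectrum` is hypothetical) and uses exact rational arithmetic, since alternating binomial sums lose precision quickly in floating point. It covers $1 \le i \le n - 1$, the range for which the formula needs only $\mu_0, \ldots, \mu_{n-2}$.

```python
from fractions import Fraction
from math import comb


def sample_spectrum(n, mu):
    """Expected sample spectrum f_i for 1 <= i <= n - 1.

    mu[k] is the moment mu_k(t) for k = 0..n-2. Returns a dict {i: f_i}
    computed as C(n, i) * sum_j (-1)^j C(n-i-1, j) mu[j+i-1].
    """
    spectrum = {}
    for i in range(1, n):
        total = sum((-1) ** j * comb(n - i - 1, j) * mu[j + i - 1]
                    for j in range(n - i))
        spectrum[i] = comb(n, i) * total
    return spectrum


# Neutral equilibrium with theta = 1 and rho = 1: f(x) = 1/x, so
# mu_k = 1 / ((k + 1)(k + 2)).
n = 10
equilibrium_moments = [Fraction(1, (k + 1) * (k + 2)) for k in range(n - 1)]
spectrum = sample_spectrum(n, equilibrium_moments)
```

With these equilibrium moments the function reproduces the classical neutral sample spectrum $f_i = \theta / i$ with $\theta = 1$, which is a convenient check on the indexing.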


Recent human population growth

Assume an effective size $N_0 = 10^4$ until $1.5 \times 10^5$ years ago ($t = 0$).

Assume a generation time of 25 years.

Measuring time in units of $2N_0$ generations, the present is at $t = 0.3$.

The current effective population size is $1.6 \times 10^9 = 10^4 \times e^{40 \times 0.3}$.

Assume exponential growth, so $\rho(t) = e^{40t}$, and $\theta(t) = 1$.

Assume that the spectrum at $t = 0$ is the equilibrium
\[
f(x, 0) = \frac{e^{2S}\left(1 - e^{-2S(1 - x)}\right)}{\left(e^{2S} - 1\right) x(1 - x)}.
\]
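The arithmetic behind these numbers is easy to reproduce. The sketch below recomputes the present time in units of $2N_0$ generations and the implied current effective size from $N_0 = 10^4$, and evaluates the equilibrium spectrum with $\theta = 1$. The variable and function names are hypothetical, and returning $1/x$ for $S = 0$ is the $S \to 0$ limit of the stated formula rather than something written on the slide.

```python
import math

N0 = 1.0e4                 # effective size before growth
YEARS_OF_GROWTH = 1.5e5    # growth started this many years ago
GENERATION_YEARS = 25.0    # assumed generation time
GROWTH_RATE = 40.0         # exponent in rho(t) = exp(40 t)

# Time is measured in units of 2 * N0 generations.
t_present = YEARS_OF_GROWTH / GENERATION_YEARS / (2 * N0)
current_size = N0 * math.exp(GROWTH_RATE * t_present)


def equilibrium_spectrum(x, S):
    """Equilibrium f(x, 0) for theta = 1 and rho = 1, with 0 < x < 1.

    For S == 0 the formula is 0/0, so its limit 1/x is returned instead.
    """
    if S == 0:
        return 1.0 / x
    numerator = math.exp(2 * S) * (1 - math.exp(-2 * S * (1 - x)))
    return numerator / ((math.exp(2 * S) - 1) * x * (1 - x))
```

Here `t_present` comes out as 0.3 and `current_size` as roughly $1.6 \times 10^9$, in line with the figures quoted above.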


[Figure: plots of f(x, t) for S = 0, showing f(x, 0) and f(x, 0.3) against x on [0, 1].]


[Figure: plots of f(x, t) for S = +2, showing f(x, 0) and f(x, 0.3) against x on [0, 1].]


[Figure: plots of f(x, t) for S = −2, showing f(x, 0) and f(x, 0.3) against x on [0, 1].]


[Figure: plots of f_i(t) for S = 0 and n = 20, showing f_i at t = 0 and t = 0.3 against i/n = number of derived alleles / sample size.]


[Figure: plots of f_i(t) for S = +2 and n = 20, showing f_i at t = 0 and t = 0.3 against i/n = number of derived alleles / sample size.]


[Figure: plots of f_i(t) for S = −2 and n = 20, showing f_i at t = 0 and t = 0.3 against i/n = number of derived alleles / sample size.]


[Figure: plots of f_i(t) for S = 0 and n = 40 against i/n = number of derived alleles / sample size, showing f_i at t = 0, f_i at t = 0.3 from the truncated system of ODEs, and f_i at t = 0.3 from the simulation algorithm.]


References

Bustamante, C. D., Wakeley, J., Sawyer, S., Hartl, D. L., 2001. Directional selection and the site-frequency spectrum. Genetics 159 (4), 1779–1788.

Fisher, R. A., 1930. The distribution of gene ratios for rare mutations. Proceedings of the Royal Society of Edinburgh 50, 205–220.

Griffiths, R. C., 2003. The frequency spectrum of a mutation, and its age, in a general diffusion model. Theoretical Population Biology 64 (2), 241–251.

Griffiths, R. C., Tavaré, S., 1998. The age of a mutation in a general coalescent tree. Stochastic Models 14, 273–295.

Kimura, M., 1964. Diffusion models in population genetics. Journal of Applied Probability 1, 177–232.

Kimura, M., 1969. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61, 893–903.

Nei, M., Maruyama, T., Chakraborty, R., 1975. The bottleneck effect and genetic variability in populations. Evolution 29, 1–10.

Nielsen, R., 2000. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154 (2), 931–942.

Polanski, A., Kimmel, M., 2003. New explicit expressions for relative frequencies of single-nucleotide polymorphisms with application to statistical inference on population growth. Genetics 165 (1), 427–436.

Reich, D. E., Lander, E. S., 2001. On the allelic spectrum of human disease. Trends in Genetics 17 (9), 502–510.

Sawyer, S. A., Hartl, D. L., 1992. Population genetics of polymorphism and divergence. Genetics 132 (4), 1161–1176.

Tajima, F., 1989. The effect of change in population size on DNA polymorphism. Genetics 123 (3), 597–602.

Williams, D., 1974. Path decomposition and continuity of local time for one-dimensional diffusions. I. Proc. London Math. Soc. (3) 28, 738–768.

Williamson, S., Fledel-Alon, A., Bustamante, C. D., 2004. Population genetics of polymorphism and divergence for diploid selection models with arbitrary dominance. Genetics 168 (1), 463–475.

Williamson, S. H., Hernandez, R., Fledel-Alon, A., Zhu, L., Nielsen, R., Bustamante, C. D., 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proceedings of the National Academy of Sciences USA 102, 7882–7887.

Wooding, S., Rogers, A., 2002. The matrix coalescent and an application to human single-nucleotide polymorphisms. Genetics 161 (4), 1641–1650.

Wright, S., 1938. The distribution of gene frequencies under irreversible mutation. Proceedings of the National Academy of Sciences of the United States of America 24, 253–259.
