13.07.2015 Views

The Genom of Homo sapiens.pdf

The Genom of Homo sapiens.pdf

The Genom of Homo sapiens.pdf

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

248 CHIAROMONTE ET AL.Figure 1. Smoothed densities <strong>of</strong> normalized percent identity forancestral repeats and genome-wide WA-windows (50 bp, atleast 40 aligned). f neutral (S) and f genome (S) are depicted in red andblue, respectively. <strong>The</strong>y are obtained through Gaussian kernelsmoothing, a technique that employs the convolution <strong>of</strong> a Gaussiandensity with the discrete distribution placing equal mass oneach observed value.Figure 2. Ratio between the smoothed densities <strong>of</strong> normalizedpercent identity for genome-wide and ancestral repeat WA-windows(50 bp, at least 40 aligned). <strong>The</strong> minimum <strong>of</strong> this curve, p o= 0.808, estimates an upper bound for the neutral weight in themixture (i.e., the share <strong>of</strong> genome-wide windows compatiblewith the neutral template provided by ancestral repeats).We employed Gaussian kernel smoothers to producethe estimated density functions f neutral (S) and f genome (S)depicted by the blue and red curves in Figure 1. A Gaussiankernel smoother (Wegman 1972; Silverman 1986)estimates the density <strong>of</strong> a variable X, for which observations{x 1 ,...x N } are available, by convolving the density <strong>of</strong>a normal N(0,σ 2 ) with a distribution placing mass 1/N oneach observed value:1 N 1 (X – x i ) 2f(X) = ___ Σ exp{ ___________}_______ – (2)N i=1 2πσ 2σ 2We decompose the distribution <strong>of</strong> S for genome-wideWA windows as a mixture <strong>of</strong> a neutral component (thescore distribution for WA-windows in ancestral repeats)and a component that appears to be under selection, withweights p o , and p 1 = (1 – p o ), respectively:neutral weight as p o = min S [f genome (S)/f neutral (S)], whichgives a value <strong>of</strong> 0.808 . This is illustrated in Figure 2. (Inpractice, additional steps are taken to ensure that inaccuraciesin the estimated density ratio f genome (S)/f neutral (S)do not affect the result; see Control Experiments belowand Methods.) Figure 3 summarizes the corresponding“conservative” mixture decomposition: the blue curve de-f genome (S) = p o f neutral (S) + (1 – p o )f selected (S) (3)(For background on mixtures, see Lindsay 1995;McLachlan and Peel 2000; for an approach similar to theone used here, see Efron et al. 2001.) Thus, a WA-windowis assumed to be neutral (have conservation consistentwith f neutral ) with probability p o , and undergoing selection(have conservation consistent with f selected ) withprobability (1 – p o ).We have estimated f neutral and f genome from our data,and will use Equation 3 to estimate p o , which will then determinef selected . Although the parameter p o is not univocallydetermined by Equation 3, non-negativity <strong>of</strong> densitiesimplies thatf genome (S)p o ≤ ___________ (4)f neutral (S)for all scores S. Thus, we estimate an upper bound to theFigure 3. Mixture decomposition <strong>of</strong> the distribution <strong>of</strong> normalizedpercent identity for genome-wide WA-windows (50 bp, atleast 40 aligned) into a neutral component and a component underselection. This is a “conservative” decomposition that usesthe estimated upper bound p o . <strong>The</strong> blue curve depicts f genome (S),the red curve depicts p o f neutral (S), and the green curve depictsthe difference f genome (S) – p o f neutral (S) = (1 – p o ) f selected (S).

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!