13.07.2015 Views

The Genom of Homo sapiens.pdf

The Genom of Homo sapiens.pdf

The Genom of Homo sapiens.pdf

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

56 BENTLEYmon over many generations. Most high-frequency sequencevariants therefore arose in Africa, were present inthe migrant founders, and are common to multiple populationgroups around the world (Barbujani et al. 1997).New variants that are under positive selection—i.e., functionalvariants that confer a survival advantage—will increasein frequency more quickly than expected by randomprocesses. A change in the environment (such as theappearance <strong>of</strong> a new virus, or release <strong>of</strong> a new toxin, orchange in food source) may impose a new selective pressureon existing functional variants, leading to alterationsin allele frequency in a particular population. Balancingor frequency-dependent selection may act to maintainboth alleles <strong>of</strong> a polymorphism stably in the population.Recombination during meiosis results in cross-oversbetween homologous chromosomes. This leads to reassortment<strong>of</strong> previously existing variants into new combinationsin the haploid gametes. Assuming a recombinationrate <strong>of</strong> 1 site per 100 million bases per generation (Yuet al. 2001) (i.e., around 30 recombination events per gamete),large segments <strong>of</strong> each parental homolog arepassed on from one generation to the next. Over multiplegenerations, further recombination events result in progressivefragmentation <strong>of</strong> the original ancestral segmentsinto more and more pieces. Individual present-day chromosomesare thus mosaics <strong>of</strong> segments <strong>of</strong> relatively fewancestral chromosomes, each segment having its own lineagehistory (Pääbo 2003). <strong>The</strong> sequence along a chromosomedefines the complete set <strong>of</strong> variants in an individualhaploid sequence and represents a particular“haplotype” (from “haploid genotype”) (see Fig. 1).Somatic mutation occurs in individual cells at anystage following the formation <strong>of</strong> a diploid zygote, duringdevelopment and throughout adulthood. <strong>The</strong> mutationprocess may be enhanced by exposure to particular nongeneticfactors such as radiation or toxins. Particularcombinations <strong>of</strong> somatic mutations, sometimes in conjunctionwith germ-line variants, alter the normal program<strong>of</strong> cell proliferation, differentiation, and apoptosis,and lead to cancer.THE NATURE OF SEQUENCE VARIATIONIndividual copies <strong>of</strong> the haploid human genome differat approximately one site per kilobase on average (Li andSadler 1991; Sachidanandam et al. 2001). Single-nucleotidepolymorphisms (SNPs) account for around 90%<strong>of</strong> sequence variants in the human population, the remaining10% being insertions or deletions (“indels”). Ithas been estimated that the world’s human populationcontains over 10 million SNPs with a minor allele frequency(m.a.f.) <strong>of</strong> at least 1% (Kruglyak and Nickerson2001). Several in-depth studies <strong>of</strong> sequencing in specificgenes have demonstrated that within exons there is aslightly lower density <strong>of</strong> polymorphic sites (1 site per2,000 bases), and also typically a lower allele frequency(Cargill et al. 1999; Halushka et al. 1999). Assuming that1.5% <strong>of</strong> the genome sequence encodes protein (Lander etal. 2001), the global population contains around 90,000variants (with m.a.f. >1%) in protein-coding sequence.From previous surveys, the protein-coding variants canbe subdivided into ~50% synonymous (causing no aminoacid change), 33% nonsynonymous and conservative(i.e., leading to conservative amino acid replacement),and 17% nonsynonymous, nonconservative changes(Cargill et al. 1999; Halushka et al. 1999).Genetic variants that give rise to nonconservativeamino acid changes are particularly good functional variantcandidates, but a number <strong>of</strong> observations suggest thatthis generalization is likely to be inadequate. Not all nonconservativeamino acid changes necessarily alter function.For example, in an exhaustive survey (Green et al.1999), only half <strong>of</strong> all possible amino acid changes(233/454) in factor IX cause hemophilia B (HaemophiliaB Mutation Database v12: http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html).Functionally important amino acids in proteins tend tobe conserved between species; on average, about onethird<strong>of</strong> amino acids are highly or absolutely conserved invertebrates. Conversely, other types <strong>of</strong> changes (either atthe amino acid or nucleotide level) can alter function.Conservative changes may affect protein function, andany changes may affect transcript function; for example,by introducing or abolishing splice sites. <strong>The</strong>re is no goodestimate <strong>of</strong> the number <strong>of</strong> functionally important changesoutside protein-coding sequence. Changes in promotersor enhancers may affect transcription (Li et al. 1999; Sauret al. 2004). Comparison between multiple genomes suggeststhat around 5% <strong>of</strong> the human genome is under selectionto conserve sequence (Waterston et al. 2002).<strong>The</strong>se analyses suggest that, in addition to the proteincodingsequence, another 3.7% <strong>of</strong> the genome is equallystrongly conserved and may be important for gene regulationor chromatin behavior (Waterston et al. 2002;Margulies et al. 2003). On this basis, an initial estimate <strong>of</strong>variants in functional sequence (with m.a.f. < 1%) wouldbe nearer 300,000, and an unknown fraction <strong>of</strong> these willhave phenotypic consequences.<strong>The</strong> first genome-wide survey <strong>of</strong> human variation resultedin detection <strong>of</strong> 1.42 million candidate SNPs thatwere mapped to a unique position in the working draft sequence(Sachidanandam et al. 2001). This set was estimatedto contain possibly 12% <strong>of</strong> all SNPs with a m.a.f.<strong>of</strong> 1% or more (Kruglyak and Nickerson 2001). More recently,the total number <strong>of</strong> SNPs available with a uniquemap position has increased to 7 million (dbSNP release120, http://www.ncbi.nlm.nih.gov/SNP) following generation<strong>of</strong> additional shotgun data, and may now contain40% <strong>of</strong> SNPs with m.a.f.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!