13.07.2015 Views

The Genom of Homo sapiens.pdf



STRATIFICATION OF LINKAGE DISEQUILIBRIUM 85

and its use for large-scale mapping of genes contributing to common diseases. To design this analysis correctly, knowledge of the distribution of LD along the genome is crucial to estimate how many markers are needed to obtain adequate power in genomewide studies. In some areas, a few haplotypes may encompass a region of the order of megabases, whereas in others, SNPs a few hundred bases apart may be in equilibrium, and thus in-depth screening may require typing of each of them.

The goal of the International HapMap Project is to create a powerful shortcut to identify genes linked to complex disorders, identifying haplotype blocks in which a few tag SNPs may be defined in order to use them as surrogates for the whole haplotype block. The project will produce haplotype maps of the whole genome in four populations: Americans of European ancestry, Japanese, Chinese, and Yorubans. However, the successful application of HapMap in association studies hinges on two debated key assumptions. First, the basic assumption for LD-based mapping of genes contributing to complex diseases is the so-called common disease/common variant model. That is, genetic factors contributing to common disease are assumed to be relatively few for each disease and to comprise frequent alleles at each site. Actual evidence for this model is weak (Pritchard 2001; Reich and Lander 2001) and has raised some skepticism. Even worse, if this model is true, it may be more difficult to find those genes: Frequent alleles tend to be older and, therefore, less likely to remain in LD with their genomic background.

The second assumption is that the analysis of the four selected populations will provide valid results for all the human species, independently of the ethnic background. To be precise, it is assumed that the most prevalent haplotypes will be described with the four populations analyzed and that the block structure of these populations, including LD decay with distance, will give the framework for any other human group. The studies of geographic distribution of LD, both in single genes or regions and in whole-genome approaches, do not support this homogeneity in LD structure.

Gabriel et al. (2002) characterized the haplotype patterns across 51 autosomal regions in samples from Africa (African-American and sub-Saharan African samples), European-American individuals, and East Asians, to determine the structure of LD and its variation across populations. They provide strong evidence that most of the human genome is organized into haplotype blocks. The blocks found in African populations are smaller than those in European and Asian populations; they estimate that half of the human genome exists in blocks of around 22 kb in Africans and African-Americans and in blocks of around 40 kb in Europeans and Asians. In addition, the boundaries of these blocks and the common haplotypes found within are extremely correlated across populations. Another large-scale study reported by Reich et al. (2001) examined 19 randomly selected genomic regions in a United States population of north-European descent and a Nigerian population in order to characterize population differences in LD patterns around genes. Again, vast differences in the extent of LD between populations were found, and block size was smaller in the African than in the non-African sample.

These LD studies have been used to justify the decision to analyze four "representative" ethnic groups in the HapMap Project. Most of the existing studies described with large-scale data sets and different average distance between markers have been performed in a small number of populations, usually one European, one Asian, and one African. As a result, several questions are still unanswered: Are haplotype blocks and LD structure from different populations within continental groups similar, and what is the amount and grain of the substructure? Do other ethnic groups harbor a divergent structure from the simple European/Asian/African framework proposed by HapMap, for example, in Native Americans or Oceanians? Is the expected heterogeneity within Africa found in LD and block structure? These questions bear directly on the universality of the design of HapMap.

There is a further concern related to the population differences: the quality of the SNP database used to select markers. There is a clear ascertainment bias in the SNPs reported in databases, and additional SNP discovery is required in highly diverse populations, such as African populations, which will help to ensure that the LD map will be powerful enough in all ethnic populations. Current data from multiple loci show that major haplotypes in one population could be absent in others. These results illustrate that tag SNPs determined in one population may not necessarily be good tag SNPs in another if the populations are sufficiently differentiated. Therefore, the actual genetic differentiation of human populations has not been taken into account in the design of the LD studies.

A Pilot Study of LD in a Large Number of Populations

To gain insight into the amount of population structure in patterns of LD and haplotype composition, we have studied a region of chromosome 22, spanning 1.78 Mb, in which strong LD and clear haplotype blocks were described in an English population (Dawson et al. 2002). Twelve SNPs that were described as flanking haplotype blocks in a high LD region have been genotyped (A. González-Neira et al., unpubl.). The study has been carried out in 1110 individuals (2220 chromosomes) from 39 populations chosen to represent most of the human genetic variation (mainly from the HGDP-CEPH panel; Cann et al. 2002). When compared to the English population, the decay of LD with genetic distance is very heterogeneous, with some European populations showing a similar pattern, while other populations have a remarkably different pattern, either with high LD and sharp decrease or low LD and smooth decay (Fig. 1).

There is no single way of elucidating the structure of LD and haplotype composition for a large number of populations. We present here just a graphic overview of the pattern of variation (Fig. 2). Considering the well-described English population (Dawson et al. 2002), all the other populations are depicted according to their divergence from it in two different but complementary param-
