13.07.2015 Views

The Genom of Homo sapiens.pdf


Structure of Linkage Disequilibrium in Humans: Genome Factors and Population Stratification

J. BERTRANPETIT, F. CALAFELL, D. COMAS, A. GONZÁLEZ-NEIRA, AND A. NAVARRO

Unitat de Biologia Evolutiva, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain

Human genomes differ at, on average, one base per thousand; this statement, which has been repeated ad nauseam, takes on a whole new dimension when the physical organization of the human genome is taken into account. Thus, each allele at each polymorphic site does not exist independently, but is physically linked to the allele occurring at the next polymorphic site, and so on, all along each chromosome. A haplotype is any combination of allelic states at linked sites. Such physical links are quite strong and are only broken by recombination via crossing-over or similar DNA exchange processes such as gene conversion. Recombination, the shuffling of chromosome segments of maternal and paternal origin during gamete formation, happens at such a broad scale that relatively wide DNA portions tend to travel together from one generation to the next. In fact, the average chance of two adjacent bases being separated by recombination in one generation is on the order of one in a hundred million (10⁻⁸), or, if considering polymorphic sites at a density of one per kilobase, their probability of not staying together in one generation would be roughly 10⁻⁵.

Frequently, haplotype frequencies do not derive from the random assortment of alleles at each locus; preferential associations do exist. The departure of haplotype frequencies from the expectation under random association of their constituent alleles is called linkage disequilibrium (LD).
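As a rough check on the figures above, the following Python sketch works through the arithmetic. It assumes that recombination acts independently in each gap between adjacent bases (a simplification introduced here, not a claim from the text), and it writes the departure from random association as D = p_AB - p_A * p_B, the standard two-locus coefficient. The function names and the example frequencies are hypothetical and serve only as illustration.

```python
def separation_probability(per_base_rate: float, distance_bp: int) -> float:
    """Chance that at least one recombination falls between two sites.

    Assumes each of the `distance_bp` gaps between adjacent bases recombines
    independently with probability `per_base_rate` per generation.
    """
    return 1.0 - (1.0 - per_base_rate) ** distance_bp


def disequilibrium_d(p_ab: float, p_a: float, p_b: float) -> float:
    """Departure of the observed AB haplotype frequency from p_a * p_b."""
    return p_ab - p_a * p_b


if __name__ == "__main__":
    # Adjacent bases at a rate of 1e-8, then two sites one kilobase apart.
    print(separation_probability(1e-8, 1))
    print(separation_probability(1e-8, 1000))

    # Hypothetical values: alleles A and B each at 0.5, haplotype AB at 0.4.
    # Under random association AB would be at 0.25, so D is positive.
    print(disequilibrium_d(0.4, 0.5, 0.5))
```

With a per-base rate of 10⁻⁸, a thousand gaps give a value very close to 10⁻⁵, which is where the one-per-kilobase figure comes from.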
LD arises by several mechanisms, including random genetic drift, mutation, migration, changes in population size, and selection due to genome and population factors (Bertranpetit and Calafell 2001). All this newly created LD will be eroded by recombination, which will be more efficient in reshuffling the alleles at the most distant loci. Thus, LD is expected to decay with physical distance. In summary, sites that are physically close will tend to carry particular combinations of alleles.

From a purely numerical point of view, LD can be characterized in two ways: as a direct measure or as the result of a formal hypothesis test. For the sake of simplicity, we will refer to LD between a pair of polymorphisms, although the LD concept can be extended, not without some difficulty, to the relationships among a larger number of markers. LD ranges between two possible extremes: On one hand, knowledge of the allele in one locus may not convey any information about the content of the other locus; both are independent and LD is, in fact, nil. On the other hand, LD can be complete in the sense that knowing which allele is at one locus determines which allele is at the other.
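The two extremes, and the erosion of LD by recombination, can be made concrete with a short Python sketch. It uses D′ and r², two widely used pairwise measures, computed from the four haplotype frequencies of a pair of biallelic sites. It also uses the textbook expectation that D shrinks by a factor (1 - c) per generation, where c is the recombination fraction between the sites. The function names, the A/a and B/b labels, and the example frequencies are assumptions made for illustration; none of them come from the text. D′ is returned as an absolute value so that it lies between 0 and 1.

```python
def ld_measures(f_AB: float, f_Ab: float, f_aB: float, f_ab: float):
    """Return (D, |D'|, r^2) for two biallelic sites.

    The arguments are the frequencies of the four haplotypes (they should
    sum to 1). Labels A/a and B/b are arbitrary names for the alleles.
    """
    p_A = f_AB + f_Ab          # frequency of allele A at the first site
    p_B = f_AB + f_aB          # frequency of allele B at the second site
    d = f_AB - p_A * p_B       # departure from random association

    variance_product = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_product == 0:
        raise ValueError("both sites must be polymorphic")

    # Largest |D| that the allele frequencies allow, given the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    d_prime = abs(d) / d_max
    r_squared = d * d / variance_product
    return d, d_prime, r_squared


def expected_d_after(d0: float, c: float, generations: int) -> float:
    """Expected D after recombination alone acts for some generations."""
    return d0 * (1 - c) ** generations


if __name__ == "__main__":
    # Independence: every haplotype at the product of its allele frequencies.
    print(ld_measures(0.25, 0.25, 0.25, 0.25))

    # Complete LD: only AB and ab exist, so one allele predicts the other.
    print(ld_measures(0.6, 0.0, 0.0, 0.4))

    # Decay over 1000 generations for a small and a large value of c.
    for c in (1e-5, 1e-2):
        print(c, expected_d_after(0.24, c, 1000))
```

The larger recombination fraction removes almost all of the initial disequilibrium over the same number of generations, which is the sense in which LD is expected to decay with distance.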
Measures of LD (derived from allele and haplotype frequencies) take extreme values (conveniently, 0 and 1) for these two extreme cases and the appropriate intermediate value for intermediate situations. The two most popular measures, D′ and r², are based on the departure from the expected haplotype frequencies under linkage equilibrium and on the correlation coefficient between haplotype frequencies, respectively. However, a given D′ or r² value does not reveal by itself the statistical significance of LD; for that, proper testing is needed, usually by means of a chi-square statistic or by Fisher's exact test.

LD has a clear role to play in biomedical research: The physical position of a gene contributing to a phenotype (usually a disease) can be deduced from the polymorphic markers with which it is found at high LD. Most genes involved in the causation of Mendelian disorders were not found via LD, but with linkage mapping, in which the location of a disease gene is investigated by observing, in families with affected and nonaffected individuals, the patterns of joint inheritance of the phenotype and each of the polymorphic markers in a genetic map. Accordingly, the events that are useful to reject a genome segment as containing the disease locus are recombinations happening within the families collected for the study. Alternatively, if LD is assessed in the whole population, the whole history of recombination between the disease locus and the map markers can be used to locate the gene. This approach was used first by Hastbacka et al. (1992) to refine the position of the gene for diastrophic dysplasia within the interval provided by linkage analysis.

LD is implicitly embedded in association mapping, one of the most popular approaches in the search for the genetic components of complex diseases. In association studies, the allelic frequencies in one or more polymorphisms are compared between affected individuals and healthy controls. If frequencies are significantly different, there are two possible explanations: Either the polymorphism itself contributes to the phenotype (which can often be ruled out given that most polymorphisms used are synonymous or otherwise neutral variants), or the polymorphism is in linkage disequilibrium with genetic variants actually involved in the disease phenotype.

Association studies often involve polymorphisms in or around candidate genes that, given their known biological

Cold Spring Harbor Symposia on Quantitative Biology, Volume LXVIII. © 2003 Cold Spring Harbor Laboratory Press 0-87969-709-1/04.
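The comparison of allele frequencies between affected individuals and controls can be sketched in Python as a Pearson chi-square test on a 2x2 table of allele counts. This is one common way to run such a comparison, not necessarily the procedure used in any study cited here. It applies no continuity correction, it treats each chromosome as an independent observation, and for small counts Fisher's exact test would be the safer choice. The function name and the counts are hypothetical.

```python
import math


def allelic_chi_square(case_a1: int, case_a2: int,
                       control_a1: int, control_a2: int):
    """Pearson chi-square (1 degree of freedom) on allele counts.

    Rows are cases and controls; columns are the two alleles of one
    polymorphism. Returns (chi_square, p_value).
    """
    n_cases = case_a1 + case_a2
    n_controls = control_a1 + control_a2
    n_a1 = case_a1 + control_a1
    n_a2 = case_a2 + control_a2
    if 0 in (n_cases, n_controls, n_a1, n_a2):
        raise ValueError("every row and column needs at least one allele")

    n = n_cases + n_controls
    cross = case_a1 * control_a2 - case_a2 * control_a1
    chi_square = n * cross ** 2 / (n_cases * n_controls * n_a1 * n_a2)

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 cases and 100 controls, two alleles each.
    chi_square, p_value = allelic_chi_square(120, 80, 90, 110)
    print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
```

A significant result from such a test does not distinguish between the two explanations given above: the tested polymorphism may itself contribute to the phenotype, or it may simply be in LD with a variant that does.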
