articles
20–35 Mb. The increase is most pronounced in the male meiotic map. The effect can be seen, for example, from the higher slope at both ends of chromosome 12 (Fig. 15). Regional and sex-specific effects have been observed for chromosome 21 (refs 110, 134).
Why is recombination higher on smaller chromosome arms? A higher rate would increase the likelihood of at least one crossover during meiosis on each chromosome arm, as is generally observed in human chiasmata counts (ref. 135). Crossovers are believed to be necessary for normal meiotic disjunction of homologous chromosome pairs in eukaryotes. An extreme example is the pseudoautosomal regions on chromosomes Xp and Yp, which pair during male meiosis; this physical region of only 2.6 Mb has a genetic length of 50 cM (corresponding to 20 cM per Mb), with the result that a crossover is virtually assured.
Mechanistically, the increased rate of recombination on shorter chromosome arms could be explained if, once an initial recombination event occurs, additional nearby events are blocked by positive crossover interference on each arm. Evidence from yeast mutants in which interference is abolished shows that interference plays a key role in distributing a limited number of crossovers among the various chromosome arms in yeast (ref. 136).
An alternative possibility is that a checkpoint mechanism scans for and enforces the presence of at least one crossover on each chromosome arm.
Variation in recombination rates along chromosomes and between the sexes is likely to reflect variation in the initiation of meiosis-induced double-strand breaks (DSBs) that initiate recombination. DSBs in yeast have been associated with open chromatin (refs 137, 138), rather than with specific DNA sequence motifs. With the availability of the draft genome sequence, it should be possible to explore in an analogous manner whether variation in human recombination rates reflects systematic differences in chromosome accessibility during meiosis.
Figure 16 Rate of recombination averaged across the euchromatic portion of each chromosome arm plotted against the length of the chromosome arm in Mb. For large chromosomes, the average recombination rates are very similar, but as chromosome arm length decreases, average recombination rates rise markedly. (Axes: recombination rate in cM per Mb, 0–3; length of chromosome arm in Mb, 0–160.)
Repeat content of the human genome
A puzzling observation in the early days of molecular biology was that genome size does not correlate well with organismal complexity. For example, Homo sapiens has a genome that is 200 times as large as that of the yeast S. cerevisiae, but 200 times as small as that of Amoeba dubia (refs 139, 140).
This mystery (the C-value paradox) was largely resolved with the recognition that genomes can contain a large quantity of repetitive sequence, far in excess of that devoted to protein-coding genes (reviewed in refs 140, 141).
In the human, coding sequences comprise less than 5% of the genome (see below), whereas repeat sequences account for at least 50% and probably much more. Broadly, the repeats fall into five classes: (1) transposon-derived repeats, often referred to as interspersed repeats; (2) inactive (partially) retroposed copies of cellular genes (including protein-coding genes and small structural RNAs), usually referred to as processed pseudogenes; (3) simple sequence repeats, consisting of direct repetitions of relatively short k-mers such as (A)n, (CA)n or (CGG)n; (4) segmental duplications, consisting of blocks of around 10–300 kb that have been copied from one region of the genome into another region; and (5) blocks of tandemly repeated sequences, such as at centromeres, telomeres, the short arms of acrocentric chromosomes and ribosomal gene clusters. (These regions are intentionally under-represented in the draft genome sequence and are not discussed here.)
Repeats are often described as 'junk' and dismissed as uninteresting. However, they actually represent an extraordinary trove of information about biological processes.
The repeats constitute a rich palaeontological record, holding crucial clues about evolutionary events and forces. As passive markers, they provide assays for studying processes of mutation and selection. It is possible to recognize cohorts of repeats 'born' at the same time and to follow their fates in different regions of the genome or in different species. As active agents, repeats have reshaped the genome by causing ectopic rearrangements, creating entirely new genes, modifying and reshuffling existing genes, and modulating overall GC content. They also shed light on chromosome structure and dynamics, and provide tools for medical genetic and population genetic studies.
The human is the first repeat-rich genome to be sequenced, and so we investigated what information could be gleaned from this majority component of the human genome. Although some of the general observations about repeats were suggested by previous studies, the draft genome sequence provides the first comprehensive view, allowing some questions to be resolved and new mysteries to emerge.
Transposon-derived repeats
Most human repeat sequence is derived from transposable elements (refs 142, 143). We can currently recognize about 45% of the genome as belonging to this class.
Much of the remaining 'unique' DNA must also be derived from ancient transposable element copies that have diverged too far to be recognized as such. To describe our analyses of interspersed repeats, it is necessary briefly to review the relevant features of human transposable elements.
Classes of transposable elements. In mammals, almost all transposable elements fall into one of four types (Fig. 17), of which three transpose through RNA intermediates and one transposes directly as DNA. These are long interspersed elements (LINEs), short interspersed elements (SINEs), LTR retrotransposons and DNA transposons.
LINEs are one of the most ancient and successful inventions in eukaryotic genomes. In humans, these transposons are about 6 kb long, harbour an internal polymerase II promoter and encode two open reading frames (ORFs). Upon translation, a LINE RNA assembles with its own encoded proteins and moves to the nucleus, where an endonuclease activity makes a single-stranded nick and the reverse transcriptase uses the nicked DNA to prime reverse transcription from the 3′ end of the LINE RNA. Reverse transcription frequently fails to proceed to the 5′ end, resulting in many truncated, nonfunctional insertions. Indeed, most LINE-derived repeats are short, with an average size of 900 bp for all LINE1 copies, and a median size of 1,070 bp for copies of the currently active LINE1 element (L1Hs).
New insertion sites are flanked by a small target site duplication of 7–20 bp. The LINE machinery is believed to be responsible for most reverse transcription in the genome, including the retrotransposition of the non-autonomous SINEs (ref. 144) and the creation of processed pseudogenes (refs 145, 146). Three distantly related LINE families are found in the human genome: LINE1, LINE2 and LINE3. Only LINE1 is still active.
NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com © 2001 Macmillan Magazines Ltd 879
Figure 17 Classes of interspersed repeat in the human genome. Almost all transposable elements in mammals fall into one of four classes. See text for details.
LINEs: autonomous; structure ORF1, ORF2 (pol), AAA; length 6–8 kb; copy number 850,000; fraction of genome 21%.
SINEs: non-autonomous; structure A, B, AAA; length 100–300 bp; copy number 1,500,000; fraction of genome 13%.
Retrovirus-like elements: autonomous, structure gag, pol, (env), length 6–11 kb; non-autonomous, structure (gag), length 1.5–3 kb; copy number 450,000; fraction of genome 8%.
DNA transposon fossils: autonomous, structure transposase, length 2–3 kb; non-autonomous, length 80–3,000 bp; copy number 300,000; fraction of genome 3%.
SINEs are wildly successful freeloaders on the backs of LINE elements. They are short (about 100–400 bp), harbour an internal polymerase III promoter and encode no proteins. These non-autonomous transposons are thought to use the LINE machinery for transposition. Indeed, most SINEs 'live' by sharing the 3′ end with a resident LINE element (ref. 144). The promoter regions of all known SINEs are derived from tRNA sequences, with the exception of a single monophyletic family of SINEs derived from the signal recognition particle component 7SL.
This family, which also does not share its 3′ end with a LINE, includes the only active SINE in the human genome: the Alu element. By contrast, the mouse has both tRNA-derived and 7SL-derived SINEs. The human genome contains three distinct monophyletic families of SINEs: the active Alu, and the inactive MIR and Ther2/MIR3.
LTR retroposons are flanked by long terminal direct repeats that contain all of the necessary transcriptional regulatory elements. The autonomous elements (retrotransposons) contain gag and pol genes, which encode a protease, reverse transcriptase, RNAse H and integrase. Exogenous retroviruses seem to have arisen from endogenous retrotransposons by acquisition of a cellular envelope gene (env; ref. 147). Transposition occurs through the retroviral mechanism with reverse transcription occurring in a cytoplasmic virus-like particle, primed by a tRNA (in contrast to the nuclear location and chromosomal priming of LINEs). Although a variety of LTR retrotransposons exist, only the vertebrate-specific endogenous retroviruses (ERVs) appear to have been active in the mammalian genome. Mammalian retroviruses fall into three classes (I–III), each comprising many families with independent origins.
Most (85%) of the LTR retroposon-derived 'fossils' consist only of an isolated LTR, with the internal sequence having been lost by homologous recombination between the flanking LTRs.
DNA transposons resemble bacterial transposons, having terminal inverted repeats and encoding a transposase that binds near the inverted repeats and mediates mobility through a 'cut-and-paste' mechanism. The human genome contains at least seven major classes of DNA transposon, which can be subdivided into many families with independent origins (ref. 148; see RepBase, http://www.girinst.org/~server/repbase.html). DNA transposons tend to have short life spans within a species. This can be explained by contrasting the modes of transposition of DNA transposons and LINE elements. LINE transposition tends to involve only functional elements, owing to the cis-preference by which LINE proteins assemble with the RNA from which they were translated. By contrast, DNA transposons cannot exercise a cis-preference: the encoded transposase is produced in the cytoplasm and, when it returns to the nucleus, it cannot distinguish active from inactive elements. As inactive copies accumulate in the genome, transposition becomes less efficient. This checks the expansion of any DNA transposon family and in due course causes it to die out.
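The dilution argument above can be made concrete with a toy calculation. The sketch below is an illustration only, not a model used in this analysis: it assumes that transposase encounters every copy of a family with equal probability, so the share of encounters that land on an active (transposase-encoding) copy is simply active / (active + inactive). The function names and the copy numbers are hypothetical.

```python
from typing import List


def active_share(active_copies: int, inactive_copies: int) -> float:
    """Share of transposase encounters that hit an active copy.

    Assumption (not from the text): every copy, active or inactive,
    is equally likely to be bound by transposase.
    """
    total = active_copies + inactive_copies
    if total == 0:
        raise ValueError("family has no copies")
    return active_copies / total


def share_over_time(active_copies: int, inactive_per_step: int, steps: int) -> List[float]:
    """Active share after 0..steps rounds of inactive-copy accumulation.

    The number of active copies is held fixed and a constant number of
    inactive copies is added per step; both are simplifying assumptions.
    """
    return [
        active_share(active_copies, inactive_per_step * step)
        for step in range(steps + 1)
    ]


if __name__ == "__main__":
    # Hypothetical family: 10 active copies, 10 new inactive copies per step.
    for step, share in enumerate(share_over_time(10, 10, 5)):
        print(f"step {step}: active share = {share:.2f}")
```

Under these assumptions the active share can only fall as inactive copies accumulate, which is the qualitative behaviour the text describes for a transposase that cannot exercise a cis-preference.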
To survive, DNA transposons must eventually move by horizontal transfer to virgin genomes, and there is considerable evidence for such transfer (refs 149–153).
Transposable elements employ different strategies to ensure their evolutionary survival. LINEs and SINEs rely almost exclusively on vertical transmission within the host genome (ref. 154; but see refs 148, 155). DNA transposons are more promiscuous, requiring relatively frequent horizontal transfer. LTR retroposons use both strategies, with some being long-term active residents of the human genome (such as members of the ERVL family) and others having only short residence times.
Table 11 Number of copies and fraction of genome for classes of interspersed repeat
Class                      Number of     Total number of bases in   Fraction of the draft   Number of families
                           copies        the draft genome           genome sequence (%)     (subfamilies)
                           (× 1,000)     sequence (Mb)
SINEs                      1,558         359.6                      13.14
  Alu                      1,090         290.1                      10.60                   1 (~20)
  MIR                      393           60.1                       2.20                    1 (1)
  MIR3                     75            9.3                        0.34                    1 (1)
LINEs                      868           558.8                      20.42
  LINE1                    516           462.1                      16.89                   1 (~55)
  LINE2                    315           88.2                       3.22                    1 (2)
  LINE3                    37            8.4                        0.31                    1 (2)
LTR elements               443           227.0                      8.29
  ERV-class I              112           79.2                       2.89                    72 (132)
  ERV(K)-class II          8             8.5                        0.31                    10 (20)
  ERV(L)-class III         83            39.5                       1.44                    21 (42)
  MaLR                     240           99.8                       3.65                    1 (31)
DNA elements               294           77.6                       2.84
  hAT group
    MER1-Charlie           182           38.1                       1.39                    25 (50)
    Zaphod                 13            4.3                        0.16                    4 (10)
  Tc-1 group
    MER2-Tigger            57            28.0                       1.02                    12 (28)
    Tc2                    4             0.9                        0.03                    1 (5)
    Mariner                14            2.6                        0.10                    4 (5)
  PiggyBac-like            2             0.5                        0.02                    10 (20)
  Unclassified             22            3.2                        0.12                    7 (7)
Unclassified               3             3.8                        0.14                    3 (4)
Total interspersed repeats               1,226.8                    44.83
The number of copies and base pair contributions of the major classes and subclasses of transposable elements in the human genome. Data extracted from a RepeatMasker analysis of the draft genome sequence (RepeatMasker version 09092000, sensitive settings, using RepBase Update 5.08). In calculating percentages, RepeatMasker excluded the runs of Ns linking the contigs in the draft genome sequence. In the last column, separate consensus sequences in the repeat databases are considered subfamilies, rather than families, when the sequences are closely related or related through intermediate subfamilies.
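As a quick consistency check on Table 11, the class-level rows can be summed and compared with the reported 'Total interspersed repeats' row. The sketch below uses only values transcribed from the table; the function and variable names are illustrative, and the implied denominator (the amount of draft sequence left after excluding runs of Ns) is an inference from the table's own numbers rather than a figure stated in the text.

```python
from typing import Dict, Tuple

# (bases in Mb, fraction of draft sequence in %) for each top-level class,
# transcribed from Table 11.
CLASS_TOTALS: Dict[str, Tuple[float, float]] = {
    "SINEs": (359.6, 13.14),
    "LINEs": (558.8, 20.42),
    "LTR elements": (227.0, 8.29),
    "DNA elements": (77.6, 2.84),
    "Unclassified": (3.8, 0.14),
}

# Values from the "Total interspersed repeats" row of Table 11.
REPORTED_TOTAL_MB = 1226.8
REPORTED_TOTAL_PCT = 44.83


def summed_totals(classes: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the Mb and % columns, rounded to the table's precision."""
    total_mb = round(sum(mb for mb, _ in classes.values()), 1)
    total_pct = round(sum(pct for _, pct in classes.values()), 2)
    return total_mb, total_pct


def implied_denominator_mb(total_mb: float, total_pct: float) -> float:
    """Sequence length (Mb) implied by a base count and its percentage."""
    return total_mb / (total_pct / 100.0)


if __name__ == "__main__":
    total_mb, total_pct = summed_totals(CLASS_TOTALS)
    print(f"summed classes: {total_mb} Mb, {total_pct} %")
    print(f"reported total: {REPORTED_TOTAL_MB} Mb, {REPORTED_TOTAL_PCT} %")
    denom = implied_denominator_mb(REPORTED_TOTAL_MB, REPORTED_TOTAL_PCT)
    print(f"implied draft sequence (excluding Ns): {denom:.0f} Mb")
```

The five class rows reproduce the reported totals exactly at the table's precision, and the ratio of the two totals implies that the percentages were computed against roughly 2.7 Gb of draft sequence.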