Initial sequencing and analysis of the human genome - Vitagenes

[Figure 17: Classes of interspersed repeat in the human genome. Diagram not reproduced; its labels are summarized below.]
  LINEs: autonomous; ORF1, ORF2 (pol); length 6–8 kb; copy number 850,000; fraction of genome 21%
  SINEs: non-autonomous; length 100–300 bp; copy number 1,500,000; fraction of genome 13%
  Retrovirus-like elements: autonomous, gag, pol, (env), length 6–11 kb; non-autonomous, (gag), length 1.5–3 kb; copy number 450,000; fraction of genome 8%
  DNA transposon fossils: autonomous, transposase, length 2–3 kb; non-autonomous, length 80–3,000 bp; copy number 300,000; fraction of genome 3%

Figure 17 Almost all transposable elements in mammals fall into one of four classes. See text for details.

target site duplication of 7–20 bp. The LINE machinery is believed to be responsible for most reverse transcription in the genome, including the retrotransposition of the non-autonomous SINEs (ref. 144) and the creation of processed pseudogenes (refs 145, 146). Three distantly related LINE families are found in the human genome: LINE1, LINE2 and LINE3. Only LINE1 is still active.

SINEs are wildly successful freeloaders on the backs of LINE elements. They are short (about 100–400 bp), harbour an internal polymerase III promoter and encode no proteins. These non-autonomous transposons are thought to use the LINE machinery for transposition. Indeed, most SINEs 'live' by sharing the 3′ end with a resident LINE element (ref. 144). The promoter regions of all known SINEs are derived from tRNA sequences, with the exception of a single monophyletic family of SINEs derived from the signal recognition particle component 7SL.
This family, which also does not share its 3′ end with a LINE, includes the only active SINE in the human genome: the Alu element. By contrast, the mouse has both tRNA-derived and 7SL-derived SINEs. The human genome contains three distinct monophyletic families of SINEs: the active Alu, and the inactive MIR and Ther2/MIR3.

LTR retroposons are flanked by long terminal direct repeats that contain all of the necessary transcriptional regulatory elements. The autonomous elements (retrotransposons) contain gag and pol genes, which encode a protease, reverse transcriptase, RNAse H and integrase. Exogenous retroviruses seem to have arisen from endogenous retrotransposons by acquisition of a cellular envelope gene (env) (ref. 147). Transposition occurs through the retroviral mechanism with reverse transcription occurring in a cytoplasmic virus-like particle, primed by a tRNA (in contrast to the nuclear location and chromosomal priming of LINEs). Although a variety of LTR retrotransposons exist, only the vertebrate-specific endogenous retroviruses (ERVs) appear to have been active in the mammalian genome. Mammalian retroviruses fall into three classes (I–III), each comprising many families with independent origins.
Most (85%) of the LTR retroposon-derived 'fossils' consist only of an isolated LTR, with the internal sequence having been lost by homologous recombination between the flanking LTRs.

DNA transposons resemble bacterial transposons, having terminal inverted repeats and encoding a transposase that binds near the inverted repeats and mediates mobility through a 'cut-and-paste' mechanism. The human genome contains at least seven major classes of DNA transposon, which can be subdivided into many families with independent origins (ref. 148; see RepBase, http://www.girinst.org/~server/repbase.html). DNA transposons tend to have short life spans within a species. This can be explained by contrasting the modes of transposition of DNA transposons and LINE elements. LINE transposition tends to involve only functional elements, owing to the cis-preference by which LINE proteins assemble with the RNA from which they were translated. By contrast, DNA transposons cannot exercise a cis-preference: the encoded transposase is produced in the cytoplasm and, when it returns to the nucleus, it cannot distinguish active from inactive elements. As inactive copies accumulate in the genome, transposition becomes less efficient. This checks the expansion of any DNA transposon family and in due course causes it to die out.
To survive, DNA transposons must eventually move by horizontal transfer to virgin genomes, and there is considerable evidence for such transfer (refs 149–153).

Transposable elements employ different strategies to ensure their evolutionary survival. LINEs and SINEs rely almost exclusively on vertical transmission within the host genome (ref. 154; but see refs 148, 155). DNA transposons are more promiscuous, requiring relatively frequent horizontal transfer. LTR retroposons use both strategies, with some being long-term active residents of the human genome (such as members of the ERVL family) and others having only short residence times.

Table 11 Number of copies and fraction of genome for classes of interspersed repeat

                             Number of    Total number of bases   Fraction of the   Number of
                             copies       in the draft genome     draft genome      families
                             (× 1,000)    sequence (Mb)           sequence (%)      (subfamilies)
SINEs                        1,558        359.6                   13.14
  Alu                        1,090        290.1                   10.60             1 (~20)
  MIR                        393          60.1                    2.20              1 (1)
  MIR3                       75           9.3                     0.34              1 (1)
LINEs                        868          558.8                   20.42
  LINE1                      516          462.1                   16.89             1 (~55)
  LINE2                      315          88.2                    3.22              1 (2)
  LINE3                      37           8.4                     0.31              1 (2)
LTR elements                 443          227.0                   8.29
  ERV-class I                112          79.2                    2.89              72 (132)
  ERV(K)-class II            8            8.5                     0.31              10 (20)
  ERV(L)-class III           83           39.5                    1.44              21 (42)
  MaLR                       240          99.8                    3.65              1 (31)
DNA elements                 294          77.6                    2.84
  hAT group
    MER1-Charlie             182          38.1                    1.39              25 (50)
    Zaphod                   13           4.3                     0.16              4 (10)
  Tc-1 group
    MER2-Tigger              57           28.0                    1.02              12 (28)
    Tc2                      4            0.9                     0.03              1 (5)
    Mariner                  14           2.6                     0.10              4 (5)
  PiggyBac-like              2            0.5                     0.02              10 (20)
  Unclassified               22           3.2                     0.12              7 (7)
Unclassified                 3            3.8                     0.14              3 (4)
Total interspersed repeats                1,226.8                 44.83

The number of copies and base pair contributions of the major classes and subclasses of transposable elements in the human genome. Data extracted from a RepeatMasker analysis of the draft genome sequence (RepeatMasker version 09092000, sensitive settings, using RepBase Update 5.08). In calculating percentages, RepeatMasker excluded the runs of Ns linking the contigs in the draft genome sequence. In the last column, separate consensus sequences in the repeat databases are considered subfamilies, rather than families, when the sequences are closely related or related through intermediate subfamilies.

880 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com
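
The class totals in Table 11 can be cross-checked against their subclass rows with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original analysis: the numbers are transcribed by hand from the table, and the names `CLASSES`, `UNCLASSIFIED`, `column_sums`, `discrepancies` and `grand_total` are illustrative assumptions introduced here.

```python
# Values transcribed from Table 11. Each tuple is
# (copies in thousands, Mb in the draft sequence, % of the draft sequence).
# All names below are hypothetical helpers for this check only.
CLASSES = {
    "SINEs": ((1558, 359.6, 13.14), [
        (1090, 290.1, 10.60),  # Alu
        (393, 60.1, 2.20),     # MIR
        (75, 9.3, 0.34),       # MIR3
    ]),
    "LINEs": ((868, 558.8, 20.42), [
        (516, 462.1, 16.89),   # LINE1
        (315, 88.2, 3.22),     # LINE2
        (37, 8.4, 0.31),       # LINE3
    ]),
    "LTR elements": ((443, 227.0, 8.29), [
        (112, 79.2, 2.89),     # ERV-class I
        (8, 8.5, 0.31),        # ERV(K)-class II
        (83, 39.5, 1.44),      # ERV(L)-class III
        (240, 99.8, 3.65),     # MaLR
    ]),
    "DNA elements": ((294, 77.6, 2.84), [
        (182, 38.1, 1.39),     # MER1-Charlie
        (13, 4.3, 0.16),       # Zaphod
        (57, 28.0, 1.02),      # MER2-Tigger
        (4, 0.9, 0.03),        # Tc2
        (14, 2.6, 0.10),       # Mariner
        (2, 0.5, 0.02),        # PiggyBac-like
        (22, 3.2, 0.12),       # Unclassified (within DNA elements)
    ]),
}
UNCLASSIFIED = (3, 3.8, 0.14)  # top-level "Unclassified" row


def column_sums(rows):
    """Sum each column of a list of equal-length tuples."""
    return tuple(sum(col) for col in zip(*rows))


def discrepancies():
    """Printed class total minus the sum of its subclass rows, per column."""
    out = {}
    for name, (total, parts) in CLASSES.items():
        sums = column_sums(parts)
        out[name] = tuple(round(t - s, 2) for t, s in zip(total, sums))
    return out


def grand_total():
    """Sum the four class totals plus the top-level unclassified row."""
    rows = [total for total, _ in CLASSES.values()] + [UNCLASSIFIED]
    copies, mb, pct = column_sums(rows)
    return copies, round(mb, 1), round(pct, 2)


if __name__ == "__main__":
    for name, diff in discrepancies().items():
        print(name, diff)
    print("grand total:", grand_total())
```

With the values as transcribed, the copy counts and percentages of the subclasses add up exactly to each class total, and the Mb columns for SINEs and LINEs differ from the printed totals by 0.1 Mb, which is consistent with rounding of the individual rows. The four class totals plus the top-level unclassified row reproduce the printed 1,226.8 Mb and 44.83%.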
