12.07.2015 Views

Initial sequencing and analysis of the human genome - Vitagenes

Initial sequencing and analysis of the human genome - Vitagenes

Initial sequencing and analysis of the human genome - Vitagenes

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

like termini show <strong>the</strong> typical divergence level <strong>of</strong> such elements) <strong>and</strong>is actively transcribed provides strong evidence that it has beenadopted by <strong>the</strong> <strong>human</strong> <strong>genome</strong> as a gene. Its function is unknown.LINE1 activity clearly has also had fringe bene®ts. We mentionedabove <strong>the</strong> possibility <strong>of</strong> exon reshuf¯ing by cotranscription <strong>of</strong>neighbouring DNA. The LINE1 machinery can also cause reversetranscription <strong>of</strong> genic mRNAs, which typically results in nonfunctionalprocessed pseudogenes but can, occasionally, give rise t<strong>of</strong>unctional processed genes. There are at least eight <strong>human</strong> <strong>and</strong>eight mouse genes for which evidence strongly supports such anorigin 211 (see http://www-i®.uni-muenster.de/exapted-retrogenes/tables.html). Many o<strong>the</strong>r intronless genes may have been createdin <strong>the</strong> same way.Transposons have made o<strong>the</strong>r creative contributions to <strong>the</strong><strong>genome</strong>. A few hundred genes, for example, use transcriptionalterminators donated by LTR retroposons (data not shown). O<strong>the</strong>rgenes employ regulatory elements derived from repeat elements 211 .Simple sequence repeatsSimple sequence repeats (SSRs) are a ra<strong>the</strong>r different type <strong>of</strong>repetitive structure that is common in <strong>the</strong> <strong>human</strong> <strong>genome</strong>Ðperfector slightly imperfect t<strong>and</strong>em repeats <strong>of</strong> a particular k-mer. SSRs witha short repeat unit (n = 1±13 bases) are <strong>of</strong>ten termed microsaarticlesfrom transposons 142,209 . These include <strong>the</strong> RAG1 <strong>and</strong> RAG2 recombinases<strong>and</strong> <strong>the</strong> major centromere-binding protein CENPB. Wescanned <strong>the</strong> draft <strong>genome</strong> sequence <strong>and</strong> identi®ed ano<strong>the</strong>r 27 cases,bringing <strong>the</strong> total to 47 (Table 13; refs 142, 209). All but four arederived from DNA transposons, which give rise to only a smallproportion <strong>of</strong> <strong>the</strong> interspersed repeats in <strong>the</strong> <strong>genome</strong>. Why <strong>the</strong>re areso many DNA transposase-like genes, many <strong>of</strong> which still contain<strong>the</strong> critical residues for transposase activity, is a mystery.To illustrate this concept, we describe <strong>the</strong> discovery <strong>of</strong> one <strong>of</strong> <strong>the</strong>new examples. We searched <strong>the</strong> draft <strong>genome</strong> sequence to identify<strong>the</strong> autonomous DNA transposon responsible for <strong>the</strong> distribution<strong>of</strong> <strong>the</strong> non-autonomous MER85 element, one <strong>of</strong> <strong>the</strong> most recently(40±50 Myr ago) active DNA transposons. Most non-autonomouselements are internal deletion products <strong>of</strong> a DNA transposon. Weidenti®ed one instance <strong>of</strong> a large (1,782 bp) ORF ¯anked by <strong>the</strong> 59<strong>and</strong> 39 halves <strong>of</strong> a MER85 element. The ORF encodes a novel protein(partially published as pID 6453533) whose closest homologue is<strong>the</strong> transposase <strong>of</strong> <strong>the</strong> piggyBac DNA transposon, which is found ininsects <strong>and</strong> has <strong>the</strong> same characteristic TTAA target-siteduplications 210 as MER85. The ORF is actively transcribed in fetalbrain <strong>and</strong> in cancer cells. That it has not been lost to mutation in40±50 Myr <strong>of</strong> evolution (whereas <strong>the</strong> ¯anking, noncoding, MER85-Table 13 Human genes derived from transposable elementsGenBank ID* Gene name Related transposon family² Possible fusion gene§ Newly recognized derivationknID 3150436 BC200 FLAM Alu³pID 2330017 Telomerase non-LTR retrotransposonpID 1196425 HERV-3 env Retroviridae/HERV-R³pID 4773880 Syncytin Retroviridae/HERV-W³pID 131827 RAG1 <strong>and</strong> 2 Tc1-likepID 29863 CENP-B Tc1/PogoEST 2529718 Tc1/Pogo +PID 10047247 Tc1/Pogo/Pogo +EST 4524463 Tc1/Pogo/Pogo +pID 4504807 Jerky Tc1/Pogo/TiggerpID 7513096 JRKL Tc1/Pogo/TiggerEST 5112721 Tc1/Pogo/Tigger +EST 11097233 Tc1/Pogo/Tigger +EST 6986275 Sancho Tc1/Pogo/TiggerEST 8616450 Tc1/Pogo/Tigger +EST 8750408 Tc1/Pogo/Tigger +EST 5177004 Tc1/Pogo/Tigger +PID 3413884 KIAA0461 Tc1/Pogo/Tc2 +PID 7959287 KIAA1513 Tc1/Pogo/Tc2 +PID 2231380 Tc1/Mariner/Hsmar1³ +EST 10219887 hAT/Hobo + +PID 6581095 Buster1 hAT/Charlie +PID 7243087 Buster2 hAT/Charlie +PID 6581097 Buster3 hAT/CharliePID 7662294 KIAA0766 hAT/Charlie +PID 10439678 hAT/Charlie +PID 7243087 KIAA1353 hAT/Charlie +PID 7021900 hAT/Charlie/Charlie3³ +PID 4263748 hAT/Charlie/Charlie8³ +EST 8161741 hAT/Charlie/Charlie9³ +pID 4758872 DAP4,pP52 rIPK hAT/Tip100/ZaphodEST 10990063 hAT/Tip100/Zaphod +EST 10101591 hAT/Tip100/Zaphod +pID 7513011 KIAA0543 hAT/Tip100/Tip100 +pID 10439744 hAT/Tip100/Tip100 +pID 10047247 KIAA1586 hAT/Tip100/Tip100 +pID 10439762 hAT/Tip100 + +EST 10459804 hAT/Tip100 +pID 4160548 Tramp hAT/Tam3 +BAC 3522927 hAT/Tam3 +pID 3327088 KIAA0637 hAT/Tam3 +EST 1928552 hAT/Tam3 +pID 6453533 piggyBac/MER85³ +EST 3594004 piggyBac/MER85³ +BAC 4309921 piggyBac/MER85³ +EST 4073914 piggyBac/MER75³ +EST 1963278 piggyBac +...................................................................................................................................................................................................................................................................................................................................................................The Table lists 47 <strong>human</strong> genes, with a likely origin in up to 38 different transposon copies.* Where available, <strong>the</strong> GenBank ID numbers are given for proteins, o<strong>the</strong>rwise a representative EST or a clone name is shown. Six groups (two or three genes each) have similarity at <strong>the</strong> DNA level well beyondthat observed between different DNA transposon families in <strong>the</strong> <strong>genome</strong>; <strong>the</strong>y are indicated in italics, with all but <strong>the</strong> initial member <strong>of</strong> each group indented. This could be explained if <strong>the</strong> genes wereparalogous (derived from a single inserted transposon <strong>and</strong> subsequently duplicated).² Classi®cation <strong>of</strong> <strong>the</strong> transposon.³ Indicates that <strong>the</strong> transposon from which <strong>the</strong> gene is derived is precisely known.§ Proteins probably formed by fusion <strong>of</strong> a cellular <strong>and</strong> transposon gene; many have acquired zinc-®nger domains.k Not previously reported as being derived from transposable element genes. The remaining genes can be found in refs 142, 209.888 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!