13.07.2015 Views

The Genom of Homo sapiens.pdf

HUMAN SUBTELOMERIC SEQUENCES

cated DNA or single-copy DNA. We used a database of unique transcripts representing each Unigene cluster (Schuler 1997; ftp://ftp.ncbi.nih.gov/repository/UniGene/ ; Hs.seq.uniq.Z file available from the Unigene build available July 1, 2002, containing transcript sequences representing ~128,000 Unigene clusters). One thousand twelve subtelomeric transcripts were annotated in this manner, 732 from 1-copy genomic regions, and 280 from segmentally duplicated DNA and subtelomeric repeat DNA. Overall, the subtelomeric region is somewhat enriched in Unigene transcripts (54 transcripts per Mb) relative to the genome-wide average (43 transcripts per Mb). The enrichment of transcripts in subtelomeric DNA is consistent with earlier studies (see, e.g., Flint et al. 1997b), although there is a great deal of variation in gene concentration from telomere to telomere. Of the transcripts embedded within the segmental duplications and subtelomeric repeats, an unknown but significant fraction are likely to be pseudogenes (see, e.g., Flint et al. 1997a), whereas others are likely to be members of gene families with many closely related but nonidentical functional transcripts (see, e.g., Flint et al. 1997b; Mah et al. 2001; Fan et al. 2002). Cross-boundary transcripts contain part of a sequence from a duplicated genomic segment and part from a 1-copy segment, or parts from a segmental duplication and from a subtelomeric repeat. These transcripts might represent transcribed pseudogenes generated by juxtaposition of progenitor transcript segments, or they might generate new functionalities by virtue of exon shuffling upon duplication (Bailey et al. 2002; Fan et al. 2002); they include transcripts for an F-box protein, for a zinc finger-containing protein, and for many unknown potential proteins.
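The transcript counts and densities quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the variable and function names are hypothetical, and only the four numbers (732, 280, 54, and 43) come from the text.

```python
# Figures quoted in the text; every name below is illustrative only.
ONE_COPY_TRANSCRIPTS = 732             # transcripts from 1-copy genomic regions
DUPLICATED_OR_REPEAT_TRANSCRIPTS = 280 # from segmental duplications and subtelomeric repeats
SUBTELOMERIC_DENSITY_PER_MB = 54       # Unigene transcripts per Mb, subtelomeric regions
GENOME_DENSITY_PER_MB = 43             # Unigene transcripts per Mb, genome-wide average


def fold_enrichment(region_density: float, background_density: float) -> float:
    """Return the ratio of a regional density to a background density."""
    if background_density <= 0:
        raise ValueError("background density must be positive")
    return region_density / background_density


def share(part: int, total: int) -> float:
    """Return part/total as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


total_transcripts = ONE_COPY_TRANSCRIPTS + DUPLICATED_OR_REPEAT_TRANSCRIPTS
enrichment = fold_enrichment(SUBTELOMERIC_DENSITY_PER_MB, GENOME_DENSITY_PER_MB)
one_copy_share = share(ONE_COPY_TRANSCRIPTS, total_transcripts)

print(f"total annotated transcripts: {total_transcripts}")
print(f"fold enrichment over genome average: {enrichment:.2f}")
print(f"share from 1-copy regions: {one_copy_share:.1%}")
```

Under these figures the two categories sum to the 1,012 annotated transcripts, the subtelomeric density is roughly 1.26 times the genome-wide average, and about 72% of the annotated transcripts fall in 1-copy regions, which is consistent with the description of the region as "somewhat enriched."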
It is essential to acquire complete finished sequences for each distinct allele of each subtelomeric region in order to identify and analyze these genes and gene families, and to de-convolute the many instances of over-clustered Unigenes and mRNAs derived from separate but highly similar duplicated genomic DNA fragments.

Subtelomeric gene families with members having nucleotide sequence similarity in the 70% to 90% range include the immunoglobulin heavy-chain genes (found at 14q), olfactory receptor genes (1-copy regions of 1q, 5q, 10q, and 15q as well as previously characterized subtelomeric repeat DNA at 1p, 6p, 8p, 11p, 15q, 19p, and 3q [Trask et al. 1998]), and zinc-finger genes (4p, 5q, 8p, 8q, 12q, and 19q). Transcripts for multiple members of these gene families were found within many of the individual subtelomeric regions. The abundance of gene families in subtelomeric regions is a common feature of most eukaryotes and may reflect a generally increased recombination and tolerance of subtelomeric DNA for rapid evolutionary change.

VARIATION AND TELOMERIC CLOSURE

Large variant alleles of many human subtelomeric regions exist and are believed to consist mainly or entirely of subtelomeric repeats (Wilkie et al. 1991; Macina et al. 1995; Trask et al. 1998).
For example, Wilkie et al. (1991) found 3 alleles varying in length up to 260 kb at the 16p telomere among the 47 chromosomes sampled. The variant DNA regions appeared to comprise low-copy subtelomeric repeat sequences, and each allele appeared to be in complete linkage disequilibrium with markers at both the proximal and the distal ends of the polymorphic segment of DNA; this suggested that the subtelomeric repeat segment contained in each allele behaved as a single block of DNA, with no detectable recombination within the block. Trask et al. (1998) examined the structure and genomic distribution of a cosmid-sized block of segmentally duplicated subtelomeric DNA. They found that this block was consistently present at the 3q, 15q, and 19p telomeres in humans, was variably distributed at an additional subset of human telomeres, but was present in a single copy in nonhuman primate genomes. More detailed analysis of a 12-kb segment of this block that encodes olfactory receptor genes revealed evidence for evolutionarily recent interchromosomal exchanges involving this segment, suggesting that the mosaic patchworks of duplications that comprise subtelomeric repeat regions are not merely linear descendants of the original elements, but are still evolving and exchanging with each other (Mefford and Trask 2002). Similar studies have more recently demonstrated that the evolution of most primate subtelomeric regions has involved multiple, lineage-dependent duplications in recent evolutionary time (Martin et al. 2002; van Geel et al. 2002).
The duplications have colonized many individual human subtelomeric regions in a variable fashion since the divergence of human and primate lineages, and at least some of them are still capable of interacting and exchanging sequences interchromosomally.

A dramatic example of this is the interchromosomal exchange of the tandemly repeated D4Z4 sequence tract between human 4q and 10q telomere regions (van Deutekom et al. 1996). The size of the D4Z4 repeat tract at both 4qtel and 10qtel is highly variable in individuals, and it has been suggested that relatively frequent meiotic pairing interactions between subtelomeric regions of these nonhomologous chromosomes may contribute to their atypically high variation in D4Z4 tract length. The deletion of most of the D4Z4 tract on a particular 4q allele in the population causes FSHD, a type of muscular dystrophy (Lemmers et al. 2002). Interestingly, nucleotide sequence variation in the subtelomere region distal to the D4Z4 repeat between the 4qA allele and the 4qB allele is unusually high (Lemmers et al. 2002), suggesting unusual recombinational or selective pressures that keep these two 4qtel alleles distinct while still permitting promiscuous interchromosomal exchange of the D4Z4 repeat tracts between 4qtel and 10qtel.
It is unclear how the deletion of most of the D4Z4 tract on the 4qA allele causes FSHD, but a widely discussed potential mechanism is a position effect on a relatively distant gene caused by disruption of subtelomeric heterochromatin in the vicinity of the D4Z4 repeats.

Characterization of large-scale human subtelomeric variation is still in its infancy, mainly because accurately mapped and assembled reference sequences for these re-
