13.07.2015 Views

The Genom of Homo sapiens.pdf

466 LIPOVICH AND KING

Figure 2. Perl-based, high-throughput gene verification, TU discovery, and UGP identification pipeline. Following stage 3 (CLUSTER) output, manual curation was performed.

Automated Identification of Known Genes and Novel TUs on Chr. 22

Our algorithm identified 1012 nonredundant transcript models on chr. 22. Of these, 495 (49%) represented known genes and 517 (51%) represented novel TUs supported solely by ESTs. This result was consistent with the relative proportions of genes and TUs at 5q31, as well as with recent findings indicating that, due to large numbers of TUs, the total number of expressed features comprising a mammalian transcriptome is likely to be more than twice the number of coding genes (Carninci et al. 2003). We automatically excluded transcripts homologous to immunoglobulin λ gene segments, because existing annotations generally group them into a special category separate from the rest of expressed features on chr. 22 (Collins et al. 2003).

The sensitivity and specificity of our approach were assessed using the most current Sanger Centre chr. 22 annotation (Collins et al. 2003). Of the 577 genes identified by the Sanger annotation, 469 were found by our algorithm, a sensitivity of 81%. The discrepancy was due in part to the fact that our definition of a known gene was based solely on the presence of a full-length cDNA in GenBank, and did not include genes based solely on ORFs or ab initio exon prediction. Of the 108 Sanger genes missed by our algorithm, the majority were either transcriptionally silent, segmentally duplicated paralogous copies of genes we identified, or completely devoid of sense-strand cDNA and EST support.
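
The proportions and the sensitivity figure quoted above follow directly from the reported counts. As a minimal sketch (the constant and function names are illustrative and are not taken from the authors' Perl pipeline), the arithmetic can be checked as follows:

```python
# Counts as reported in the text for chr. 22; all names here are illustrative.
TRANSCRIPT_MODELS = 1012
KNOWN_GENES = 495
NOVEL_TUS = 517
SANGER_GENES = 577
SANGER_GENES_FOUND = 469


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Genes and TUs should partition the full set of transcript models.
assert KNOWN_GENES + NOVEL_TUS == TRANSCRIPT_MODELS

gene_share = percent(KNOWN_GENES, TRANSCRIPT_MODELS)
tu_share = percent(NOVEL_TUS, TRANSCRIPT_MODELS)

# Sensitivity here means: fraction of reference (Sanger) genes recovered.
sensitivity = percent(SANGER_GENES_FOUND, SANGER_GENES)
missed = SANGER_GENES - SANGER_GENES_FOUND

print(gene_share, tu_share, sensitivity, missed)  # 49 51 81 108
```
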
Of the 234 features identified by the Sanger analysis as pseudogenes, 206 were missing from the gene and TU sets created by our algorithm, a specificity of 88%. The remaining 28 were transcribed and had full-length cDNA or EST-only support, therefore fitting our definition of genes and TUs, respectively.

Characterization of UGPs on Chr. 22

Of the 1012 transcript models on chr. 22, 209 (21%) participated in UGPs. 77 cis-antisense pairs and 42 putative bidirectional promoters were found.

Of the 77 antisense pairs, 23 were tail-to-tail and 13 head-to-head, roughly consistent with the proportion at 5q31 outside of the PCDH clusters (7 and 4, respectively) and with published evidence that in mammals tail-to-tail gene overlaps are more common than head-to-head overlaps (Edgar 2003). Surprisingly, the remaining 41 pairs did not fit either category (this was the case for only 2 pairs in the non-PCDH part of our 5.5-Mb 5q31 region), which argues for a substantial diversity and complexity of gene and TU structures participating in antisense overlaps. Of the 77 pairs, 36 were gene–gene, 38 were gene–TU, and 3 were TU–TU. Hence, a gene-only approach to chr. 22 annotation would miss more than half of the cis-antisense pairs. The 77 pairs accounted for only 145 transcript models rather than the expected 154, because 8 models participated in cis-antisense overlaps with multiple other models.

We addressed whether any of the 77 pairs had potential for hybridization of sense and antisense transcripts in vivo due to the expression of both members of the pair in the same tissue or cell type. Complete lists of cDNAs and TU-worthy ESTs for every pair were obtained by BLAST and manual curation, and were examined for commonalities in expression profiles.
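
The orientation categories used for the antisense pairs above can be made concrete with a short sketch. It is written under stated assumptions and is not the authors' pipeline: the text does not define the categories, so head-to-head is taken to mean overlapping 5' ends and tail-to-tail overlapping 3' ends; overlap is judged on whole-transcript spans rather than exons; and the `Transcript` class and `classify_antisense` function are hypothetical names. The tallies at the end use only the counts reported in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transcript:
    name: str
    strand: str  # "+" or "-"
    start: int   # leftmost genomic coordinate (inclusive)
    end: int     # rightmost genomic coordinate (inclusive)


def classify_antisense(a: Transcript, b: Transcript) -> Optional[str]:
    """Classify a pair by orientation; None if it is not a cis-antisense pair.

    Assumed definitions: tail-to-tail = overlapping 3' ends (convergent),
    head-to-head = overlapping 5' ends (divergent), anything else = "other".
    """
    if a.strand == b.strand:
        return None  # same strand, so not antisense
    if a.end < b.start or b.end < a.start:
        return None  # spans do not overlap
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Plus-strand transcript on the left: its 3' end runs into the
    # 3' end of the minus-strand transcript.
    if plus.start < minus.start and plus.end < minus.end:
        return "tail-to-tail"
    # Minus-strand transcript on the left: the two 5' ends overlap.
    if minus.start < plus.start and minus.end < plus.end:
        return "head-to-head"
    return "other"  # e.g. one transcript nested inside the other


# Counts reported in the text for the 77 cis-antisense pairs.
by_orientation = {"tail-to-tail": 23, "head-to-head": 13, "other": 41}
by_composition = {"gene-gene": 36, "gene-TU": 38, "TU-TU": 3}
assert sum(by_orientation.values()) == sum(by_composition.values()) == 77

# Pairs invisible to a gene-only annotation: every pair involving a TU.
missed_by_gene_only = by_composition["gene-TU"] + by_composition["TU-TU"]
missed_percent = round(100 * missed_by_gene_only / 77)
print(missed_by_gene_only, missed_percent)  # 41 53
```

The final figure, 41 of 77 pairs (53%), is what the text summarizes as "more than half".
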
ESTs from pooled libraries or total fetus were eliminated, since their precise origin was unknown. Normal tissues were considered as different from corresponding tumors; e.g., a gene–TU pair in which the gene was expressed only in normal brain but the TU was expressed only in brain tumors would be characterized as lacking any overlap in expression profiles of the two. For 35 of the 77 antisense pairs (45%), EST evidence suggested expression of both members of the pair in the same tissue or cell type. In 19 of these 35, genomic organization of the locus was conserved between human and mouse. However, in the other 16, one or both members of each human pair lacked orthologs and positional equivalents in mouse (Table 4). Examples include the acrosin precursor gene, whose head-to-head antisense TU in humans has no mouse equivalent, and the CHK2/BC000004 head-to-head antisense pair, the orthologs of whose members in the mouse are in a head-to-head orientation but do not overlap. Of the 16 cases of human–mouse differences in antisense-containing loci, 5 were characterized by the expression of both members of the antisense pairs in human brain, raising the intriguing possibility that some cis-regulatory effects on gene expression in human brain are lineage-specific and are not universally conserved in mammals.

Of the 42 putative bidirectional promoters, 34 (79%) occurred at CpG islands, confirming existing reports of divergent transcription initiation at mammalian CpG islands (Adachi and Lieber 2002). Of the 42 bidirectionally promoted pairs, 21 were gene–gene, 18 were gene–TU, and 3 were TU–TU.
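
The expression-overlap criteria described earlier in this section (ESTs of unknown origin discarded, normal and tumor tissue treated as distinct) amount to a filtered set intersection. The sketch below is illustrative only: the `Source` type, the label strings, and the function names are assumptions, and in the study the EST lists were assembled by BLAST and manual curation rather than by code like this.

```python
from typing import NamedTuple, Set


class Source(NamedTuple):
    tissue: str  # e.g. "brain"
    state: str   # "normal" or "tumor"; the two are treated as distinct


# Library origins treated as uninformative (labels are illustrative).
UNINFORMATIVE_TISSUES = {"pooled library", "total fetus"}


def informative(sources: Set[Source]) -> Set[Source]:
    """Drop sources whose precise tissue of origin is unknown."""
    return {s for s in sources if s.tissue not in UNINFORMATIVE_TISSUES}


def shared_expression(a: Set[Source], b: Set[Source]) -> Set[Source]:
    """Tissue/state combinations in which both pair members are expressed."""
    return informative(a) & informative(b)


# The example from the text: gene only in normal brain, TU only in brain tumors.
gene = {Source("brain", "normal"), Source("pooled library", "normal")}
tu = {Source("brain", "tumor"), Source("pooled library", "normal")}
no_overlap = shared_expression(gene, tu)

# A pair with genuine co-expression in normal testis.
gene2 = {Source("testis", "normal"), Source("brain", "normal")}
tu2 = {Source("testis", "normal")}
overlap = shared_expression(gene2, tu2)

print(no_overlap == set(), overlap == {Source("testis", "normal")})  # True True
```

Representing each source as a (tissue, state) pair keeps normal brain and brain tumor from matching, which is the behavior the text requires.
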
Therefore, similar to the case with cis-antisense, a gene-only annotation would miss approximately half of the putative bidirectionally promoted transcript model pairs.

Chromosomewide, 20 transcript models participated in both cis-antisense and putative promoter-sharing pairs. This is significantly more than expected under the null hypothesis that involvement in the two types of UGPs is
