The Genom of Homo sapiens.pdf


462 LIPOVICH AND KING

tions that create or destroy polyadenylation signals in the genomic DNA sequence (Dan et al. 2002), in the generation of lineage-specific cis-antisense UGPs. Against a background of evidence for substantial numbers of lineage-specific genes in completely sequenced prokaryotic (Jordan et al. 2001) and eukaryotic (Lespinet et al. 2002) genomes, human–mouse comparisons have shown that even in two species sharing a common ancestor only ~70 mya, lineage-specific members of gene families related to transcription regulation, olfaction, behavior, and immunity have appeared (Young et al. 2002; Emes et al. 2003; Shannon et al. 2003), spurring an entire field of “genome zoology” focused on the genomic basis for interspecific differences (Emes et al. 2003).

The present study began more than 4 years ago as a simple attempt to construct and refine a comprehensive in silico physical and transcript map of a 5.5-Mb human genomic region at 5q31. At the time, the genome draft was highly fragmented, the cDNA databases rudimentary, and published analyses of TUs and UGPs beyond isolated single-locus examples practically nonexistent. Nevertheless, our extensive manual annotation uncovered substantial evidence for the existence of numerous TUs and UGPs in the region.
As our in silico models of these TUs and UGPs continued to be supported by progressively larger amounts of increasingly higher-quality genomic and cDNA sequences over time, even while gap-filling by the Human Genome Project permitted completion of the data set with additional genes mapping to our 5q31 region, we set out to determine whether the patterns of incidence and genomic distribution of TUs and UGPs observed at 5q31 would also be detectable over larger intervals elsewhere in the genome. Our development of an automated TU and UGP discovery pipeline and subsequent validation of the 5q31 observations over the entire 35-Mb euchromatic sequence of human chromosome 22 constitute the balance of the study.

OPERATIONAL DEFINITIONS

1. Transcriptional unit (TU). A TU is a transcribed feature in the genome other than a known gene. It is predicted in silico from analyzing EST-to-genomic DNA alignments in which the ESTs do not correspond to known or undocumented exons of known genes. ESTs comprising a TU must be canonically spliced (GT-AG introns) and/or canonically polyadenylated (AATAAA or ATTAAA polyadenylation signal within 40 bp of the submitter-indicated 3′ end). In defining TUs, we excluded ESTs from the ORESTES data set (Strausberg et al. 2002) and the RAGE data set (Harrington et al. 2001) because the former contains large numbers of unspliced, singleton, and chimeric ESTs, and the latter is derived from cell lines with artificial promoter insertions and therefore is not representative of naturally occurring transcription.

Since our definition of a TU was developed prior to and independently from that of Carninci et al. (2003), it is not identical to that of Carninci et al.
Due to theabsence <strong>of</strong> an experimental component, our analysiscannot distinguish functionally important TUs fromnonfunctional, stochastically transcribed TUs.2. Unconventional gene pair (UGP) type 1: cis-antisense.<strong>The</strong> term “cis-antisense” means that both members<strong>of</strong> the pair are encoded within the same genomiclocus. cis-antisense overlaps included in this analysismust be exon-to-exon, meaning that the predicted matureRNAs must overlap (intronic intercalation aloneis insufficient). Two transcribed features cis-antisenseto one another, residing on the opposite strands <strong>of</strong> thesame locus, may be two genes, two TUs, or one <strong>of</strong>each. If only the first exons <strong>of</strong> the two features overlap,then the overlap is categorized as head-to-head. Ifonly the last exons overlap, then the overlap is tail-totail.<strong>The</strong> “other” category is for all remaining possibilities.3. Unconventional gene pair (UGP) type 2: putative promoter-sharing.This is a pair <strong>of</strong> divergently transcribedfeatures whose transcription start sites are separatedby 100 amino acids in length, and most <strong>of</strong> its ORFs were eitherunique or located inside expressed repetitive elements,supporting the notion that if this TU is functional,its function is not to encode a protein. <strong>The</strong> TU is cis-antisenseto an internal, translated exon <strong>of</strong> the TTID gene,which may be unusual because most cis-antisense in hu-
