13.07.2015 Views

The Genom of Homo sapiens.pdf


ANALYSIS OF HUMAN PROMOTERS 219

[Figure 1. Flow chart for Cold Spring Harbor Laboratory Mammalian Promoter Database. Panel title: Integrated Data Flow Chart. Labels: Human, Mouse, and Rat Genome; Transcripts (RefSeq, mRNA, ENSEMBL); Genome Scan; Twin Scan; First EF; Fgenesh+; Human, Mouse, and Rat Genes; Predicted promoters; HsPD, MmPD, RnPD; TRED: Transcription Regulation Element Database; MySQL; GBrowse; Comparative Analysis; Other Application.]

putability in addition to manual browsability will serve both computational and experimental biologists well.

FUNCTIONAL CURATION OF CELL CYCLE TRANSCRIPTION FACTORS AND THEIR TARGET GENES

A promoter reference system created by an automatic pipeline can ensure completeness; it is consistent with most of the known information and also has reasonable accuracy. To be useful, it must contain rich functional information (TFs, TFBSs, TSSs, CpG islands) and links to other related databases and literature references. Therefore, we are adding TRED (transcription regulatory element database; Fig. 1; F. Zhao et al., unpubl.) to HsPD/MmPD/RnPD; it allows semi-automated or even hand-curated information to be entered.

The three most important issues every useful database must address are (1) assigning a quality value to each raw record, (2) ensuring accuracy and usefulness, and (3) opening data dissemination. For issue 1, we have assigned different quality values to promoters and TFBSs according to how they were derived. For issue 3, we are discussing with NCBI (D. Lipman, pers. comm.) and EBI (E. Birney, pers. comm.) ways to incorporate our results into public databases. The most difficult and time-consuming task is issue 2, which involves hand-curation and outreach to transcription expert labs.
We are initially focusing on cell cycle and cancer-related TFs, including their target genes, and we will give authorship to related transcription labs that contribute data or expertise. Currently, of 60,519 promoters (40,658 genes) in the human part of TRED, only 2,003 promoters (1,853 genes) are in the best-quality class (known and curated). The other classes are (1) known but not curated, (2) predicted based on RefSeq, (3) predicted based on other mRNAs, (4) predicted based on other ESTs, and (5) purely predicted. As an example, for human E2F targets, TRED contains 233 promoters (182 genes) in the best-quality class.

HIGH-THROUGHPUT EXPERIMENTAL VALIDATIONS

All computational predictions must be subjected to experimental verification, and both positive and negative results are crucial feedback for further database and algorithm improvement. A lack of high-throughput experimental validation has become the bottleneck in this feedback loop. As cDNA libraries become more saturating, novel gene finding has gradually shifted its paradigm from EST sequencing to computational prediction plus experimental validation (Das et al. 2001; Guigo et al. 2003). To validate first exons and TSSs, getting 5′-complete cDNAs is essential (Davuluri et al. 2000; Suzuki et
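
The quality classes listed above for the human part of TRED can be pictured as an ordered enumeration. The sketch below is only an illustration, not the TRED schema: the class name `PromoterQuality`, the member names, the numeric code 0 for the best class, and the helper `best_class_fraction` are all hypothetical. The only figures taken from the text are the promoter and gene counts.

```python
from enum import IntEnum


class PromoterQuality(IntEnum):
    """Hypothetical encoding of the quality classes named in the text.

    Assumption: lower value means higher quality; 0 for the best class
    is our own choice, the text numbers only classes (1) to (5).
    """
    KNOWN_CURATED = 0         # best-quality class
    KNOWN_NOT_CURATED = 1
    PREDICTED_REFSEQ = 2
    PREDICTED_OTHER_MRNA = 3
    PREDICTED_OTHER_EST = 4
    PURELY_PREDICTED = 5


def best_class_fraction(best: int, total: int) -> float:
    """Return the share of records that fall in the best-quality class."""
    if total <= 0:
        raise ValueError("total must be positive")
    return best / total


# Counts quoted in the text for the human part of TRED.
promoter_fraction = best_class_fraction(2003, 60519)
gene_fraction = best_class_fraction(1853, 40658)

print(f"best-quality promoters: {promoter_fraction:.1%}")
print(f"best-quality genes: {gene_fraction:.1%}")
```

Using an ordered type makes a filter such as "at least predicted from RefSeq" a simple comparison (`quality <= PromoterQuality.PREDICTED_REFSEQ`), and the two fractions show how small the curated class is relative to the full set.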
