The Genom of Homo sapiens.pdf

THE FINISHED HUMAN GENOME

…Information (NCBI) (http://www.ncbi.nlm.nih.gov/genome/guide/human/). The browsers give users access to the vast amount of human sequence and map data and enable them to scan the genome for features that include gene structures, repeat families, STS markers positioned on genetic and physical maps, ESTs, coding and noncoding RNAs, C+G content, single-nucleotide polymorphisms (SNPs), and sequence similarities with other organisms. Since the first assembly of the draft sequence in October 2000, the browsers have continued to provide regularly updated views of the human genome sequence, taking account of new genomic sequence information and new supporting data from ongoing programs such as full-length cDNA sequencing (http://genome.rtc.riken.go.jp/, http://www.ncbi.nlm.nih.gov/MGC/), which help to improve the gene builds. Another recent improvement has been the integration of EST data into gene building, which aids in the prediction of noncoding exons, especially those located within the 3′ UTR, and the prediction of pseudogenes (see Birney et al., this volume). The browsers also regularly add new features that provide improved functionality for biologists using genomic information.

Although the genome browsers provide consistent views of annotation across the most recent assembly of the human genome sequence, it is clear that further work is needed to achieve the highest levels of accuracy. Comparison of Ensembl genes with regions of the genome that have been manually annotated and experimentally investigated, such as Chromosomes 20 and 22, currently shows that approximately 70% of gene loci are identified by the automated system.
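The 70% figure is a locus-level sensitivity: the share of manually annotated loci that the automated gene build recovers. The text does not say what counted as "identified", so the sketch below assumes a manual locus is found if any automated prediction overlaps it by at least one base on the same chromosome and strand. The `Locus` type, the function names, and the toy coordinates are all hypothetical and are not data from Chromosome 20 or 22.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """One gene locus; 1-based inclusive coordinates (an assumption of this sketch)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Locus, b: Locus) -> bool:
    """True if two loci share at least one base on the same chromosome and strand."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def locus_sensitivity(reference: list[Locus], predicted: list[Locus]) -> float:
    """Fraction of reference (manually annotated) loci hit by at least one prediction."""
    if not reference:
        raise ValueError("reference set is empty")
    found = sum(1 for ref in reference if any(overlaps(ref, p) for p in predicted))
    return found / len(reference)


# Toy, made-up coordinates for illustration only.
manual = [
    Locus("20", 100, 500, "+"),
    Locus("20", 1000, 1800, "-"),
    Locus("20", 3000, 3200, "+"),
    Locus("20", 5000, 5400, "+"),
]
automated = [
    Locus("20", 450, 900, "+"),    # overlaps the first manual locus
    Locus("20", 1200, 1300, "-"),  # overlaps the second
    Locus("20", 3100, 3150, "-"),  # right region, wrong strand: does not count
]

print(f"{locus_sensitivity(manual, automated):.0%}")  # prints "50%"
```

A linear scan per locus keeps the sketch short; for whole-chromosome comparisons an interval tree or a sorted sweep over both sets would avoid the quadratic cost. A stricter criterion (for example, requiring shared exon boundaries) would give a lower sensitivity than simple overlap.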
Features that are underrepresented include single-exon genes, pseudogenes, splice variants, and sites of splicing and polyadenylation (Ashurst and Collins 2003). To provide the most comprehensive set of human genes, systematic manual annotation has so far been undertaken by the sequencing centers that coordinated the finishing of each human chromosome. The Sanger Institute carried out the initial annotation of Chromosome 22 (Dunham et al. 1999), updated in 2003 (Collins et al. 2003), and this has been followed by systematic annotation of Chromosomes 20 (Deloukas et al. 2001), 6 (Mungall et al. 2003), 9, 10, 13, X, and 1. Manual annotation of Chromosomes 21 (Hattori et al. 2000), 14 (Heilig et al. 2003), 7 (Hillier et al. 2003), and Y (Skaletsky et al. 2003) has been published by the centers that produced those sequences. Annotation for the remaining chromosomes will be generated by participating centers over the coming months.

The products of the manual annotation process are available in the Vertebrate Genome Annotation (VEGA) database, http://vega.sanger.ac.uk, and can also be viewed in Ensembl. A view of VEGA gene predictions and Ensembl genes in a region of Chromosome 20 is shown in Figure 5.
The VEGA browser takes advantage of the Ensembl database framework and displays the experimental evidence associated with each prediction. Unlike gene structures generated by automated systems, VEGA gene predictions can be amended locally and rapidly by annotators in response to user feedback, and the database is intended to provide an accurate, continually updated resource for the research community.

To achieve consistency in primary manual annotation across the genome, guidelines have been established in a series of Human Annotation Workshop (HAWK) meetings. The model provided by the VEGA database and the HAWK guidelines offers a possible mechanism for long-term re-annotation and curation of the human genome sequence. The experience with Chromosome 22, re-annotated three years after the initial annotation was produced, indicates the value to the research community of establishing systematic, ongoing curation on a genome-wide basis. Using data from EST databases, comparative genome analysis, and experimental verification to fuse previously fragmented genes and identify novel genes resulted in a 74% increase in total annotated exon sequence length on this chromosome. The additions included one new gene, 100 pseudogenes, and 31 non-protein-coding transcripts, of which 16 are likely antisense RNAs (Collins et al. 2003).

FUTURE PROSPECTS

Annotating the genes is only the beginning, and it is already clear that a truly comprehensive picture of the complexity of the human gene set, including all the biologically important alternative transcripts and promoters, will take some years to emerge. What about other functionally important sequences?
An initial comparison between the human and mouse genome sequences suggested that approximately 5% of the human genome is under purifying selection and may therefore comprise functionally important conserved sequences (Waterston et al. 2002). About one-third of this conserved fraction contains nearly all the protein-coding exons; the other two-thirds is non-protein-coding sequence. Comparative sequence analysis was taken further in a multispecies comparison of a 1.8-Mb region of human Chromosome 7, which led to the identification of multiple-species conserved sites (MCSs) that are strong candidates for such sequences (Margulies et al. 2003; Thomas et al. 2003; Margulies et al., this volume). Other studies have also illustrated the value of comparing the genomes of different vertebrate species to discover new genes or regulatory elements ("phylogenetic footprinting") (Göttgens et al. 2000; Pennacchio et al. 2001; Pennacchio and Rubin 2001). The same principle has been extended to comparisons among multiple primate sequences ("phylogenetic shadowing") (Boffelli et al. 2003). This work has formed the basis of the ENCyclopedia of DNA Elements (ENCODE) project (http://www.genome.gov/10005107), a plan to evaluate in detail different methods for extracting as much biological information as possible from selected regions of the genome totaling 30 Mb (1%).

The finished human genome sequence provides resources for further experimental investigation. For example, the large-insert genomic clones that formed the map (BACs and PACs) have been used to generate genomic microarrays for the investigation of chromosomal aberrations that accompany disease (Solinas-Toldo et al. 1997;
