13.07.2015 Views

The Genom of Homo sapiens.pdf


WHOLE-GENOME COMPARISONS

Figure 2. Construction of ecotigs. The two first lines represent, respectively, the ecores (boxes) and the HSPs (segments) detected on genome B, using genome A as a query. The two following lines represent, respectively, the HSPs and the ecores detected on genome A, using genome B as a query. The bottom line represents the ecotig gene models constructed on genome A. Matching HSPs are linked by dotted lines. Matching ecores are identified by the same prefix (i, ii, etc.). Numbers over (or under) arrows represent distances separating ecores that are consecutive on genome A (number of consecutive ecores minus one).

genes may remain colinear on the pair of compared genomes, depending on the degree of conserved synteny of the genome pair analyzed. In the rest of this paper, we designate the pair of matched genomes with the following notation: (query/target).

RESULTS

The ecore detection and ecotig construction procedures have been applied to compare draft or complete genome sequences from various multicellular organisms. We applied Exofish comparisons to several genome pairs between plant, insect, and vertebrate genomes, namely, mammals (mouse/human) with pufferfish (Tetraodon/Takifugu), the Drosophila melanogaster sequence with the Anopheles gambiae draft, and the Arabidopsis sequence with the rice genome draft. Such comparisons were used (1) to detect gene models or exons that have not yet been identified in one or both genomes, (2) to extend existing gene models, and (3) to determine the degree of completion of existing annotations.

Plant Genomes

The recent availability of a draft sequence of the rice genome with sufficient coverage (Goff et al. 2002) has opened the possibility of comparing whole plant genomes for the first time. In addition, the genome of Arabidopsis has been the focus of several extensive annotation projects that make this genome one of the best documented to date. This situation enabled us to use Arabidopsis as a reference on which Exofish performances can be evaluated. Such an evaluation benefited also from the availability of a source of new cDNA sequences that have not yet been used in the Arabidopsis genome analyses but have served as a support for experimental validation of comparison-based predictions.

Exofish was first calibrated using a set of 1589 Arabidopsis genes that had been manually annotated. The optimal conditions we determined produce a specificity above 99% and a sensitivity at the exon and gene level of 64% and 93%, respectively. The global Exofish comparison was performed between the finished Arabidopsis genome sequence (The Arabidopsis Genome Initiative 2000), used as a target, and the BAC-based sequence draft established by the International Rice Genome Sequencing Program, used as a query. Ecores were mapped relative to the Arabidopsis genome annotation.

Statistics on ecores detected within and outside annotations are shown in Table 1. A total of 74% of the annotated genes (MIPS annotation) included one ecore at least, and 47% of the annotated exons are matched by one or more ecores.
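The two coverage figures just quoted (share of annotated genes containing at least one ecore, share of annotated exons matched by an ecore) can be illustrated with a minimal sketch. It is not the Exofish implementation: the overlap criterion, the function names, and all coordinates below are assumptions made up for illustration. A gene is counted as soon as an ecore falls within its boundaries, even in an intron, whereas an exon is counted only if an ecore overlaps it.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, on a single sequence


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def coverage_stats(
    genes: Dict[str, List[Interval]], ecores: List[Interval]
) -> Tuple[float, float]:
    """Return (fraction of genes containing >= 1 ecore,
    fraction of exons overlapped by >= 1 ecore)."""
    genes_hit = 0
    exons_hit = 0
    exons_total = 0
    for exons in genes.values():
        # Gene boundaries: from the first exon start to the last exon end.
        span = (min(s for s, _ in exons), max(e for _, e in exons))
        if any(overlaps(span, ecore) for ecore in ecores):
            genes_hit += 1
        for exon in exons:
            exons_total += 1
            if any(overlaps(exon, ecore) for ecore in ecores):
                exons_hit += 1
    return genes_hit / len(genes), exons_hit / exons_total


# Hypothetical toy annotation: gene name -> list of exon intervals.
genes = {
    "geneA": [(100, 200), (300, 400), (500, 600)],
    "geneB": [(1000, 1100), (1200, 1300)],
    "geneC": [(2000, 2100)],
}
# Hypothetical ecores: two in geneA exons, one in a geneB intron, one intergenic.
ecores = [(120, 180), (520, 590), (1150, 1180), (5000, 5050)]

gene_frac, exon_frac = coverage_stats(genes, ecores)
print(f"genes with at least one ecore: {gene_frac:.0%}")
print(f"exons matched by an ecore: {exon_frac:.0%}")
```

In this toy data the ecore lying in the geneB intron raises the gene-level figure without raising the exon-level one, which is one reason the two percentages can differ as widely as they do in the text.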
Conversely, 91% of the ecores are localized within the boundaries of annotated genes, and only about 1% of these ecores do not match an annotated exon. In a subset of 60 nonmatching ecores, experimental evidence based on new cDNAs showed that 59 cases correspond to novel exons. We thus estimate that about 98% of the ecores within gene annotations, but which do not match annotated exons, correspond to real exons that were missed during the annotation process. Taking into account that only one exon out of two is detected as an ecore (Table 1), an extrapolation of this analysis suggests that about 900 internal exons are still missing in the set of
