13.07.2015 Views

The Genom of Homo sapiens.pdf



278 JAILLON ET AL.

Table 1. Distribution of (Rice/Arabidopsis) Ecores in the Sequence of Arabidopsis thaliana

          Ecores   Genes    Genes      Ecores   Exons     Exons      Ecores        Ecores in
                            detected   within             detected   overlapping   genes not
                                       genes                         exons         in exons
Numbers   80,010   26,027   19,235     73,119   135,318   64,032     72,396        723
(%)       (100)    (100)    (74)       (91)     (100)     (47)       (90)          (1)

11,620 annotated Arabidopsis genes for which no corresponding full-length cDNA is available.

A total of 6,891 ecores were found to be located outside gene annotations. Of these, 2,980 were found in other annotated features such as transposons, tRNAs, or pseudogenes. The presence of ecores in pseudogenes is expected and difficult to avoid. Transposons that were matched by ecores correspond to cases that escaped masking. However, we expect that a substantial fraction of the 3,911 remaining unannotated ecores correspond to gene extensions or to undetected genes. Again, experimental evidence based on new cDNAs confirmed that 150 ecores could be included in 93 gene extensions. It is, however, impossible to estimate the fraction of genes that could be extended, since we cannot determine the fraction of truly full-length cDNAs in the collection of novel cDNAs that is being used for experimental validation.

To analyze these gene extensions further, we constructed ecotig gene models (see Methodology). Of the 80,010 ecores, 70,847 were incorporated in 15,311 ecotigs and 9,163 remained as singletons. A total of 14,308 ecotigs (67,607 ecores) matched 15,496 genes in Arabidopsis; 712 gene models are overlapped by two or more ecotigs (1,433 ecotigs). Conversely, 1,413 ecotigs led to the fusion of 3,307 annotated genes. It remains to be seen whether these fusions are correlated with conservation of synteny between genes from both plants. This is an obvious drawback of the ecotig method that may nevertheless be of interest in the identification of conservation of synteny between two genomes, for which it could even provide a measurement.

The construction of ecotigs can first be applied to extend gene models, and in their present stage, 697 annotated Arabidopsis genes could be potentially extended on the basis of the ecotigs (914 ecores). Of the 93 gene extensions that were experimentally supported by cDNA sequences described above, 64 could be included in ecotigs.

Among the 1,003 ecotigs (3,240 ecores) located in regions with no gene annotation, 619 match a transposon, a tRNA, or a pseudogene. Of the 384 remaining ecotigs, 245 were subjected to manual curation, which selected 98 (255 ecores) as potential gene candidates. Experimental evidence based on cDNAs was available for 19 of these candidates. In addition, singleton ecores may also indicate the existence of additional genes, since about 40 such singletons were confirmed by novel cDNAs. Interestingly, many of these novel gene candidates (55%) encode small open reading frames (smORFs) with a CDS
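
The counts reported in Table 1 and in the text above can be cross-checked with simple arithmetic. The following Python sketch is not part of the original analysis: the dictionary keys, the `pct` helper, and the `check_counts` function are illustrative names, and the denominators used for each percentage are inferred from the table layout rather than stated in the source.

```python
# Counts copied from Table 1 and the accompanying text.
# All names below are hypothetical and chosen only for this illustration.
TABLE1 = {
    "ecores": 80_010,
    "genes": 26_027,
    "genes_detected": 19_235,
    "ecores_within_genes": 73_119,
    "exons": 135_318,
    "exons_detected": 64_032,
    "ecores_overlapping_exons": 72_396,
    "ecores_in_genes_not_in_exons": 723,
}

TEXT_COUNTS = {
    "ecores_in_ecotigs": 70_847,
    "singleton_ecores": 9_163,
    "ecotigs_total": 15_311,
    "ecotigs_matching_genes": 14_308,
    "ecotigs_outside_genes": 1_003,
    "outside_in_other_features": 2_980,
    "outside_unannotated": 3_911,
}


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


def check_counts(t: dict, x: dict) -> dict:
    """Recompute the derived figures quoted in the table and text."""
    return {
        # Assumed denominators: genes, ecores, exons (inferred from layout).
        "genes_detected_pct": pct(t["genes_detected"], t["genes"]),
        "ecores_within_genes_pct": pct(t["ecores_within_genes"], t["ecores"]),
        "exons_detected_pct": pct(t["exons_detected"], t["exons"]),
        "ecores_overlapping_exons_pct": pct(t["ecores_overlapping_exons"], t["ecores"]),
        # Ecores not covered by any gene annotation.
        "ecores_outside_genes": t["ecores"] - t["ecores_within_genes"],
        # Partition checks: each pair of parts should sum to its total.
        "ecotig_partition_ok": x["ecores_in_ecotigs"] + x["singleton_ecores"] == t["ecores"],
        "ecotig_count_ok": x["ecotigs_matching_genes"] + x["ecotigs_outside_genes"] == x["ecotigs_total"],
        "outside_split_ok": x["outside_in_other_features"] + x["outside_unannotated"]
        == t["ecores"] - t["ecores_within_genes"],
    }


summary = check_counts(TABLE1, TEXT_COUNTS)
```

Under these assumptions, `summary` reproduces the percentages printed in Table 1 and the 6,891 ecores located outside gene annotations, and the three partition checks hold.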
