Evolution of the genomes of two nematodes in the ... - Ken Wolfe
the accuracy of the gene prediction programs or selection procedure in C. briggsae, because we lack an
independent data set to create a gold standard.
The final transposon-and-pseudogene-filtered C. briggsae gene set contains 19,507 genes, and the
transposon-and-pseudogene-filtered hybrid C. elegans gene set contains 20,621 genes. Some of the gene
predictions taken from WormBase WS77 have alternative splices, so the 20,621 C. elegans genes have
21,578 different splice variants. There is little EST data for C. briggsae, so we are currently unable to
predict alternative splices in C. briggsae.
To compare the transposon-and-pseudogene-filtered C. briggsae and C. elegans hybrid gene
sets to the C. elegans WS77 gene set, we applied our transposon and pseudogene filtering step to the
C. elegans WS77 gene set. This removed 619 genes to create a “pruned” WS77 set of 18,808 genes and
19,791 splices. This pruned set is henceforth called WS77*. Some of the predictions discarded by our
filtering step may include real exons, since 29 (9%) of the 316 putative pseudogenes in C. elegans WS77
that were discarded have been partially or fully confirmed by EST or cDNA data.
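The counts quoted above can be cross-checked with a few lines of arithmetic. The following is only a sketch: the variable names are illustrative, and it assumes that the unfiltered WS77 gene count is simply the pruned count plus the 619 removed genes, which the text implies but does not state directly.

```python
# Counts quoted in the text; variable names are illustrative only.
ws77_pruned_genes = 18_808      # genes left in the pruned WS77 set
ws77_removed_genes = 619        # genes removed by the filtering step
discarded_pseudogenes = 316     # putative pseudogenes discarded from WS77
confirmed_pseudogenes = 29      # of those, confirmed by EST or cDNA data
hybrid_genes = 20_621           # filtered C. elegans hybrid gene set
hybrid_splices = 21_578         # splice variants across those genes

# Assumption: unfiltered set = pruned set + removed genes.
ws77_unfiltered_genes = ws77_pruned_genes + ws77_removed_genes

# Share of the unfiltered set removed by the filter, in percent.
removed_pct = 100 * ws77_removed_genes / ws77_unfiltered_genes

# Share of discarded putative pseudogenes with EST/cDNA support, in percent.
confirmed_pct = 100 * confirmed_pseudogenes / discarded_pseudogenes

# Average number of splice variants per gene in the hybrid set.
splices_per_gene = hybrid_splices / hybrid_genes

print(f"implied unfiltered WS77 genes: {ws77_unfiltered_genes}")
print(f"removed by filtering: {removed_pct:.1f}%")
print(f"discarded pseudogenes with EST/cDNA support: {confirmed_pct:.0f}%")
print(f"splice variants per hybrid gene: {splices_per_gene:.3f}")
```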
Data files containing the C. briggsae sequence and gene predictions can be found at
ftp://ftp.wormbase.org/pub/wormbase/briggsae/. The results can also be browsed at http://www.wormbase.org/.
3.2.2 Comparing the C. briggsae and C. elegans Gene Sets
The C. briggsae gene set (19,507 genes), the C. elegans WS77* gene set (18,808 genes) and the C. elegans
hybrid gene set (20,621 genes) all contain about the same number of genes. The recent WormBase
C. elegans release WS103 (June 2003; ∼19,600 curated genes) also has a similar number.
The unspliced lengths of genes are roughly the same in the two species (C. briggsae median 1.9 kb,
C. elegans WS77* 1.9 kb; Table 3.1). The total length of the C. briggsae genome occupied by the 19,507
genes, including their introns, is 56 Mb (54% of the 102 Mb assembly), about the same as the fraction of the
C. elegans genome occupied by the WS77* gene set. Thus the larger size of the C. briggsae genome (by
∼4 Mb) is not due to an increase in the number or size of protein coding genes (but rather to repetitive
DNA; Stein et al., 2003).
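As a rough check on the genic fraction quoted above, the calculation can be sketched as follows. The inputs are the rounded figures from the text, so the result may differ from the quoted percentage by about a point. The mean span per gene assumes that gene spans do not overlap, an assumption made here for illustration only, and it is a mean, so it is not directly comparable with the median in Table 3.1.

```python
# Rounded figures quoted in the text; variable names are illustrative only.
genic_span_mb = 56.0      # C. briggsae genome occupied by genes, incl. introns
assembly_mb = 102.0       # size of the C. briggsae assembly
briggsae_genes = 19_507   # filtered C. briggsae gene set

# Fraction of the assembly covered by genes, in percent.
genic_fraction_pct = 100 * genic_span_mb / assembly_mb

# Mean genomic span per gene in kb, assuming non-overlapping gene spans.
mean_span_kb = 1000 * genic_span_mb / briggsae_genes

print(f"genic fraction of assembly: {genic_fraction_pct:.1f}%")
print(f"mean span per gene: {mean_span_kb:.2f} kb")
```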
The C. elegans gene sets have slightly more introns than the C. briggsae hybrid set (Table 3.1). Some
extra introns may be due to hand-curation of the WS77 gene set, since extra exons that were missed
by gene prediction software are added during curation. However, as shown in C. briggsae-C. elegans
Orthologues (below), a portion of the intron differences are true evolutionary changes.