16.06.2013 Views

Evolution of the genomes of two nematodes in the ... - Ken Wolfe



the accuracy of the gene prediction programs or selection procedure in C. briggsae, because we lack an independent data set to create a gold standard.

The final transposon-and-pseudogene-filtered C. briggsae gene set contains 19,507 genes, and the transposon-and-pseudogene-filtered hybrid C. elegans gene set contains 20,621 genes. Some of the gene predictions taken from WormBase WS77 have alternative splices, so the 20,621 C. elegans genes have 21,578 different splice variants. There is little EST data for C. briggsae, so we are currently unable to predict alternative splices in C. briggsae.

To compare the transposon-and-pseudogene-filtered C. briggsae and C. elegans hybrid gene sets with the C. elegans WS77 gene set, we applied the same transposon and pseudogene filtering step to WS77. This removed 619 genes, leaving a “pruned” WS77 set of 18,808 genes and 19,791 splice variants, henceforth called WS77*. Some of the predictions discarded by our filtering step may include real exons, since 29 (9%) of the 316 putative pseudogenes discarded from C. elegans WS77 have been partially or fully confirmed by EST or cDNA data.

Data files containing the C. briggsae sequence and gene predictions can be found at ftp://ftp.wormbase.org/pub/wormbase/briggsae/. The results can also be browsed at http://www.wormbase.org/.

3.2.2 Comparing the C. briggsae and C. elegans Gene Sets

The C. briggsae gene set (19,507 genes), the C. elegans WS77* gene set (18,808 genes) and the C. elegans hybrid gene set (20,621 genes) all contain about the same number of genes. The recent WormBase C. elegans release WS103 (June 2003; ∼19,600 curated genes) also has a similar number.

The unspliced lengths of genes are roughly the same in the two species (C. briggsae median 1.9 kb, C. elegans WS77* median 1.9 kb; Table 3.1). The total length of the C. briggsae genome occupied by the 19,507 genes, including their introns, is 56 Mb (54% of the 102 Mb assembly), about the same as the fraction of the C. elegans genome occupied by the WS77* gene set. Thus the larger size of the C. briggsae genome (by ∼4 Mb) is not due to an increase in the number or size of protein-coding genes, but rather to repetitive DNA (Stein et al., 2003).
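The two summary statistics used in this comparison (median unspliced gene length and the fraction of the assembly occupied by genes) can be illustrated with a minimal sketch. This is not the procedure used to produce Table 3.1; the function name `gene_span_summary`, the `(contig, start, end)` input format with 1-based inclusive coordinates, and the choice to count overlapping genes only once are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def gene_span_summary(genes, assembly_length):
    """Summarise unspliced gene spans (hypothetical helper, for illustration).

    genes: iterable of (contig, start, end), 1-based inclusive coordinates,
           where start..end covers the whole gene including its introns.
    assembly_length: total assembly size in bp.
    Returns (median unspliced length, bp covered by genes, fraction covered).
    """
    by_contig = defaultdict(list)
    lengths = []
    for contig, start, end in genes:
        by_contig[contig].append((start, end))
        lengths.append(end - start + 1)

    covered = 0
    for intervals in by_contig.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the current merged block: extend it.
                # Assumption: overlapping genes are counted once.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1

    return median(lengths), covered, covered / assembly_length


# Toy example with made-up coordinates (not real gene predictions).
toy_genes = [("contig1", 1, 100), ("contig1", 51, 150), ("contig2", 1, 50)]
med_len, covered_bp, fraction = gene_span_summary(toy_genes, assembly_length=400)
```

Applied to real gene coordinates, the third return value corresponds to the kind of figure quoted above for C. briggsae, although the exact value depends on how overlapping predictions are treated.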

The C. elegans gene sets have slightly more introns than the C. briggsae hybrid set (Table 3.1). Some extra introns may be due to hand-curation of the WS77 gene set, since exons that were missed by gene prediction software are added during curation. However, as shown in C. briggsae-C. elegans Orthologues (below), a portion of the intron differences reflects true evolutionary change.

