16.06.2013 Views

Evolution of the genomes of two nematodes in the ... - Ken Wolfe

3.4 FUTURE WORK

We found ∼1650 orphans in the two worms. Some of these orphans may be novel genes that have arisen in one of the two genomes since the species diverged (Long, 2001). However, other candidate orphans may not be real orphans at all: they may be pseudogenes that have not yet been deleted, or genes that have diverged so rapidly that the BLAST and Smith-Waterman algorithms (used in the Gene Families section of Stein et al., 2003) cannot recognise their cross-species matches. In C. elegans, orphans are clustered on the arms of chromosomes: regions with unusually high rates of chromosomal rearrangement, amino acid substitution, and transposable element insertion (Stein et al., 2003). I am interested in investigating whether the novel worm genes arose as by-products of chromosomal rearrangements, since rearrangements have been implicated in the birth of some novel genes (Long, 2001).

3.5 METHODS

3.5.1 Protein coding Gene Prediction

We predicted protein coding genes in the C. briggsae genome using Genefinder (version 980506; Phil Green, unpublished software), Fgenesh (Salamov and Solovyev, 2000), Twinscan (Korf et al., 2001), and the Ensembl annotation system (Clamp et al., 2003). We also ran Genefinder and Fgenesh on the C. elegans genome.

The four gene prediction programs yielded a combined total of 430,575 exon predictions and 73,997 gene predictions in the C. briggsae assembly. Many of the predictions from different programs overlapped, so the actual numbers of exons and genes are far smaller. The C. elegans data, consisting of WS77 gene models and Fgenesh and Genefinder predictions, totalled 393,529 exon predictions and 61,525 gene predictions.

To select among overlapping predictions produced by different programs, we developed a selection procedure that worked as follows:

1. Many of the exons predicted by different programs overlapped. We took only the longer of any two exons that overlapped by ≥75% of their lengths and were in the same reading frame.

2. We clustered the exons within each species. Two exons were put in the same “exon-cluster” if ≥1 gene prediction program placed them together in a gene prediction. Each exon-cluster consisted of ≥1 overlapping gene predictions.

3. For each exon-cluster X, we found the most homologous exon-cluster Y in the other species. Cluster Y was the exon-cluster with the top BLASTP (Altschul et al., 1997) hit from any of the exons in X. For example, for the C. elegans exon-cluster containing the ce-acy-4 gene, its top homologue was the C. briggsae exon-cluster containing the cb-acy-4 gene (Figure 3.1).
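Steps 1 to 3 above can be illustrated with a minimal Python sketch. It is not the code used in this work, and it rests on several assumptions: the `Exon` record and the function names `merge_exons`, `cluster_exons` and `top_homologue` are hypothetical; "≥75% of their lengths" is read as at least 75% of the length of both exons; redundant exons are resolved greedily, longest first; the reading frame is taken as a precomputed value; and BLASTP hits are supplied as precomputed (query exon, subject exon, score) tuples, with "top hit" taken to mean the highest score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; coordinates are 1-based and inclusive."""
    exon_id: str
    contig: str
    strand: str
    start: int
    end: int
    frame: int  # precomputed reading frame: 0, 1 or 2 (assumption)

    @property
    def length(self):
        return self.end - self.start + 1


def redundant(a, b, min_frac=0.75):
    """True if a and b share contig, strand and frame, and their overlap
    covers at least min_frac of BOTH exons (one reading of step 1)."""
    if (a.contig, a.strand, a.frame) != (b.contig, b.strand, b.frame):
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    return overlap >= min_frac * a.length and overlap >= min_frac * b.length


def merge_exons(exons):
    """Step 1: keep the longer exon of each redundant pair (greedy, longest
    first). Returns {exon_id: id of the kept exon that represents it}."""
    kept, rep = [], {}
    for exon in sorted(exons, key=lambda e: -e.length):
        match = next((k for k in kept if redundant(exon, k)), None)
        if match is None:
            kept.append(exon)
            rep[exon.exon_id] = exon.exon_id
        else:
            rep[exon.exon_id] = match.exon_id
    return rep


def cluster_exons(rep, gene_predictions):
    """Step 2: union-find over kept exons. Exons that any program placed in
    the same gene prediction end up in the same exon-cluster.
    gene_predictions is a list of lists of original exon ids.
    Returns {kept exon id: cluster id}."""
    parent = {r: r for r in set(rep.values())}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for exon_ids in gene_predictions:
        reps = [rep[i] for i in exon_ids]
        for other in reps[1:]:
            parent[find(other)] = find(reps[0])
    return {r: find(r) for r in parent}


def top_homologue(query_clusters, subject_clusters, hits):
    """Step 3: for each exon-cluster X in one species, return the cluster Y
    in the other species that holds the subject of X's best-scoring hit.
    hits is an iterable of (query exon id, subject exon id, score)."""
    best = {}
    for query, subject, score in hits:
        x = query_clusters[query]
        if x not in best or score > best[x][0]:
            best[x] = (score, subject_clusters[subject])
    return {x: y for x, (_, y) in best.items()}
```

Calling `merge_exons` first and passing its result to `cluster_exons` lets gene predictions that still refer to a discarded exon be mapped to the longer exon that replaced it. The pairwise comparison in `merge_exons` is quadratic in the number of exons, so a real run over several hundred thousand exon predictions would need an interval index or a sort by position.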
