16.06.2013 Views

Evolution of the genomes of two nematodes in the ... - Ken Wolfe


(a) if it could only be aligned using T-COFFEE (Notredame et al., 2000) to < 25% of the lengths of its top two matches in Caenorhabditis or in SwissProt 40.38 (Boeckmann et al., 2003);

(b) if it did not have any BLASTP hit in Caenorhabditis or SwissProt, of E-value < 10⁻¹⁰ with the SEG filter on (Wootton and Federhen, 1996), or < 10⁻²⁰ with SEG off; or

(c) if it had a within-species match, but no cross-species match, and was < 50 amino acids long.

This yielded the final (G3) gene sets for C. elegans and C. briggsae.
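The three criteria above can be sketched as a single predicate. This is only an illustration, not the pipeline actually used: the opening of the list is not shown here, so the sketch assumes that a prediction meeting any one criterion was excluded from the G3 set, and it reads criterion (a) as "both of the top two matches are covered over less than 25% of their length". The `GeneEvidence` record and all of its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GeneEvidence:
    """Hypothetical summary of the evidence for one predicted gene."""
    length_aa: int
    # Fraction of each top match's length covered by the alignment (assumed input).
    top_two_aligned_fractions: Sequence[float]
    # Best BLASTP E-values against Caenorhabditis/SwissProt; None means no hit.
    best_evalue_seg_on: Optional[float]
    best_evalue_seg_off: Optional[float]
    has_within_species_match: bool
    has_cross_species_match: bool


def poorly_aligned(g: GeneEvidence, max_fraction: float = 0.25) -> bool:
    """Criterion (a), under the reading stated in the lead-in."""
    fractions = g.top_two_aligned_fractions
    return len(fractions) > 0 and all(f < max_fraction for f in fractions)


def lacks_significant_hit(g: GeneEvidence,
                          seg_on_cutoff: float = 1e-10,
                          seg_off_cutoff: float = 1e-20) -> bool:
    """Criterion (b): no hit below either E-value cutoff."""
    on_ok = g.best_evalue_seg_on is not None and g.best_evalue_seg_on < seg_on_cutoff
    off_ok = g.best_evalue_seg_off is not None and g.best_evalue_seg_off < seg_off_cutoff
    return not (on_ok or off_ok)


def short_species_specific(g: GeneEvidence, min_length: int = 50) -> bool:
    """Criterion (c): within-species match only, and shorter than 50 residues."""
    return (g.has_within_species_match
            and not g.has_cross_species_match
            and g.length_aa < min_length)


def is_flagged(g: GeneEvidence) -> bool:
    """True if any of the criteria (a) to (c) applies (assumed to mean exclusion)."""
    return poorly_aligned(g) or lacks_significant_hit(g) or short_species_specific(g)
```

Keeping each criterion in its own function makes it easy to report which rule removed a given prediction.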

3.5.2 Finding C. briggsae-C. elegans Orthologues

We ran NCBI BLASTP (Altschul et al., 1997) with the C. briggsae protein set as the query database and the C. elegans WS77* protein set as the target database, and vice versa. For C. elegans WS77* genes that have alternative transcripts, we took only the longest splice variant.

We identified orthologues in three steps:

1. We found C. briggsae-C. elegans gene pairs that were each other’s top BLASTP hits. We required the BLASTP hits to have an E-value of < 10⁻¹⁰ with the SEG filter (Wootton and Federhen, 1996) on, or < 10⁻²⁰ with SEG off. Furthermore, to avoid assigning paralogues to orthologue pairs, the top hit had to have an E-value 10⁵ times lower (more significant) than the next best hit.

2. We found additional orthologues by analysing conserved gene order. We found syntenic blocks by looking for orthologues A (found in step 1) that were near orthologues B (also found in step 1) in both species. We identified C. briggsae-C. elegans gene pairs within the A-B syntenic block that were each other’s top BLASTP hits within the A-B block (although not each other’s top BLASTP hits within the genome). To avoid assigning paralogues to orthologue pairs, the top hit had to have an E-value 10⁵ times lower (more significant) than the next best hit in the A-B syntenic block.

3. Furthermore, we identified C. briggsae-C. elegans gene pairs that were each other’s top BLASTP hits and that were within 100 kb of orthologues C (found in step 1) in both species.
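The reciprocal-best-hit rule of step 1, and its restriction to a syntenic block in step 2, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the hit tables are assumed to hold one (target, E-value) entry per target gene, a single E-value cutoff is passed in rather than handling the SEG-on and SEG-off runs separately, and the function names and the `allowed` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Hits = Dict[str, List[Tuple[str, float]]]  # query gene -> [(target gene, E-value)]


def best_hit(hits: List[Tuple[str, float]],
             max_evalue: float = 1e-10,
             margin: float = 1e5,
             allowed: Optional[Set[str]] = None) -> Optional[str]:
    """Return the top hit if it passes the cutoff and beats the runner-up by `margin`.

    If `allowed` is given, only targets in that set are considered, which
    mimics restricting the search to one syntenic block (step 2).
    """
    if allowed is not None:
        hits = [h for h in hits if h[0] in allowed]
    ranked = sorted(hits, key=lambda h: h[1])
    if not ranked or ranked[0][1] >= max_evalue:
        return None
    if len(ranked) > 1:
        best_e, next_e = ranked[0][1], ranked[1][1]
        # The runner-up must be at least `margin` times less significant.
        # The strict comparison guards against two E-values that are both 0.0.
        if not (next_e >= best_e * margin and next_e > best_e):
            return None
    return ranked[0][0]


def reciprocal_best_hits(a_to_b: Hits, b_to_a: Hits,
                         allowed_a: Optional[Set[str]] = None,
                         allowed_b: Optional[Set[str]] = None,
                         **kwargs) -> List[Tuple[str, str]]:
    """Pairs (a, b) where each gene is the other's qualifying top hit."""
    pairs = []
    for a, hits in a_to_b.items():
        if allowed_a is not None and a not in allowed_a:
            continue
        b = best_hit(hits, allowed=allowed_b, **kwargs)
        if b is not None and best_hit(b_to_a.get(b, []), allowed=allowed_a, **kwargs) == a:
            pairs.append((a, b))
    return pairs


# Toy data with made-up gene names.
cb_to_ce = {"cb1": [("ce1", 1e-50), ("ce2", 1e-20)],
            "cb2": [("ce2", 1e-30), ("ce3", 1e-28)]}
ce_to_cb = {"ce1": [("cb1", 1e-48)],
            "ce2": [("cb2", 1e-30)],
            "ce3": [("cb2", 1e-28)]}
genome_wide = reciprocal_best_hits(cb_to_ce, ce_to_cb)
within_block = reciprocal_best_hits(cb_to_ce, ce_to_cb,
                                    allowed_a={"cb2"}, allowed_b={"ce2"})
```

In the toy data, cb2 fails genome-wide because its two best hits differ by less than a factor of 10⁵, but it pairs with ce2 once the comparison is limited to a block that contains only those two genes.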

3.5.3 Detecting Intron Gain and Loss in Orthologues

We used T-COFFEE (Notredame et al., 2000) to align all C. briggsae-C. elegans orthologue pairs. We then searched the alignments for cases where exon i in species A aligned well to two adjacent exons j and k in species B. To ensure that orthologous exons were matched properly, we required that exons i and j, and exons i and k, consisted of identical or conserved amino acids across at least 20% of the shorter exon.
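The 20% requirement can be sketched as a simple column count over the aligned residues of an exon pair. This is an illustration only. The text does not say which substitutions count as "conserved", so the residue groups below are an assumption chosen for the example, and the function names and inputs (aligned strings with "-" for gaps, plus the ungapped exon lengths) are hypothetical.

```python
# Illustrative conservative-substitution groups; an assumption, not from the text.
CONSERVED_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ"]


def is_identical_or_conserved(x: str, y: str) -> bool:
    """True if two residues are the same or fall in the same assumed group."""
    if x == y:
        return True
    return any(x in group and y in group for group in CONSERVED_GROUPS)


def exons_match(aligned_a: str, aligned_b: str,
                len_a: int, len_b: int,
                min_fraction: float = 0.20) -> bool:
    """Check the 20% rule for one exon pair.

    `aligned_a` and `aligned_b` are the alignment columns shared by the two
    exons, with '-' marking gaps; `len_a` and `len_b` are the ungapped exon
    lengths in amino acids.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned segments must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x != "-" and y != "-" and is_identical_or_conserved(x, y)
    )
    return matches >= min_fraction * min(len_a, len_b)


def is_split_candidate(i_vs_j: tuple, i_vs_k: tuple) -> bool:
    """Exon i must pass the rule against both adjacent exons j and k."""
    return exons_match(*i_vs_j) and exons_match(*i_vs_k)
```

A call such as `exons_match("MKVLAD", "MRIL-E", 6, 5)` counts five identical or conserved columns against a shorter exon of five residues, so the pair passes the threshold.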
