Evolution of the genomes of two nematodes in the ... - Ken Wolfe
(a) if it could only be aligned using T-COFFEE (Notredame et al., 2000) to < 25% of the lengths
of its top two matches in Caenorhabditis or in SwissProt 40.38 (Boeckmann et al., 2003);
(b) if it did not have any BLASTP hit in Caenorhabditis or SwissProt with an E-value < 10⁻¹⁰ with
the SEG filter on (Wootton and Federhen, 1996), or < 10⁻²⁰ with SEG off; or
(c) if it had a within-species match but no cross-species match, and was < 50 amino acids long.
This yielded the final (G3) gene sets for C. elegans and C. briggsae.
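The three criteria can be read as a single exclusion predicate. The sketch below is a minimal Python illustration, not the pipeline the authors ran. It assumes that a predicted gene meeting any one of (a) to (c) is flagged (as the "or" after (b) suggests), that BLASTP and T-COFFEE results have already been summarised per gene, and that criterion (a) means the alignment covers < 25% of the length of each of the top two matches. The class `GeneEvidence` and all of its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GeneEvidence:
    """Hypothetical per-gene summary of the evidence used by criteria (a)-(c)."""
    length_aa: int
    # Fraction of each top match's length covered by the T-COFFEE alignment
    # (top two matches in Caenorhabditis or SwissProt; empty if no matches).
    top_match_coverage: Sequence[float]
    # Best BLASTP E-value against Caenorhabditis/SwissProt, or None if no hit.
    best_evalue_seg_on: Optional[float]
    best_evalue_seg_off: Optional[float]
    has_within_species_match: bool
    has_cross_species_match: bool


def has_significant_hit(e_seg_on: Optional[float], e_seg_off: Optional[float]) -> bool:
    """E-value < 1e-10 with SEG on, or < 1e-20 with SEG off."""
    return (e_seg_on is not None and e_seg_on < 1e-10) or (
        e_seg_off is not None and e_seg_off < 1e-20
    )


def is_excluded(g: GeneEvidence) -> bool:
    """True if the gene meets any of criteria (a), (b) or (c)."""
    top_two = list(g.top_match_coverage)[:2]
    # (a) aligned over < 25% of the length of each of its top two matches
    crit_a = bool(top_two) and all(c < 0.25 for c in top_two)
    # (b) no BLASTP hit passing the SEG-dependent E-value cutoff
    crit_b = not has_significant_hit(g.best_evalue_seg_on, g.best_evalue_seg_off)
    # (c) within-species match only, and shorter than 50 amino acids
    crit_c = (
        g.has_within_species_match
        and not g.has_cross_species_match
        and g.length_aa < 50
    )
    return crit_a or crit_b or crit_c
```

Keeping the SEG-dependent cutoff in its own function makes it reusable wherever the same pair of thresholds appears.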
3.5.2 Finding C. briggsae-C. elegans Orthologues
We ran NCBI BLASTP (Altschul et al., 1997) with the C. briggsae protein set as the query and
the C. elegans WS77* protein set as the target database, and vice versa. For C. elegans WS77* genes
with alternative transcripts, we used only the longest splice variant.
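Reducing the WS77* protein set to one sequence per gene is a simple grouping step. A minimal sketch, assuming the gene-to-transcript mapping is already available as (gene, transcript, protein sequence) tuples; the function name `longest_isoforms` is hypothetical, and ties are resolved by keeping the first transcript seen, which the text does not specify.

```python
from typing import Dict, Iterable, Tuple


def longest_isoforms(
    transcripts: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Keep the longest protein per gene.

    `transcripts` yields (gene_id, transcript_id, protein_sequence) tuples;
    the result maps gene_id -> (transcript_id, protein_sequence).
    On equal lengths the first transcript seen is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, tx_id, seq in transcripts:
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (tx_id, seq)
    return best
```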
We identified orthologues in three steps:
1. We found C. briggsae-C. elegans gene pairs that were each other's top BLASTP hits. We required
the BLASTP hits to have an E-value of < 10⁻¹⁰ with the SEG filter (Wootton and Federhen, 1996)
on, or < 10⁻²⁰ with SEG off. Furthermore, to avoid assigning paralogues to orthologue pairs, the
top hit had to have an E-value 10⁵ times lower (more significant) than the next best hit.
2. We found additional orthologues by analysing conserved gene order. We defined syntenic blocks by
looking for orthologues A (found in step 1) that were near orthologues B (also found in step 1)
in both species. Within each A-B syntenic block, we identified C. briggsae-C. elegans gene pairs that
were each other's top BLASTP hits within the block (although not each other's top BLASTP
hits genome-wide). To avoid assigning paralogues to orthologue pairs, the top hit had to have
an E-value 10⁵ times lower (more significant) than the next best hit in the A-B syntenic block.
3. Finally, we identified C. briggsae-C. elegans gene pairs that were each other's top BLASTP
hits and that lay within 100 kb of orthologues C (found in step 1) in both species.
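The reciprocal-best-hit test with a 10⁵ significance margin can be written once and reused genome-wide (step 1) or restricted to the genes of one syntenic block (step 2); step 3 adds a distance check against step-1 pairs. This is a minimal Python sketch under stated assumptions, not the authors' code: hits are assumed to be one (subject, E-value) entry per gene pair, already filtered by the SEG-dependent E-value cutoff; the margin is assumed to apply in both search directions; and gene locations are a hypothetical (chromosome or contig, position in bp) tuple. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# query -> [(subject, E-value), ...]; one entry per subject (best HSP),
# assumed already filtered by the SEG-dependent E-value cutoff.
Hits = Dict[str, List[Tuple[str, float]]]
Loc = Tuple[str, int]  # hypothetical: (chromosome or contig, position in bp)


def unique_best_hit(
    hits: Hits, query: str, allowed: Optional[Set[str]] = None, margin: float = 1e5
) -> Optional[str]:
    """Best subject for `query`, or None if it does not beat the runner-up by `margin`.

    If `allowed` is given, only subjects in that set are considered (e.g. one block).
    """
    cands = sorted(
        (e, s) for s, e in hits.get(query, []) if allowed is None or s in allowed
    )
    if not cands:
        return None
    top_e, top_s = cands[0]
    if len(cands) > 1:
        second_e = cands[1][0]
        # Top hit must be strictly better and at least `margin` times more significant.
        if not (top_e < second_e and top_e * margin <= second_e):
            return None
    return top_s


def reciprocal_pairs(
    a_to_b: Hits,
    b_to_a: Hits,
    a_genes: Optional[Set[str]] = None,
    b_genes: Optional[Set[str]] = None,
    margin: float = 1e5,
) -> Set[Tuple[str, str]]:
    """Gene pairs (a, b) that are each other's unique best hit.

    Genome-wide when a_genes/b_genes are None; otherwise both searches are
    restricted to those genes (e.g. the genes of one A-B syntenic block).
    """
    queries: Iterable[str] = a_genes if a_genes is not None else a_to_b.keys()
    pairs = set()
    for a in queries:
        b = unique_best_hit(a_to_b, a, b_genes, margin)
        if b is not None and unique_best_hit(b_to_a, b, a_genes, margin) == a:
            pairs.add((a, b))
    return pairs


def near_same_anchor(
    loc_a: Loc, loc_b: Loc, anchors: Iterable[Tuple[Loc, Loc]], window: int = 100_000
) -> bool:
    """True if one step-1 orthologue pair lies within `window` bp of the candidate in both species."""

    def close(x: Loc, y: Loc) -> bool:
        return x[0] == y[0] and abs(x[1] - y[1]) <= window

    return any(close(loc_a, ca) and close(loc_b, cb) for ca, cb in anchors)
```

Requiring the top E-value to be strictly lower as well as 10⁵ times lower guards against the case where BLAST reports two hits with an E-value of 0.0. The text does not say whether step 3 keeps the 10⁵ margin; if, as an assumption, it does not, step 3 could be approximated by calling `reciprocal_pairs` with `margin=1.0` and keeping the pairs for which `near_same_anchor` returns True.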
3.5.3 Detecting Intron Gain and Loss in Orthologues
We used T-COFFEE (Notredame et al., 2000) to align all C. briggsae-C. elegans orthologue pairs. We
then searched the alignments for cases where an exon i in species A aligned well to two adjacent exons j
and k in species B. To ensure that orthologous exons were matched properly, we required that exons i
and j, and exons i and k, consist of identical or conserved amino acids across at least 20% of the
shorter exon in each pair.
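The exon-matching test can be sketched over a pairwise protein alignment in which every residue carries an exon index. This is a minimal Python illustration with several assumptions not stated in the text: each residue is assigned to exactly one exon (for example, the exon holding most of its codon); exons are numbered consecutively from 0 in gene order; and "conserved" is taken to mean sharing a Clustal-style strong conservation group. The group list and all function names are our own choices. Running it with the two species swapped covers the reciprocal case.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Assumption: "conserved" = same Clustal-style strong conservation group.
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def conserved(x: str, y: str) -> bool:
    """Identical residues, or residues sharing a strong conservation group."""
    return x == y or any(x in g and y in g for g in STRONG_GROUPS)


def column_exons(aligned: str, residue_exon: Sequence[int]) -> List[Optional[int]]:
    """Exon index for each alignment column (None at gap columns)."""
    out: List[Optional[int]] = []
    pos = 0
    for ch in aligned:
        if ch == "-":
            out.append(None)
        else:
            out.append(residue_exon[pos])
            pos += 1
    return out


def split_exon_candidates(
    aln_a: str,
    exons_a: Sequence[int],
    aln_b: str,
    exons_b: Sequence[int],
    min_frac: float = 0.2,
) -> List[Tuple[int, int, int]]:
    """Return (i, j, k=j+1) where exon i of A matches both adjacent exons j, k of B."""
    cols_a = column_exons(aln_a, exons_a)
    cols_b = column_exons(aln_b, exons_b)
    # Count conserved aligned columns for every (exon in A, exon in B) pair.
    shared: Counter = Counter()
    for ea, eb, ra, rb in zip(cols_a, cols_b, aln_a, aln_b):
        if ea is not None and eb is not None and conserved(ra, rb):
            shared[(ea, eb)] += 1
    len_a, len_b = Counter(exons_a), Counter(exons_b)  # exon lengths in residues

    def matches(i: int, j: int) -> bool:
        # Conserved residues must cover >= min_frac of the shorter exon.
        return shared[(i, j)] >= min_frac * min(len_a[i], len_b[j])

    return [
        (i, j, j + 1)
        for i in sorted(len_a)
        for j in sorted(len_b)
        if j + 1 in len_b and matches(i, j) and matches(i, j + 1)
    ]
```

For example, if the same 13-residue sequence forms one exon in species A but is split 6 + 7 across two exons in species B, the function reports `(0, 0, 1)`; with the species swapped it reports nothing.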