11.07.2015

Bioinformatics for DNA Sequence Analysis.pdf - Index of

238 Abascal, Zardoya, and Posada

4. Examples

4.1. Characterization of the Genetic Code of Speleonectes tulumensis (Crustacea, Remipedia)

The mt-genome of the arthropod species Speleonectes tulumensis (16) is already deposited in GenBank and, at the time of writing this chapter, is annotated as having the Invertebrate Mitochondrial Genetic Code (translation table number 5). Its NCBI taxonomic identifier is 84346, which we will use as input for GenDecoder (http://darwin.uvigo.es/software/gendecoder.html), leaving all other options at their default settings (alignment positions with Shannon entropy values higher than 2.0 or with more than 20% gaps are filtered out).

After a short wait, no more than a couple of minutes, the output of GenDecoder is printed to the browser window. Information about the alignments, the number of positions filtered, and the codon usage is displayed first. Alignments can be inspected with the program Jalview (15) by clicking on the “get alignment” link next to the gene names. The main result of GenDecoder is at the end of the page (Fig. 11.3a). Two predictions (highlighted in red) differ from the annotations in GenBank:

(1) The AGC codon is predicted as Gly instead of Ser. The AGC codon occurs 13 times in the mt-genome of S. tulumensis, but only three instances occur at alignment positions below the default conservancy threshold. Hence, this prediction is based on only three codon occurrences. In the “Freq-aa” line we can see that the frequency of Gly is only around 0.3 (earlier in the results page, the exact frequency is given as 0.32), and the “Diff-freq” line indicates that this frequency is only about 0.1 larger than the frequency of the expected amino acid (Ser, 0.24). By clicking on the red “g” (see Note 3) we can see the three alignment positions at which AGC appears (Fig. 11.3b): twice in the COX1 alignment (positions 56 and 167) and once in the ND4 alignment (position 174). The poor support for the AGC assignment (low number of codons and low frequency of the predicted amino acid) suggests that this is an unreliable prediction. In fact, if we run GenDecoder with a softer threshold to include “variable sites (S
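
The column filter and the frequency comparison discussed in this example can be illustrated with a minimal Python sketch. It is not GenDecoder's actual code. It assumes that entropy is measured in bits over the non-gap residues of a column, that each column is given as a string of the residues found in the reference species at a position where the query codon occurs, and that the predicted amino acid is simply the most frequent residue pooled over the retained columns. All function names are hypothetical.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap symbol


def shannon_entropy(column):
    """Shannon entropy in bits of the non-gap residues in one column (assumed definition)."""
    residues = [r for r in column if r != GAP]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gap_fraction(column):
    """Fraction of sequences that have a gap in this column."""
    return column.count(GAP) / len(column)


def keep_column(column, max_entropy=2.0, max_gaps=0.20):
    """Retain a column unless its entropy exceeds 2.0 or more than 20% of it is gaps."""
    return shannon_entropy(column) <= max_entropy and gap_fraction(column) <= max_gaps


def predict_amino_acid(columns, max_entropy=2.0, max_gaps=0.20):
    """Pool residues over retained columns; return (most frequent residue, frequencies)."""
    pooled = Counter()
    for column in columns:
        if keep_column(column, max_entropy, max_gaps):
            pooled.update(r for r in column if r != GAP)
    total = sum(pooled.values())
    if total == 0:
        return None, {}
    freqs = {aa: n / total for aa, n in pooled.items()}
    return max(freqs, key=freqs.get), freqs


def diff_freq(freqs, predicted, expected):
    """Difference between the frequencies of the predicted and the expected amino acid."""
    return freqs.get(predicted, 0.0) - freqs.get(expected, 0.0)


# Toy columns (invented for illustration, not taken from the S. tulumensis alignments).
toy_columns = ["GGGGS", "G---S", "SSGGA"]
predicted, freqs = predict_amino_acid(toy_columns)

# Frequencies reported in the text for AGC: Gly 0.32, Ser 0.24.
reported = {"G": 0.32, "S": 0.24}
agc_diff = diff_freq(reported, "G", "S")
```

With the frequencies reported above for AGC, `agc_diff` is 0.08, consistent with the "about 0.1" shown in the “Diff-freq” line. A small difference computed from only three retained columns is the kind of weak support that makes the Gly prediction unreliable.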
