14.06.2013 Views

Databases and Systems



Figure 2: Distribution of families according to the number of sequences they contain (without redundancy) [HOVERGEN release 25, July 1997].

Protein multiple alignment and phylogenetic trees

Protein multiple alignments are computed with CLUSTALV [20] for each gene family, except those that contain more than 150 sequences (Table 1). When several redundant CDSs are available, only one is included in the alignment. Phylogenetic trees are inferred from multiple alignments using the 'neighbor joining' method [21].
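
To make the tree-building step concrete, the following is a minimal sketch of the standard neighbor-joining procedure, not HOVERGEN's actual implementation. It assumes that pairwise distances have already been derived from a multiple alignment; the source does not state which distance measure is used, so the matrix below is purely illustrative. The names `neighbor_joining`, `example_names` and `example_dist` are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining.

    names: list of sequence labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns a list of (node, neighbor, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)          # working copy; entries for merged nodes are added
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Join the pair that minimizes Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dab = d[frozenset((a, b))]
        # Branch lengths from a and b to the new internal node.
        la = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        lb = dab - la
        u = f"node{next_id}"
        next_id += 1
        edges += [(a, u, la), (b, u, lb)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab)
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Connect the last two nodes.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Illustrative distances only (not HOVERGEN data).
example_names = ["a", "b", "c", "d", "e"]
example_dist = {
    frozenset(("a", "b")): 5, frozenset(("a", "c")): 9,
    frozenset(("a", "d")): 9, frozenset(("a", "e")): 8,
    frozenset(("b", "c")): 10, frozenset(("b", "d")): 10,
    frozenset(("b", "e")): 9, frozenset(("c", "d")): 8,
    frozenset(("c", "e")): 7, frozenset(("d", "e")): 3,
}
```

Calling `neighbor_joining(example_names, example_dist)` returns the edge list of an unrooted tree. The sketch recomputes all row sums at each step, which is simple but slow for large families.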

Updating of data and rate of growth

HOVERGEN is updated every two GenBank releases (every four months). New or modified sequences from GenBank are compared to HOVERGEN sequences and to one another, first at the DNA level to identify redundancy, and then at the protein level to update the classification. Multiple alignments and phylogenetic trees

