Computational tools and Interoperability in Comparative ... - CBS
Genome Comparisons
Figure 2.8: Deamination of cytosine (C) into uracil (U)
2.3.5 Base composition and DNA repair
Klebsiella is often found in plant products, on root surfaces and living trees, in fresh vegetables, and in foods with a high content of sugars and acids, such as frozen orange juice concentrate. Klebsiella pneumoniae can cause urinary tract infections, and the NTUH-K2044 strain was isolated from a patient with liver abscess and meningitis. The ecological niches inhabited by Klebsiella, though diverse, share the property of being rich in energy and nitrogen.
Nitrogen-fixing aerobic bacteria are known to have a higher chromosomal GC content (McEwan et al., 1998). This has been explained by the nitrogen required to replicate the chromosome: an AT base pair contains 7 nitrogen atoms, whereas a GC pair contains 8.
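The nitrogen arithmetic above can be checked with a short sketch. It treats each position of a toy strand as one base pair and applies the 7 and 8 nitrogen atoms per AT and GC pair given in the text; the function name and the example fragments are illustrative only.

```python
# Nitrogen atoms in the bases of one base pair, as stated in the text:
# adenine (5 N) + thymine (2 N) = 7; guanine (5 N) + cytosine (3 N) = 8.
# Deoxyribose and phosphate contain no nitrogen, so only the bases count.
N_AT_PAIR = 7
N_GC_PAIR = 8


def base_pair_nitrogen(seq: str) -> int:
    """Nitrogen atoms in the bases of the double-stranded DNA described by `seq`.

    Each position of the given strand stands for one base pair:
    A or T -> an AT pair, G or C -> a GC pair.
    """
    seq = seq.upper()
    gc_pairs = seq.count("G") + seq.count("C")
    at_pairs = seq.count("A") + seq.count("T")
    if gc_pairs + at_pairs != len(seq):
        raise ValueError("sequence may only contain A, C, G and T")
    return N_AT_PAIR * at_pairs + N_GC_PAIR * gc_pairs


# Two toy 10 bp fragments that differ only in base composition.
at_rich = "ATATATATGC"  # 20% GC
gc_rich = "GCGCGCGCAT"  # 80% GC
print(base_pair_nitrogen(at_rich), base_pair_nitrogen(gc_rich))  # 72 78
```

Because every base pair carries 7 nitrogen atoms plus one more if it is GC, the mean per base pair is simply 7 plus the GC fraction; a 5 Mbp genome at 60% GC therefore holds about one million more nitrogen atoms in its bases than one at 40% GC.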
Cytosine is prone to mutation through spontaneous deamination into uracil (Visnes et al., 2009) (figure 2.8). In E. coli, the two enzymes uracil N-glycosylase and apurinic/apyrimidinic (AP) endonuclease are responsible for repairing this damage. However, in Buchnera aphidicola Cc, which has a small, reduced genome, both enzymes are absent (confirmed by protein BLAST). Negative selection is therefore likely to act on organisms that combine a high chromosomal GC content with the lack of a functional repair mechanism. Hence, the base composition of a bacterial genome is by no means random, and adjusting the overall GC content through evolution may be yet another way to adapt to the environment.
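Why an unrepaired deamination lowers GC content can be shown with a toy model: uracil pairs with adenine, so after two rounds of replication the original C:G pair has become T:A. This is a minimal sketch that models no repair at all and ignores strand orientation (complements are written base by base rather than reverse-complemented); the helper names are hypothetical.

```python
# Watson-Crick partners; uracil (from deaminated cytosine) pairs with adenine.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def complement(strand: str) -> str:
    # Base-wise complement (not reversed) so positions line up with the template.
    return "".join(COMPLEMENT[b] for b in strand)


def deaminate(strand: str, pos: int) -> str:
    """Spontaneous deamination of the cytosine at `pos` into uracil."""
    if strand[pos] != "C":
        raise ValueError("only cytosine is deaminated to uracil in this model")
    return strand[:pos] + "U" + strand[pos + 1:]


def gc_content(strand: str) -> float:
    return sum(b in "GC" for b in strand) / len(strand)


# No uracil N-glycosylase or AP endonuclease: the uracil is never removed.
original = "ATGCGC"
damaged = deaminate(original, 3)   # "ATGUGC"
# Replication 1: uracil in the template directs incorporation of adenine.
new_strand = complement(damaged)   # "TACACG"
# Replication 2: that adenine templates thymine, fixing a C:G -> T:A transition.
daughter = complement(new_strand)  # "ATGTGC"
print(gc_content(original), gc_content(daughter))
```

Each such event removes one GC pair, so a lineage without this repair pathway is pushed towards lower GC content.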
2.3.6 BLASTmatrix - proteome comparison<br />
The BLASTmatrix tool visualizes proteome similarity across a larger number of organisms. For each pairwise combination of proteomes, a BLAST search is performed. Two proteins are declared homologous when at least 50% of the protein is aligned and at least 50% of the residues within the alignment are conserved. For a report of proteome A against proteome B, all homologous proteins are then grouped into families, and the similarity between A and B is calculated as the number of families in which both A and B are represented. BLAST reports are cached using MD5 checksums of the proteomes, which lets the tool reuse previous results efficiently when organisms are added to a comparison. The comparison is repeated for all $\sum_{j=1}^{N} j = N(N+1)/2$ combinations of the $N$ proteomes (every pair once, plus each proteome against itself), and for each combination a square is drawn containing the following information: the similarity as a percentage of all families of A and B, the number of shared families, and the total number of families. A small example matrix is shown in figure 2.9. The percentage is used to color-code the squares, giving an easier overview of larger comparisons.
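The homology rule and the family-based similarity can be sketched as follows. This is not BLASTmatrix's code but a minimal illustration under stated assumptions: hits are given as (query, subject, alignment length, query length, percent identity); "50% aligned" is read as alignment length relative to the query; percent identity stands in for "conserved" residues; and families are formed by single linkage (union-find), with proteins lacking homologs counted as their own family. All names are hypothetical.

```python
from typing import Iterable

# (query_id, subject_id, alignment_length, query_length, percent_identity)
Hit = tuple[str, str, int, int, float]


def is_homologous(aln_len: int, query_len: int, pct_identity: float) -> bool:
    # Assumed reading of the 50/50 rule: >= 50% of the query aligned and
    # >= 50% identity (used here as a stand-in for "conserved" residues).
    return aln_len >= 0.5 * query_len and pct_identity >= 50.0


class UnionFind:
    """Disjoint sets used to merge homologous proteins into families."""

    def __init__(self, items: Iterable[str]) -> None:
        self.parent = {x: x for x in items}

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def family_similarity(
    proteins_a: list[str], proteins_b: list[str], hits: Iterable[Hit]
) -> tuple[int, int, float]:
    """Return (shared families, total families, similarity in %) for A vs B."""
    uf = UnionFind(proteins_a + proteins_b)
    for query, subject, aln_len, query_len, pct_id in hits:
        if is_homologous(aln_len, query_len, pct_id):
            uf.union(query, subject)  # single linkage: one hit joins two families
    # Record which organisms are represented in each family (keyed by its root).
    organisms: dict[str, set[str]] = {}
    for p in proteins_a:
        organisms.setdefault(uf.find(p), set()).add("A")
    for p in proteins_b:
        organisms.setdefault(uf.find(p), set()).add("B")
    shared = sum(1 for orgs in organisms.values() if orgs == {"A", "B"})
    total = len(organisms)  # proteins without homologs form their own family
    return shared, total, 100.0 * shared / total


# Toy input: protein identifiers must be unique across both proteomes.
proteome_a = ["a1", "a2", "a3"]
proteome_b = ["b1", "b2"]
hits = [
    ("a1", "b1", 300, 320, 72.0),  # passes both thresholds
    ("a2", "b2", 100, 400, 80.0),  # only 25% of a2 aligned: rejected
    ("a2", "a3", 250, 260, 55.0),  # paralogs within A join one family
]
shared, total, pct = family_similarity(proteome_a, proteome_b, hits)
print(shared, total, round(pct, 1))  # 1 3 33.3
```

Single linkage is only one way to turn pairwise hits into families; a stricter clustering would change the family counts but not the shape of the calculation.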
The software takes an XML configuration as its first argument. Appendix D.4 provides a Perl script that automatically constructs a configuration comparing all published Campylobacter proteomes by querying the Genome Atlas Database. The BLASTmatrix output for this configuration is shown in figure 2.10.
The software has been used in several publications (Binnewies et al., 2005, 2006) and has been updated a number of times since. Older versions included both BLAST directions and showed the number of shared proteins, so each pair of organisms appeared twice in the diagram (A against B and B against A), which was redundant. The current version instead plots the number of shared families, which makes the plot symmetric about the diagonal and allows the lower triangle to be removed.
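The symmetric layout and the MD5-based cache fit together naturally: only the upper triangle (diagonal included) needs computing, and a pair can be skipped when its checksums are already cached. The sketch below is a hypothetical illustration, assuming the cache is keyed on the unordered pair of proteome checksums; `proteome_digest`, `pair_key` and `plan_comparisons` are invented names, not part of BLASTmatrix.

```python
import hashlib
from itertools import combinations_with_replacement


def proteome_digest(fasta_text: str) -> str:
    """MD5 checksum of a proteome's FASTA text, used as its cache identity."""
    return hashlib.md5(fasta_text.encode("utf-8")).hexdigest()


def pair_key(digest_a: str, digest_b: str) -> tuple[str, str]:
    # Family-based similarity is symmetric, so (A, B) and (B, A) share one key.
    return tuple(sorted((digest_a, digest_b)))


def plan_comparisons(
    proteomes: dict[str, str], cache: set[tuple[str, str]]
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split the upper-triangle cells (diagonal included) into new and cached."""
    digests = {name: proteome_digest(text) for name, text in proteomes.items()}
    to_run, reused = [], []
    # N proteomes give N(N+1)/2 cells: each pair once, plus each self-comparison.
    for a, b in combinations_with_replacement(sorted(proteomes), 2):
        key = pair_key(digests[a], digests[b])
        (reused if key in cache else to_run).append((a, b))
    return to_run, reused


# Hypothetical toy proteomes; a real run would read FASTA files.
two = {"org1": ">p1\nMKVL\n", "org2": ">p2\nMRTA\n"}
cache: set[tuple[str, str]] = set()
first_run, _ = plan_comparisons(two, cache)
for a, b in first_run:  # pretend these BLAST reports were computed and stored
    cache.add(pair_key(proteome_digest(two[a]), proteome_digest(two[b])))

three = {**two, "org3": ">p3\nMLLS\n"}
to_run, reused = plan_comparisons(three, cache)
print(len(first_run), len(to_run), len(reused))  # 3 3 3
```

Adding a third organism creates 3 new cells (its two pairs plus its diagonal square) while the 3 cells from the first run are reused, giving the 3 × 4 / 2 = 6 cells of the upper triangle.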