29.07.2013 Views

Computational tools and Interoperability in Comparative ... - CBS


Genome Comparisons

Figure 2.8: Deamination of cytosine (C) into uracil (U)

2.3.5 Base composition and DNA repair

Klebsiella is often found in plant products, on root surfaces and living trees, in fresh vegetables, and in foods with a high content of sugars and acids, such as frozen orange juice concentrate. Klebsiella pneumoniae can cause urinary tract infections, and the NTUH-K2044 strain was isolated from a patient with liver abscess and meningitis. The ecological niches in which Klebsiella lives are diverse, but they share the property of being rich in energy and nitrogen. Nitrogen-fixing aerobic bacteria are known to have higher chromosomal GC content (McEwan et al., 1998), which is explained by the nitrogen required to replicate the chromosome: an AT base pair contains 7 nitrogen atoms, whereas a GC pair contains 8. Cytosine is prone to mutation caused by spontaneous deamination into uracil (Visnes et al., 2009) (figure 2.8). In E. coli the two enzymes uracil N-glycosylase and apurinic (AP) endonuclease are responsible for the repair of this mutation. However, in Buchnera aphidicola Cc, which has a small, reduced genome, these two enzymes are absent (confirmed by protein BLAST). Negative selection is likely to occur in organisms that combine high chromosomal GC content with the lack of a functional repair mechanism. Hence, the base composition of a bacterial genome is by no means random, and adjusting the overall GC content through evolution may be yet another way to adapt to the environment.
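
The nitrogen argument can be made concrete with a small calculation. The following Python sketch is illustrative only and is not part of the tools described in this chapter; the function names and the toy sequence are hypothetical. It computes the GC content of a sequence and counts the nitrogen atoms in the bases of the corresponding double-stranded molecule, using 7 nitrogen atoms per AT pair and 8 per GC pair.

```python
# Nitrogen atoms in the bases of each Watson-Crick pair
# (A = 5, T = 2, G = 5, C = 3), indexed by the base on one strand.
NITROGEN_PER_PAIR = {"A": 7, "T": 7, "G": 8, "C": 8}


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of one strand."""
    counted = [b for b in seq.upper() if b in NITROGEN_PER_PAIR]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "GC" for b in counted) / len(counted)


def base_nitrogen_atoms(seq: str) -> int:
    """Nitrogen atoms in the bases of the double-stranded molecule.

    Each position on the given strand stands for one base pair. The
    sugar-phosphate backbone contains no nitrogen, so it is ignored.
    Ambiguous symbols such as N are skipped.
    """
    return sum(NITROGEN_PER_PAIR.get(b, 0) for b in seq.upper())


toy = "ATGCGC"  # hypothetical 6 bp sequence
print(gc_content(toy))           # 4 of the 6 bases are G or C
print(base_nitrogen_atoms(toy))  # 2 AT pairs * 7 + 4 GC pairs * 8 = 46
```

On a genome scale the difference is one nitrogen atom per base pair converted from AT to GC, which is why the cost only becomes relevant when nitrogen is limiting.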

2.3.6 BLASTmatrix - proteome comparison

The BLASTmatrix tool visualizes proteome similarity between larger numbers of organisms. For each pairwise combination of proteomes, a BLAST search is performed. Two proteins are declared homologous when at least 50% of the protein is aligned and at least 50% of the residues within the alignment are conserved. For a report of proteome A against proteome B, all homologous proteins are then grouped into families, and the similarity between A and B is calculated as the number of families in which both organism A and organism B are represented. The BLAST report is cached, keyed on MD5 checksums of the proteomes, which lets the tool reuse previous results efficiently when organisms are added to a comparison. The comparison is repeated for all $\sum_{j=1}^{N} j = N(N+1)/2$ combinations of the N proteomes, and for each combination a square is drawn containing the following information: the similarity as a percentage of all families of A and B, the number of shared families, and the total number of families. A small example matrix is shown in figure 2.9. The percentage is used to color-code the square, which gives an easier overview of larger comparisons.
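
The homology rule and the family-based similarity can be sketched in a few lines of Python. This is not the BLASTmatrix implementation, and several details are assumptions because the text does not specify them: alignment coverage is measured against the longer of the two proteins, "conserved" is read as "identical", and families are formed as connected components of the homology graph. The `Hit` record, its field names, and the `A|p1` style identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment; all field names are hypothetical."""
    query: str        # protein id prefixed by organism, e.g. "A|p1"
    subject: str      # protein id prefixed by organism, e.g. "B|p7"
    query_len: int    # length of the query protein
    subject_len: int  # length of the subject protein
    align_len: int    # number of aligned positions
    identical: int    # identical residues within the alignment


def is_homologous(hit, min_cov=0.5, min_ident=0.5):
    """50%/50% rule. Assumptions: coverage is taken on the longer protein
    and 'conserved' is read as 'identical'."""
    longest = max(hit.query_len, hit.subject_len)
    if longest == 0 or hit.align_len == 0:
        return False
    coverage = hit.align_len / longest
    identity = hit.identical / hit.align_len
    return coverage >= min_cov and identity >= min_ident


def build_families(proteins, hits):
    """Families as connected components of the homology graph (an assumption).
    Every protein named in a hit must also be listed in `proteins`."""
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for hit in hits:
        if is_homologous(hit):
            parent[find(hit.query)] = find(hit.subject)

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return list(families.values())


def similarity(families, org_a="A", org_b="B"):
    """Return (percent, shared, total); a family is shared when it holds
    proteins from both organisms."""
    def organisms(family):
        return {p.split("|", 1)[0] for p in family}

    total = len(families)
    shared = sum(1 for f in families if {org_a, org_b} <= organisms(f))
    percent = 100.0 * shared / total if total else 0.0
    return percent, shared, total


proteins = ["A|p1", "A|p2", "B|p1", "B|p2", "B|p3"]
hits = [
    Hit("A|p1", "B|p1", 100, 120, 90, 60),    # coverage 0.75, identity 0.67
    Hit("A|p2", "B|p2", 200, 210, 80, 70),    # coverage 0.38: rejected
    Hit("B|p2", "B|p3", 210, 200, 180, 100),  # paralogs within B
]
percent, shared, total = similarity(build_families(proteins, hits))
print(f"{percent:.1f}% ({shared} shared of {total} families)")
```

In this toy input the three numbers correspond to what a single square reports: one shared family out of three, or 33.3%. Counting families rather than proteins means the paralogs in B contribute one family instead of two hits.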

The software requires an XML configuration as its first argument. Appendix D.4 provides a Perl script that automatically constructs a configuration comparing all published Campylobacter proteomes by querying the Genome Atlas Database. The output of BLASTmatrix for this configuration is shown in figure 2.10.

The software has been used in several publications (Binnewies et al., 2005, 2006) and has been updated a number of times since. The older versions included both BLAST directions and showed the number of shared proteins, which made the diagram redundant. The recent version avoids this by plotting the shared families instead, which renders the plot symmetrical across the diagonal. This allows the lower triangle to be removed.
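
Because the family count is symmetric, only the diagonal and the upper triangle need to be drawn, which gives N(N+1)/2 squares for N proteomes. The Python sketch below enumerates those squares and shows how a checksum-keyed cache limits the work when an organism is added. It is a hedged illustration, not the tool's code: `ReportCache`, `fake_compare`, and the in-memory dictionary are hypothetical stand-ins, and the text states only that reports are cached based on MD5 checksums of the proteomes.

```python
import hashlib
from itertools import combinations_with_replacement


def proteome_md5(fasta_text):
    """MD5 checksum of a proteome given as FASTA text."""
    return hashlib.md5(fasta_text.encode("utf-8")).hexdigest()


def upper_triangle(names):
    """Unordered pairs including self-comparisons: N(N+1)/2 squares."""
    return list(combinations_with_replacement(names, 2))


class ReportCache:
    """In-memory stand-in for a cache of comparison reports (hypothetical)."""

    def __init__(self):
        self._reports = {}
        self.computed = 0  # number of comparisons actually run

    def get(self, fasta_a, fasta_b, compare):
        # The key depends only on proteome content, not on file names.
        key = (proteome_md5(fasta_a), proteome_md5(fasta_b))
        if key not in self._reports:
            self._reports[key] = compare(fasta_a, fasta_b)
            self.computed += 1
        return self._reports[key]


def fake_compare(fasta_a, fasta_b):
    """Placeholder for the BLAST-based comparison; returns a dummy report."""
    return (len(fasta_a), len(fasta_b))


def run_matrix(proteomes, cache):
    """Request one report per square of the upper triangle."""
    for a, b in upper_triangle(sorted(proteomes)):
        cache.get(proteomes[a], proteomes[b], fake_compare)


proteomes = {"org1": ">p1\nMKV", "org2": ">p1\nMAL", "org3": ">p1\nMGS"}
cache = ReportCache()
run_matrix(proteomes, cache)
print(cache.computed)  # 3 proteomes: 6 comparisons

proteomes["org4"] = ">p1\nMLT"
run_matrix(proteomes, cache)
print(cache.computed)  # 4 proteomes: 10 squares, only 4 of them new
```

Keying on the checksum rather than on a file name means an unchanged proteome is recognized even if it is renamed, while any change to its content forces a new comparison.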
