11.03.2014

Data integration in microbial genomics ... - Jacobs University



bations are grouped in putative modules [Breitling et al., 2008]. To detect such associations in metagenomic datasets, we measured the Spearman rank correlation (ρ) between DUFs detected in the globally distributed GOS metagenomes (473 351 DUFs of 454 varieties detected across 79 metagenomes; see Supplementary methods online). We visualized these results as network graphs. Two vertices (each representing a DUF variety) were connected by an edge if their ρ was ≥ 0.90. Because the abundances of 454 DUF varieties were correlated against one another, we enforced a Bonferroni-corrected p-value threshold of ∼2.20 × 10⁻⁵ (0.01/454). We embedded the graph using the Fruchterman-Reingold procedure [Fruchterman and Reingold, 1991]. To aid visual interpretation, a minimum spanning tree computed with Prim's algorithm [Prim, 1957] was visualized (Fig. 6.1). We assigned DUFs to putative functional categories guided by Pfam descriptions and linked literature, and color-coded the vertices accordingly.
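The network construction described above can be sketched in a few lines. This is a minimal, standard-library illustration and not the procedure from the Supplementary methods. The function names are hypothetical. Ties are given average ranks. The p-value uses a Fisher z approximation with variance 1.06/(n − 3), which is an assumption because the text does not say how p-values were obtained. The Bonferroni divisor is the number of DUF varieties, as stated above.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (sx * sy)


def spearman_p_value(rho, n):
    """Approximate two-sided p-value via Fisher z, variance 1.06 / (n - 3) (assumption)."""
    r = max(min(rho, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite when |rho| = 1
    z = math.atanh(r) * math.sqrt((n - 3) / 1.06)
    return math.erfc(abs(z) / math.sqrt(2))


def correlation_network(abundance, rho_min=0.90, alpha=0.01):
    """abundance: {DUF variety: [count per metagenome]} -> {(u, v): rho} for kept edges."""
    names = sorted(abundance)
    p_max = alpha / len(names)  # Bonferroni as stated in the text: 0.01 / number of varieties
    edges = {}
    for u, v in combinations(names, 2):
        x, y = abundance[u], abundance[v]
        rho = spearman_rho(x, y)
        if rho >= rho_min and spearman_p_value(rho, len(x)) <= p_max:
            edges[(u, v)] = rho
    return edges
```

For the GOS data described above, `abundance` would hold 454 entries of 79 counts each. The cut-off would then be p_max = 0.01/454 ≈ 2.20 × 10⁻⁵.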
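The Fruchterman-Reingold embedding mentioned above is a force-directed layout. Every vertex pair repels with force k²/d, every edge attracts with force d²/k, and a cooling "temperature" caps how far a vertex may move per iteration. The sketch below is a minimal version under assumed parameters. The unit square frame, the 100 iterations, the linear cooling and the seeded random start are all illustrative choices, not settings from this chapter. The function name `fruchterman_reingold` is hypothetical.

```python
import math
import random


def fruchterman_reingold(vertices, edges, iterations=100, width=1.0, seed=0):
    """Force-directed layout; returns {vertex: (x, y)} inside a width x width square."""
    vertices = list(vertices)
    rng = random.Random(seed)  # seeded so the layout is reproducible
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in vertices}
    k = width / math.sqrt(len(vertices))  # ideal distance: sqrt(area / |V|)
    t0 = width / 10  # initial temperature: cap on per-step displacement
    for step in range(iterations):
        disp = {v: [0.0, 0.0] for v in vertices}
        # Repulsion between every pair of vertices: f_r(d) = k^2 / d.
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges: f_a(d) = d^2 / k pulls endpoints together.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex at most t along its net force, then clamp to the frame.
        t = t0 * (1 - step / iterations)  # linear cooling schedule
        for v in vertices:
            length = max(math.hypot(*disp[v]), 1e-9)
            scale = min(length, t) / length
            pos[v][0] = min(width, max(0.0, pos[v][0] + disp[v][0] * scale))
            pos[v][1] = min(width, max(0.0, pos[v][1] + disp[v][1] * scale))
    return {v: (x, y) for v, (x, y) in pos.items()}
```

The pairwise repulsion loop is O(|V|²) per iteration. That cost is acceptable for a few hundred DUF varieties, but larger graphs usually call for grid or tree approximations.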
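Prim's algorithm, cited above for the spanning tree, grows a tree from a root. At each step it adds the lightest edge that connects a new vertex. The sketch below is a minimal heap-based version. It assumes an edge weight of 1 − ρ, so the tree keeps the strongest correlations; the text does not state the weighting. It also restarts in each unvisited component, so a disconnected correlation graph yields a spanning forest. The function name `prim_spanning_forest` is hypothetical.

```python
import heapq


def prim_spanning_forest(vertices, rho_edges):
    """Prim's algorithm with a binary heap.

    rho_edges: {(u, v): rho}. Edge weight is taken as 1 - rho (an assumption),
    so strongly correlated pairs are preferred. Returns (u, v, weight) tree edges.
    """
    adj = {v: [] for v in vertices}
    for (u, v), rho in rho_edges.items():
        w = 1.0 - rho
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited, tree = set(), []
    for root in vertices:  # restart in every component not yet reached
        if root in visited:
            continue
        visited.add(root)
        heap = [(w, root, v) for w, v in adj[root]]
        heapq.heapify(heap)
        while heap:
            w, u, v = heapq.heappop(heap)
            if v in visited:
                continue  # stale entry: v already joined the tree
            visited.add(v)
            tree.append((u, v, w))
            for w2, x in adj[v]:
                if x not in visited:
                    heapq.heappush(heap, (w2, v, x))
    return tree
```

With a binary heap the running time is O(|E| log |E|), which is ample for graphs of a few hundred vertices.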

6.4 Results

We observed two prominent networks: one dominated by DUFs linked to photosynthetic organisms (Fig. 6.1, II) and another composed of more diverse members (Fig. 6.1, I). Smaller networks were also observed, including one associating DUFs 404 and 407 (Fig. 6.1, III), domains known to co-occur [Goonesekere et al., 2010]. Employing a ‘guilty-by-association’ approach [Merico et al., 2009], we propagated hypotheses across closely embedded domains. We thus hypothesized that the DUFs in network II (Fig. 6.1), including unassigned DUFs (Fig. 6.1, grey vertices), describe a photobiologically active module. To test this hypothesis, we examined the taxonomic distributions of the unassigned domains using the Pfam web interface (http://pfam.sanger.ac.uk/).
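The guilty-by-association step described above can be illustrated with a simple neighbour-voting sketch. This is a simplified stand-in, not the author's procedure. An unassigned vertex takes the majority category of its already-labelled neighbours, and rounds repeat so that labels can spread along chains of unassigned vertices. Ties are left unresolved rather than guessed. The function name `propagate_labels`, the synchronous update rule and the `max_rounds` cap are all assumptions.

```python
from collections import Counter


def propagate_labels(edges, labels, max_rounds=10):
    """Guilty-by-association sketch via neighbour majority voting (hypothetical helper).

    edges: iterable of (u, v); labels: {vertex: category or None}.
    Returns a new dict; the caller's labels are left unchanged.
    """
    neighbours = {v: set() for v in labels}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    labels = dict(labels)
    for _ in range(max_rounds):
        updates = {}
        for v, category in labels.items():
            if category is not None:
                continue  # only unassigned vertices inherit a label
            votes = Counter(labels[n] for n in neighbours[v] if labels[n] is not None)
            if not votes:
                continue
            top = votes.most_common()
            # Assign only a strict majority; ties stay unassigned.
            if len(top) == 1 or top[0][1] > top[1][1]:
                updates[v] = top[0][0]
        if not updates:
            break  # nothing changed: labels have converged
        labels.update(updates)  # synchronous update after each full round
    return labels
```

A propagated label is only a hypothesis. As in the text, it still needs independent support, such as the taxonomic distribution of each domain.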

DUFs 1997, 1995, 1830, and 1651 were observed exclusively in phototrophic organisms, while DUF2307 appeared to have a higher copy number in Cyanobacteria. DUFs 2214 and 564 showed less striking distributions; however, we speculate that they may be involved in pigmentation and C1 metabolism, respectively (see Supplementary material online). The larger network (Fig. 6.1, I) was difficult to interpret because of the diverse functions of its members. The centrality of the
