11.03.2014 Views

Data integration in microbial genomics ... - Jacobs University

6. Domains of unknown function

performed a least-squares linear fit of chlorophyll data, with significance
($P(>R)$) determined by permutation ($n = 1000$). To explore
non-linear relationships between chlorophyll concentrations and the
ordination, we visualized generalized additive model (GAM) fits as
smoothed, non-parametric isoclines (Fig. 6.2), with significance determined
by ANOVA [Wood, 2008]. After Virtanen et al., we interpreted
coefficients of determination ($R^2$) as goodness-of-fit measures
for linear vectors ($R_v^2$) and non-parametric surfaces ($R_s^2$). Analyses
were performed in R (http://www.r-project.org). We observed that
these DUF abundances moderately, but significantly, structure GOS
sites along chlorophyll concentration ($R_v^2 \approx 0.52$, $P(>R) \approx 9.99 \times 10^{-4}$;
$R_s^2 \approx 0.91$, $p \approx 2.00 \times 10^{-16}$). An improved, albeit less significant, fit
($R_v^2 \approx 0.64$, $P(>R) \approx 4.00 \times 10^{-3}$; $R_s^2 \approx 0.98$, $p \approx 5.8 \times 10^{-2}$) and a more
even resolution of sites may be observed when ordinating geographically
localized sample groups, such as that along the North American
East Coast (GS002, GS004-8, GS012-14; $n = 9$, plot not shown). The
GAM surface reveals considerable non-linear effects below chlorophyll
concentrations of 2.0 µg kg⁻¹ seawater, where most sites, particularly
those from oligotrophic waters, are ordinated. Such effects may arise
from the diverse functions, multi-functionality, and selective interactions
between elements in biological systems [Kitano, 2002]. The
global coverage of GOS, spanning numerous ecoregions, may also introduce
unexpected variation. Nonetheless, if these chlorophyll measurements
are understood as a proxy for phytoplankton abundance, these
results tentatively support hypotheses linking the functional community
structure described by these DUFs to the abundance of photoreactive
plankton. This kind of environmental contextualization may
provide useful perspectives on the functions of microbial genomic features
in their surrounding ecosystems.
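The linear-vector part of this analysis can be sketched in a few lines. The section does not name the R functions that were used, so what follows is a minimal Python sketch under stated assumptions, not the analysis actually run. It assumes that the vector fit regresses chlorophyll on two ordination axes by ordinary least squares, that $R_v^2$ is that regression's coefficient of determination, and that $P(>R)$ is computed as $(k + 1)/(n + 1)$, where $k$ counts how many of the $n$ random permutations of the chlorophyll values fit at least as well as the observed data. The helper names `r2_on_axes` and `permutation_p` and the toy data are hypothetical. Under that convention with $n = 1000$, the smallest attainable value is $1/1001 \approx 9.99 \times 10^{-4}$, and $4/1001 \approx 4.00 \times 10^{-3}$; both match the reported $P(>R)$ values, which is consistent with (though not proof of) this convention.

```python
import random


def r2_on_axes(scores, y):
    """R^2 of the least-squares fit y ~ a + b1 * axis1 + b2 * axis2.

    Hypothetical helper: `scores` holds one (axis1, axis2) ordination
    coordinate pair per site; `y` is the environmental variable
    (e.g. chlorophyll) measured at the same sites.
    """
    n = len(y)
    m1 = sum(s[0] for s in scores) / n
    m2 = sum(s[1] for s in scores) / n
    my = sum(y) / n
    # Centring removes the intercept, leaving a 2 x 2 normal-equation system.
    x1 = [s[0] - m1 for s in scores]
    x2 = [s[1] - m2 for s in scores]
    yc = [v - my for v in y]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * v for a, v in zip(x1, yc))
    s2y = sum(b * v for b, v in zip(x2, yc))
    syy = sum(v * v for v in yc)
    det = s11 * s22 - s12 * s12  # non-zero unless the two axes are collinear
    # Cramer's rule for the slopes b1, b2.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # Explained sum of squares divided by total sum of squares.
    return (b1 * s1y + b2 * s2y) / syy


def permutation_p(scores, y, n_perm=1000, seed=1):
    """Observed R^2 and permutation P(>R), assuming the (k + 1) / (n_perm + 1) rule."""
    rng = random.Random(seed)
    observed = r2_on_axes(scores, y)
    shuffled = list(y)
    k = 0  # permutations whose R^2 is at least as large as the observed one
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any site-to-chlorophyll association
        if r2_on_axes(scores, shuffled) >= observed:
            k += 1
    return observed, (k + 1) / (n_perm + 1)


# Hypothetical toy data, not GOS measurements: 12 sites whose "chlorophyll"
# is an exact linear function of the two ordination axes.
scores = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2),
          (2, 2), (3, 1), (1, 3), (3, 3), (2, 0), (0, 2)]
chl = [1 + 2 * a - b for a, b in scores]
r2, p = permutation_p(scores, chl)
print(f"R2_v = {r2:.3f}, P(>R) = {p:.2e}")
```

Counting permutations that fit *at least as well* as the observed data, and adding one to both numerator and denominator, keeps the estimated P-value from ever being zero. The non-parametric GAM surfaces behind $R_s^2$ require a penalized smoothing fit and are not sketched here.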

6.5 Conclusion

Ecogenomic datasets promise to deliver valuable insight into the roles
of uncharacterized genes and proteins. The prospects are greater if
future ‘omics’ sampling is performed along clear environmental gra-
