Data integration in microbial genomics ... - Jacobs University
6. Domains of unknown function
performed a least squares, linear fit of chlorophyll data with significance
($P(>R)$) determined by permutation (n=1000). To explore
non-linear relationships between chlorophyll concentrations and the
ordination, we visualized generalized additive model (GAM) fits as
smoothed, non-parametric isoclines (Fig. 6.2) with significance determined
by ANOVA [Wood, 2008]. After Virtanen et al., we interpreted
coefficients of determination ($R^2$) as goodness-of-fit measures
for linear vectors ($R^2_v$) and non-parametric surfaces ($R^2_s$). Analyses
were performed in R (http://www.r-project.org). We observed that
these DUF abundances moderately, but significantly, structure GOS
sites along chlorophyll concentration ($R^2_v \approx 0.52$, $P(>R) \approx 9.99 \times 10^{-4}$;
$R^2_s \approx 0.91$, $p \approx 2.00 \times 10^{-16}$). An improved, albeit less significant, fit
($R^2_v \approx 0.64$, $P(>R) \approx 4.00 \times 10^{-3}$; $R^2_s \approx 0.98$, $p \approx 5.8 \times 10^{-2}$) and a more
even resolution of sites may be observed when ordinating geographically
localized sample groups such as that along the North American
East Coast (GS002, GS004-8, GS012-14; n=9, plot not shown). The
GAM surface reveals considerable non-linear effects below chlorophyll
concentrations of 2.0 µg kg$^{-1}$ seawater, where most sites, particularly
those from oligotrophic waters, are ordinated. Such effects may arise
from the diverse functions, multi-functionality, and the selective interactions
between elements in biological systems [Kitano, 2002]. The
global coverage of GOS, across numerous ecoregions, may also introduce
unexpected variation. Nonetheless, if these chlorophyll measurements
are understood as a proxy for phytoplankton abundance, these
results tentatively support hypotheses linking the functional community
structure described by these DUFs to the abundance of photoreactive
plankton. This manner of environmental contextualization may
provide useful perspectives on the function of microbial genomic features
in their surrounding ecosystems.
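As an illustration of how a permutation-based significance value for a linear fit onto an ordination can be obtained, the following is a minimal Python sketch. It is not the R code used for the analyses above. The function names (`r_squared`, `permutation_p`) and the demo data are hypothetical, and the sketch assumes the common add-one estimator, (hits + 1) / (permutations + 1). Under that assumption, the smallest attainable value with 1000 permutations is 1/1001 ≈ 9.99 × 10⁻⁴, which would be consistent with the linear-fit significance reported for the full set of sites.

```python
import random


def r_squared(scores, y):
    """R^2 of the least-squares fit y ~ axis1 + axis2 (with intercept)."""
    n = len(y)
    mean1 = sum(s[0] for s in scores) / n
    mean2 = sum(s[1] for s in scores) / n
    mean_y = sum(y) / n
    # Centering the data absorbs the intercept term.
    x1 = [s[0] - mean1 for s in scores]
    x2 = [s[1] - mean2 for s in scores]
    yc = [v - mean_y for v in y]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, yc))
    s2y = sum(b * c for b, c in zip(x2, yc))
    syy = sum(c * c for c in yc)
    # Solve the 2x2 normal equations for the two slopes.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # Explained sum of squares divided by total sum of squares.
    return (b1 * s1y + b2 * s2y) / syy


def permutation_p(scores, y, n_perm=1000, seed=0):
    """Return (observed R^2, permutation P value) for the linear fit."""
    rng = random.Random(seed)
    observed = r_squared(scores, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between sites and values
        if r_squared(scores, shuffled) >= observed:
            hits += 1
    # Add-one estimator (an assumption of this sketch).
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical two-axis ordination scores for nine sites; not GOS data.
demo_scores = [(-1.2, 0.3), (-0.8, -0.5), (-0.3, 0.9), (0.0, -0.2), (0.4, 0.5),
               (0.7, -0.9), (1.1, 0.1), (1.5, -0.4), (1.9, 0.8)]
# Synthetic "chlorophyll" values that are exactly linear in the scores.
demo_chl = [3.0 + 2.0 * a - 1.0 * b for a, b in demo_scores]
demo_r2, demo_p = permutation_p(demo_scores, demo_chl)
```

Because the synthetic values are an exact linear function of the two axes, `demo_r2` is 1 up to floating-point error, and `demo_p` sits at or very near the floor set by the number of permutations. With real data, the value depends on how often shuffled measurements fit the ordination at least as well as the observed ones.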
6.5 Conclusion
Ecogenomic datasets promise to deliver valuable insight into the roles
of uncharacterized genes and proteins. The prospects are greater if
future ‘omics’ sampling is performed along clear environmental gra-