Data integration in microbial genomics ... - Jacobs University
6. Domains of unknown function
6.2 Introduction
Genomic and metagenomic sequencing projects are revealing ever-increasing
numbers of novel genes, many of unknown function. The
Pfam 23 database [Finn et al., 2008], for example, stored some 10 340
protein domain families derived from conserved sequence data, with
22% dubbed “domains of unknown function” (DUFs). This proportion
is predicted to soon overtake that of functionally characterized
domains [Bateman et al., 2010], prompting calls for community action
[Roberts, 2004] and concerted, cross-disciplinary attention [Galperin
and Koonin, 2010]. In response, Jaroszewski et al. [2009] and
Goonesekere et al. [2010] noted
several DUFs that appeared to be variations of functionally characterized
protein folds, most likely maintained due to an extension of an
organism’s ecological niche. It is reasonable to expect that conserved
DUFs enhance ecological performance; however, characterizing DUFs
from an ecological perspective has yet to be attempted. In this communication,
we present a method of functional hypothesis generation
based on DUF correlation across the Global Ocean Sampling (GOS)
metagenome collection [Rusch et al., 2007]. Network visualizations
were used in hypothesis generation, followed by indirect gradient analysis
to contextualize one well-defined hypothesis with environmental
metadata. Together, these approaches aim to support efforts in DUF
characterization using ecogenomic resources.
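
The idea of deriving a network from DUF correlation across samples can be illustrated with a minimal sketch. It is not the analysis performed in this work: the choice of Spearman rank correlation, the 0.8 threshold, the domain labels (`DUF_A`, `DUF_B`, `PF_known`) and the abundance values are all assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_edges(abundance, threshold=0.8):
    """Return (domain_a, domain_b, rho) for pairs with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, rho))
    return edges


# Hypothetical abundances of three domains across five samples (made-up data).
abundance = {
    "DUF_A": [1, 2, 3, 4, 5],
    "DUF_B": [2, 4, 5, 8, 9],
    "PF_known": [5, 1, 4, 2, 3],
}
edges = correlation_edges(abundance)
for a, b, rho in edges:
    print(f"{a} -- {b}  (rho = {rho:.2f})")
```

In this toy example only the two co-varying DUF profiles are linked. Each retained pair is an edge in an association network, and a DUF linked to a characterized domain would suggest a functional hypothesis to examine further.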
6.3 Material and Methods
Correlation analysis of microbial taxa and environmental parameters
has previously been used to construct association networks [Fuhrman
et al., 2008; Fuhrman, 2009]. Just as the correlation of taxon abundances
may elucidate a given taxon’s ecosystem-level interactions and function,
correlation of protein domains across environments may grant
insight into their potential associations and roles. This approach parallels
the identification of unknown metabolic modules whereby genomic
features found to co-vary in response to experimental pertur-