11.03.2014 Views

Data integration in microbial genomics ... - Jacobs University



6. Domains of unknown function

6.2 Introduction

Genomic and metagenomic sequencing projects are revealing ever-increasing numbers of novel genes, many of unknown function. The Pfam 23 database [Finn et al., 2008], for example, stored some 10 340 protein domain families derived from conserved sequence data with 22% dubbed “domains of unknown function” (DUFs). This proportion is predicted to soon overtake that of functionally characterized domains [Bateman et al., 2010], prompting calls for community action [Roberts, 2004] and concerted, cross-disciplinary attention [Galperin and Koonin, 2010]. In their response, Jaroszewski et al. [Jaroszewski et al., 2009] and Goonesekere et al. [Goonesekere et al., 2010] noted several DUFs that appeared to be variations of functionally characterized protein folds, most likely maintained due to an extension of an organism’s ecological niche. It is reasonable to expect that conserved DUFs enhance ecological performance; however, characterizing DUFs from an ecological perspective has yet to be attempted. In this communication, we present a method of functional hypothesis generation based on DUF correlation across the Global Ocean Sampling (GOS) metagenome collection [Rusch et al., 2007]. Network visualizations were used in hypothesis generation followed by indirect gradient analysis to contextualize one well-defined hypothesis with environmental metadata. Together, these approaches aim to support efforts in DUF characterization using ecogenomic resources.
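As an illustration of the general idea of correlating domain abundances across samples and turning strong correlations into network edges, the following minimal sketch may be helpful. It is not the method used in this work: the text above does not state which correlation coefficient or cut-off was applied, so the Spearman rank correlation and the 0.8 threshold are assumptions, and the domain names, abundance values, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_network(abundance, threshold=0.8):
    """Return (domain_a, domain_b, rho) for pairs with |rho| >= threshold.

    `abundance` maps a domain name to its abundance in each sample,
    with samples in the same order for every domain.
    The threshold is an assumed, illustrative cut-off.
    """
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, rho))
    return edges


# Hypothetical toy data: four domains counted in five samples.
toy_abundance = {
    "DUF_A": [1, 2, 3, 4, 5],
    "DUF_B": [2, 4, 5, 9, 12],
    "PF_known": [5, 4, 3, 2, 1],
    "DUF_C": [3, 1, 4, 1, 5],
}
edges = correlation_network(toy_abundance, threshold=0.8)
for a, b, rho in edges:
    print(f"{a} -- {b}: rho = {rho:+.2f}")
```

In a sketch like this, a DUF that shares an edge with a functionally characterized domain becomes a candidate for a functional hypothesis. Real metagenomic data would also call for normalization by sample size and correction for multiple testing, both of which are omitted here.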

6.3 Material and Methods

Correlation analysis of microbial taxa and environmental parameters has previously been used to construct association networks [Fuhrman et al., 2008; Fuhrman, 2009]. Just as the correlation of taxa-abundance may elucidate a given taxon’s ecosystem-level interactions and function, correlation of protein domains across environments may grant insight into their potential associations and roles. This approach parallels the identification of unknown metabolic modules whereby genomic features found to co-vary in response to experimental pertur-
