11.03.2014

Data integration in microbial genomics ... - Jacobs University


14 1. Introduction

possibilities like temporal or spatial comparisons are hampered or simply not possible, and the comparability of the data cannot be assured. The fact that “latitude, longitude, and time, elements of the key contextual data tuple (x,y,z,t), are only reported in 7.3% and 7.2% of all submissions” [Hankeln et al., 2010] shows that the majority of public sequence data is not sufficiently annotated.

This deficit has been recognized by the Genomic Standards Consortium (GSC), which “is an open-membership working body which formed in September 2005. The goal of this international community is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data” (www.gensc.org).
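
The reporting rates quoted above are simple proportions: the number of submissions that report a field divided by the total number of submissions. The following Python sketch makes this concrete. It is an illustration only, not the procedure used by Hankeln et al.; the field names `latitude`, `longitude`, `depth` and `time` and the toy records are hypothetical, and real submission formats name and encode these values differently.

```python
from typing import Dict, List, Mapping

# Hypothetical keys for the contextual tuple (x, y, z, t); real submission
# formats name and encode these values differently.
CONTEXT_FIELDS = ("latitude", "longitude", "depth", "time")


def is_reported(value: object) -> bool:
    """Treat a missing key, None or an empty string as 'not reported'."""
    return value is not None and value != ""


def reporting_rate(records: List[Mapping[str, object]], field: str) -> float:
    """Fraction of records that report a value for `field`."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if is_reported(r.get(field)))
    return hits / len(records)


def tuple_completeness(records: List[Mapping[str, object]]) -> Dict[str, float]:
    """Reporting rate of each field of the contextual tuple."""
    return {f: reporting_rate(records, f) for f in CONTEXT_FIELDS}


def complete_tuple_rate(records: List[Mapping[str, object]]) -> float:
    """Fraction of records that report all fields of the tuple at once."""
    if not records:
        return 0.0
    hits = sum(
        1 for r in records if all(is_reported(r.get(f)) for f in CONTEXT_FIELDS)
    )
    return hits / len(records)


if __name__ == "__main__":
    # Invented toy records, for illustration only (not real submission data).
    toy = [
        {"latitude": 53.1, "longitude": 8.8, "depth": 5, "time": "2006-05-01"},
        {"latitude": None, "longitude": ""},
        {"depth": 10, "time": "2007-07-15"},
        {},
    ]
    print(tuple_completeness(toy))
    # {'latitude': 0.25, 'longitude': 0.25, 'depth': 0.5, 'time': 0.5}
    print(complete_tuple_rate(toy))
    # 0.25
```

Per-field rates, as returned by `tuple_completeness`, correspond to figures like those quoted; `complete_tuple_rate` is the stricter measure of how many records carry the whole tuple, and it can never exceed the smallest per-field rate.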

The GSC developed a series of checklists that specify which data should be captured and stored along with sequence data [Field et al., 2008]. The INSDC databases support the storage of these parameters. Recently, the life science community has begun to develop tools that implement these standards and actively integrate these different data sources. “Data integration is the process of combining data residing at different sources and providing the user with a unified view of these data” [Lenzerini, 2002].
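
Applied to sequence data, this definition can be illustrated with a minimal Python sketch: two hypothetical sources, a sequence store and a contextual-data store, both keyed by accession number, are combined into one unified list of records, and each record is checked against a checklist of required contextual fields. The accession numbers, the field names `lat_lon`, `collection_date` and `depth`, and the records are invented for illustration; the field list is not the official GSC checklist, and a simple key-based join is only one, deliberately simplified, way to provide a unified view.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Sequence

# Hypothetical checklist of required contextual fields, loosely inspired by
# GSC-style checklists; this is NOT the official checklist term list.
REQUIRED_CONTEXT = ("lat_lon", "collection_date", "depth")


@dataclass
class UnifiedRecord:
    """One entry of the unified view: a sequence plus its contextual data."""
    accession: str
    sequence: str
    context: Dict[str, object] = field(default_factory=dict)
    missing: List[str] = field(default_factory=list)


def integrate(
    sequences: Mapping[str, str],
    contexts: Mapping[str, Mapping[str, object]],
    required: Sequence[str] = REQUIRED_CONTEXT,
) -> List[UnifiedRecord]:
    """Combine two sources, keyed by accession, into one unified view.

    Every sequence is kept (a left join on the accession number). Sequences
    without contextual data get an empty context, and `missing` lists the
    required checklist fields that are absent, None or empty.
    """
    unified = []
    for acc in sorted(sequences):
        ctx = dict(contexts.get(acc, {}))
        missing = [name for name in required if ctx.get(name) in (None, "")]
        unified.append(UnifiedRecord(acc, sequences[acc], ctx, missing))
    return unified


if __name__ == "__main__":
    # Invented toy sources, for illustration only.
    seqs = {"SEQ002": "ACGTTGCA", "SEQ001": "GGCATTAC"}
    ctx = {
        "SEQ001": {"lat_lon": "53.1 N 8.8 E", "collection_date": "2006-05-01", "depth": 5},
    }
    for rec in integrate(seqs, ctx):
        print(rec.accession, rec.missing)
    # SEQ001 []
    # SEQ002 ['lat_lon', 'collection_date', 'depth']
```

Keeping sequences that lack contextual data (a left join) rather than dropping them makes annotation gaps visible in the unified view instead of hiding them.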

Once sequence data are contextualized, a far greater range of analyses can be performed. Studies in various disciplines of the life sciences have already demonstrated the power of sequence studies enriched with contextual data. In marine microbiology, it has been shown that there are conserved diversity patterns along the depth continuum [DeLong et al., 2006]. Furthermore, annually recurring diversity patterns were identified in certain regions of the ocean [Fuhrman et al., 2006]. In the medical field, outbreaks of epidemics can be monitored globally [Janies et al., 2007, Salzberg et al., 2007, Schriml et al., 2010]¹⁴. All these studies exemplify the potential of globally integrated data.

The more tightly sequence data are integrated with contextual data, the easier it becomes to carry out sequence analysis studies in larger contexts. This offers an approach to answering the basic questions “Who is out there?”, “How many of which kind?” and “What are they doing?”. Moreover, knowledge about the complex mechanisms of the Earth’s biosphere on the micro and macro scales will become obtainable.

14 There are many more examples that show the increased interpretability of contextualized sequence data: [Tyson et al., 2004, Sogin et al., 2006, Seshadri et al., 2007, Huber et al., 2007, Rusch et al., 2007].
