28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management

Bio-medical Ontologies Maintenance and Change Management

Bio-medical Ontologies Maintenance and Change Management

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Protein Data Integration Problem<br />

Am<strong>and</strong>eep S. Sidhu <strong>and</strong> Matthew Bellgard<br />

WA Centre for Comparative Genomics,<br />

Murdoch University, Perth, Australia<br />

{asidhu,mbellgard}@ccg.murdoch.edu.au<br />

Abstract. In this chapter, we consider the challenges of information integration in<br />

proteomics from the prospective of researchers using information technology as<br />

an integral part of their discovery process. Specifically, data integration, meta-data<br />

specification, data provenance <strong>and</strong> data quality, <strong>and</strong> ontology are discussed here.<br />

These are the fundamental problems that need to be solved by the bioinformatics<br />

community so that modern information technology can have a deeper impact on<br />

the progress of biological discovery.<br />

1 Introduction<br />

The advent of automated <strong>and</strong> high-throughput technologies in biological research<br />

<strong>and</strong> the progress in the genome projects has led to an ever-increasing rate of data<br />

acquisition <strong>and</strong> exponential growth of data volume. However, the most striking<br />

feature of data in life science is not its volume but its diversity <strong>and</strong> variability. The<br />

biological data sets are intrinsically complex <strong>and</strong> are organised in loose hierarchies<br />

that reflect our underst<strong>and</strong>ing of complex living systems, ranging from<br />

genes <strong>and</strong> proteins, to protein-protein interactions, biochemical pathways <strong>and</strong><br />

regulatory networks, to cells <strong>and</strong> tissues, organisms <strong>and</strong> populations, <strong>and</strong> finally<br />

ecosystems on earth. This system spans many orders of magnitudes in time <strong>and</strong><br />

space <strong>and</strong> poses challenges in informatics, modelling, <strong>and</strong> simulation that goes<br />

beyond any scientific endeavour. Reflecting the complexity of biological systems,<br />

the types of biological data are highly diverse. They range from plain text of laboratory<br />

records <strong>and</strong> literature publications, nucleic acid <strong>and</strong> protein sequences,<br />

three-dimensional atomic structure of molecules, <strong>and</strong> bio<strong>medical</strong> images with different<br />

levels of resolutions, to various experimental outputs from technology as<br />

diverse as microarray chips, light <strong>and</strong> electronic microscopy, Nuclear Magnetic<br />

Resonance (NMR), <strong>and</strong> mass spectrometry.<br />

Different individuals <strong>and</strong> species vary tremendously, so naturally biological<br />

data does this also. For example, structure <strong>and</strong> function of organs vary across age<br />

<strong>and</strong> gender, in normal <strong>and</strong> diseased states, <strong>and</strong> across species. Essentially, all features<br />

of biology exhibit some degree of variability. <strong>Bio</strong>logical research is an exp<strong>and</strong>ing<br />

phase, <strong>and</strong> many fields of biology are still in the developing stages. Data<br />

from these systems is incomplete <strong>and</strong> often inconsistent. This presents a great<br />

challenge in modelling biological objects.<br />

A.S. Sidhu et al. (Eds.): <strong>Bio</strong><strong>medical</strong> Data <strong>and</strong> Applications, SCI 224, pp. 55–69.<br />

springerlink.com © Springer-Verlag Berlin Heidelberg 2009

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!