14.06.2013 Views

Databases and Systems


gain performance by distributing queries across multiple processors in parallel, though such gains are rare in practice.

The OPM system, described in the article by Markowitz et al., uses a middleware layer to impose a uniform object-oriented data model³ on a potentially heterogeneous set of back-end databases. Davidson et al.'s BioKleisli takes a similar approach, but its common data model is one adopted from logic programming, which is similar to the relational model but more powerful.
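The middleware idea can be illustrated with a minimal sketch. It is not the OPM or BioKleisli design; the class names, record fields, and sample values below are all hypothetical, chosen only to show one uniform object model being laid over two differently structured back ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Uniform object exposed by the middleware, whatever the back end stores."""
    accession: str
    name: str


class RelationalBackend:
    """Hypothetical back end that holds rows as (accession, name) tuples."""

    def __init__(self, rows):
        self._rows = rows

    def genes(self):
        return (GeneRecord(acc, name) for acc, name in self._rows)


class FlatFileBackend:
    """Hypothetical back end that holds 'accession|name' text lines."""

    def __init__(self, lines):
        self._lines = lines

    def genes(self):
        for line in self._lines:
            acc, name = line.strip().split("|")
            yield GeneRecord(acc, name)


class Middleware:
    """Runs one query against every registered back end."""

    def __init__(self, backends):
        self._backends = list(backends)

    def find_by_name(self, name):
        return [g for b in self._backends for g in b.genes() if g.name == name]


# Invented sample data: two back ends with different native representations.
mw = Middleware([
    RelationalBackend([("A001", "geneX"), ("A002", "geneY")]),
    FlatFileBackend(["B001|geneX\n", "B002|geneZ\n"]),
])
hits = mw.find_by_name("geneX")
```

The caller sees only `GeneRecord` objects; each back end keeps its native representation and translates on demand.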

The SRS system described by Carter et al. occupies an interesting middle ground between federation and warehousing. In SRS the datasets are warehoused in a single computer for searching, but remain organized in their original formats. New databases are added to the warehouse by supplying parsers for their flat-file formats; the files themselves are unaffected, which makes updating an SRS system uniquely simple compared to most warehouse designs.
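A rough sketch of the "supply a parser, leave the file alone" idea follows. It does not reproduce SRS or its parser language; the two flat-file formats, the file extensions, and every function name are assumptions made for illustration. Each parser only reports where entries start, so the index points back into the unmodified original files.

```python
import os
import tempfile


def parse_header_marked(data):
    """Hypothetical format A: an entry starts on a line beginning with '>'."""
    offset = 0
    for line in data.splitlines(keepends=True):
        fields = line[1:].split()
        if line.startswith(b">") and fields:
            yield fields[0].decode("ascii"), offset
        offset += len(line)


def parse_id_tagged(data):
    """Hypothetical format B: an entry starts on a line beginning with 'ID '."""
    offset = 0
    for line in data.splitlines(keepends=True):
        fields = line.split()
        if line.startswith(b"ID ") and len(fields) > 1:
            yield fields[1].decode("ascii"), offset
        offset += len(line)


# Supporting a new database format means registering one more parser here.
PARSERS = {".fa": parse_header_marked, ".dat": parse_id_tagged}


def build_index(paths):
    """Map entry id -> (path, byte offset) without rewriting any file."""
    index = {}
    for path in paths:
        parser = PARSERS[os.path.splitext(path)[1]]
        with open(path, "rb") as handle:
            for entry_id, offset in parser(handle.read()):
                index[entry_id] = (path, offset)
    return index


def fetch_line(index, entry_id):
    """Read the first line of an entry straight from its original file."""
    path, offset = index[entry_id]
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.readline().decode("ascii").rstrip()


# Invented sample files, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    fa = os.path.join(tmp, "seqs.fa")
    dat = os.path.join(tmp, "entries.dat")
    with open(fa, "wb") as f:
        f.write(b">s1 first\nACGT\n>s2 second\nGGCC\n")
    with open(dat, "wb") as f:
        f.write(b"ID e1\nDE example\n//\nID e2\nDE another\n//\n")
    index = build_index([fa, dat])
    s2_header = fetch_line(index, "s2")
    e2_header = fetch_line(index, "e2")
```

Because the index stores only paths and offsets, refreshing a database in this sketch amounts to replacing the file and rebuilding its index entries.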

The Biology Workbench project exemplifies another approach to data integration made possible by the Web. It does not physically bring the data together like SRS, nor does it create a virtual unified database like BioKleisli or the OPM multidatabase capability. Instead, it integrates at the level of the front end, offering a thin user-interface veneer that provides access to a number of capabilities on the Web. A similar concept is employed by the BCM Search Launcher [16].

A key challenge in bioinformatics software development is the need to continuously evolve software systems to incorporate or adapt to new technologies while maintaining compatibility with existing (legacy) systems. Traditional software development practice has been described in terms of a “waterfall” model, in which development progresses continuously “downstream” from requirements analysis to design to implementation. This model provides little guidance to bioinformatics developers, whose task more closely resembles that of an auto mechanic trying to redesign a car while driving it. The rapid prototyping model, in which systems are built by successive approximation from crude first attempts, comes closer to the mark, but still assumes the luxury of an extended prototyping phase. The component-based design model, in which systems can be quickly assembled from reusable components, is one that many in bioinformatics are pinning their hopes on. Jungfer et al. advocate the use of CORBA, an industry standard for designing distributed object-oriented software systems, which has been adopted at the European Bioinformatics Institute and elsewhere as a middleware layer to handle communication between back

³ A data model is a set of primitive constructs, such as sets, relations or objects, which can be used to build database schemas. The most common data models are the relational and the object-oriented. A data modelling language expresses the concepts of some data model in a specific notation for writing down schemas; the SQL language used in relational databases includes a data modelling component called the create table statement. A schema describes the structure of data in a particular database. A database management system (DBMS) such as Oracle™ or Sybase interprets a schema definition written in a data modelling language as instructions for creating a database with the specified structure.
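As a small illustration of a DBMS interpreting a schema definition, the sketch below passes a create table statement to SQLite through Python's standard `sqlite3` module. SQLite stands in here for the systems named above, and the table and column names and the sample row are invented for the example.

```python
import sqlite3

# Hypothetical schema: table and column names are invented for illustration.
SCHEMA = """
CREATE TABLE gene (
    accession TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    length_bp INTEGER
)
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(SCHEMA)                # the DBMS builds the structure described

# Once the schema exists, data stored in the database must conform to it.
conn.execute("INSERT INTO gene VALUES (?, ?, ?)", ("A001", "geneX", 1200))

# Ask the DBMS to report the structure it created from the schema.
columns = [row[1] for row in conn.execute("PRAGMA table_info(gene)")]
rows = conn.execute("SELECT accession, name, length_bp FROM gene").fetchall()
```

The schema text is the only description of structure the program supplies; everything else is derived from it by the database engine.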
