14.06.2013 Views

Databases and Systems


table of 13 columns and 1 million lines in 16 minutes. For the IGD project, we were able to load on a DEC Alpha a large data set consisting of GDB, GenBank, SwissProt and several dozen smaller databases into Acedb. Once loaded, Acedb was able to complete a test suite of queries on this data set in 3 minutes, as opposed to 30 minutes for a Sybase server on the same machine.

It is crucial, however, to run the database on a local disk. If the database disk is mounted over NFS, or on a RAID system, performance easily degrades by a factor of 10. In such a case, to produce a very large table, it is much faster to ftp the whole database onto a local disk, run the table query there and then destroy the local copy, than to run the query off the remote disk.
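The trade-off between copying first and querying remotely can be sketched as a simple break-even calculation. This is only an illustration: the function names, the example timings and the decision to ignore the time needed to delete the local copy are assumptions, and the factor of 10 is the rough slowdown quoted above rather than a measured constant.

```python
def break_even_copy_time(local_query_s, remote_slowdown=10.0):
    """Longest copy time (seconds) for which copying first still wins.

    Assumes the remote query takes remote_slowdown times the local query.
    """
    return local_query_s * (remote_slowdown - 1.0)


def copy_then_query_is_faster(local_query_s, copy_s, remote_slowdown=10.0):
    """True if copying to local disk and querying there beats a remote query."""
    remote_query_s = local_query_s * remote_slowdown
    return copy_s + local_query_s < remote_query_s


# Hypothetical example: a table query that takes 16 minutes on a local disk.
local_query_s = 16 * 60
print(break_even_copy_time(local_query_s))             # seconds available for the copy
print(copy_then_query_is_faster(local_query_s, 1200))  # a 20-minute transfer
```

Under these assumptions the copy pays off whenever the transfer takes less than nine times the local query time.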

How does the storage capacity of Acedb compare to other database management systems? Comparisons with relational systems are difficult because of the impossibility of relating relational tables to objects. However, we can compare Acedb to other object-oriented systems. The present C. elegans data set contains 500 thousand objects and uses half a gigabyte of disk space. Direct comparisons of storage capacity by counting the number of objects may be misleading, because the count depends on the expressive power of the grammar. For example, Gilles Lucato and Isabelle Mougenot at LIRMM wrote an automatic Acedb to Matisse translator. Extra classes were needed in Matisse to store the Acedb structured tags and, as a result, the C. elegans data set in Matisse is spread over several million objects and takes several times more disk space. Illustra also uses more disk space. We have been told that O2 performance degrades at around 50 thousand objects on similar machines.

ObjectStore databases are memory mapped to disk; this places a strict limit of 4 gigabytes on conventional 32-bit architecture machines. Acedb has no such hard limit, but typically one needs 50 bytes of memory and 1 kilobyte of disk space per object, effectively limiting Acedb to one million objects and 1 gigabyte of disk on a 128 MB machine.
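The per-object figures above can be checked with a short calculation. This is a sketch under stated assumptions: the constant and function names are illustrative, and binary units (1 kilobyte = 1024 bytes) are assumed, which the text does not specify.

```python
MEMORY_BYTES_PER_OBJECT = 50    # "50 bytes of memory" per object, from the text
DISK_BYTES_PER_OBJECT = 1024    # "1 kilobyte of disk space", assumed to be 1024 bytes

MB = 1024 ** 2
GB = 1024 ** 3


def acedb_footprint(n_objects):
    """Return (memory_bytes, disk_bytes) estimated for n_objects."""
    return (n_objects * MEMORY_BYTES_PER_OBJECT,
            n_objects * DISK_BYTES_PER_OBJECT)


memory_bytes, disk_bytes = acedb_footprint(1_000_000)
print(f"memory: {memory_bytes / MB:.1f} MB")  # per-object memory for one million objects
print(f"disk:   {disk_bytes / GB:.2f} GB")    # disk space for one million objects

# A 32-bit address space covers 2**32 bytes, which is where the
# 4 gigabyte limit on a memory-mapped database comes from.
print(f"32-bit address space: {2 ** 32 // GB} GB")
```

With these assumptions, one million objects need just under 48 MB of memory and just under 1 GB of disk, consistent with the figures quoted for a 128 MB machine. The text does not say how the remaining memory is used.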

How to get the software<br />

The whole Acedb system, including the Java and Perl tools and demos, and the Acembly package are available from our Web page http://alpha.crbm.cnrs-mop.fr.

The C. elegans data and Acedb source code may be downloaded from

ftp://ncbi.nlm.nih.gov/repository/acedb/ in the US

ftp://lirmm.lirmm.fr/pub/acedb/ in France

ftp://ftp.sanger.ac.uk/pub/acedb/ in England

Documentation, tutorials and examples are maintained by Sam Cartinhour and Dave Matthews on the US site http://probe.nalusda.gov:8000/acedocs/index.html
