14.06.2013 Views

Databases and Systems

for wisdom. Where algorithms tend to get implemented, packaged and shared as black boxes, systems get built over and over in different establishments, and people facing the same problems for the first time are always peppering their more experienced counterparts with questions like “did you use approach/product/standard X in your project? Was it any good? How do you keep the data up to date? How do you enforce content quality?” This book was conceived as a resource for people asking such questions, whose numbers seem to be doubling every six months at present. It is not possible to give definite answers to these sorts of questions – yes you should or no you shouldn’t use ACEDB or CORBA or OPM or whatever. The software changes from month to month, the problems change, the options change. Today’s technological rising star may be tomorrow’s horror story, and vice versa. Even the insights derived from bitter experience can be questionable – did you really diagnose the problem correctly, and if you build the next system the way you now think you should have built the last one, will the result really be better or will you simply encounter a different, equally painful set of tradeoffs? Nonetheless experience is better than no experience, so I asked the contributors to this volume to include in their articles lessons learned from developing their systems, and to write down their thoughts on how they might do things differently -- or similarly -- if they were doing it again.

The contributors represent what I hope will be an interesting, albeit incomplete, sample of some of the most exciting work being done in bioinformatics today. The articles are somewhat arbitrarily divided into two sections: Databases and Software. The intent was that articles that focused more on content would fall into the former category, while articles that focused on technology would fall into the latter, but there were a number of borderline cases. My hope is that this collection will be of interest to readers who have arrived at the interdisciplinary world of bioinformatics either from the biology side or the computational side (as well as those more distant migrants from literature, history, business, etc.). The database articles may be more intelligible to readers with more of a biology background, and the software articles to readers with more software engineering; hopefully there is something to interest (and confuse) just about everyone.

The articles in the Database section represent some of the established (or in some cases recently disestablished!) citizens of the database world, as well as some promising new efforts. The first few articles describe systems focused on the molecular level; these are mostly multispecies, comparative systems. Karl Sirotkin of NCBI describes some of the software underpinnings of Entrez, the most widely used molecular biology resource. The article on HOVERGEN by Duret et al. describes an interesting new approach to integrating phylogenetic and coding sequence data into an organized whole. Several articles focus on the fast-developing area of metabolic and regulatory pathway databases, including those by Overbeek et al. on WIT, Kanehisa on KEGG, and Karp and Riley on EcoCyc.

The remaining articles in this section primarily describe databases organized around organisms rather than molecules. Alan Scott describes the extensive literature-based curation process used to maintain the high-quality standards of OMIM, the fundamental resource on human genetic disease. My own article looks at
