for wisdom. Where algorithms tend to get implemented, packaged and shared as<br />
black boxes, systems get built over and over in different establishments, and people<br />
facing the same problems for the first time are always peppering their more<br />
experienced counterparts with questions like “did you use approach/product/standard<br />
X in your project? Was it any good? How do you keep the data up to date? How do<br />
you enforce content quality?” This book was conceived as a resource for people<br />
asking such questions, whose numbers seem to be doubling every six months at<br />
present. It is not possible to give definite answers to these sorts of questions – yes<br />
you should or no you shouldn’t use ACEDB or CORBA or OPM or whatever. The<br />
software changes from month to month, the problems change, the options change.<br />
Today’s technological rising star may be tomorrow’s horror story, and vice versa.<br />
Even the insights derived from bitter experience can be questionable – did you really<br />
diagnose the problem correctly, and if you build the next system the way you now<br />
think you should have built the last one, will the result really be better or will you<br />
simply encounter a different, equally painful set of tradeoffs? Nonetheless experience<br />
is better than no experience, so I asked the contributors to this volume to include in<br />
their articles lessons learned from developing their systems, and to write down their<br />
thoughts on how they might do things differently -- or similarly -- if they were doing<br />
it again.<br />
The contributors represent what I hope will be an interesting, albeit<br />
incomplete, sample of some of the most exciting work being done in bioinformatics<br />
today. The articles are somewhat arbitrarily divided into two sections: Databases and<br />
Software. The intent was that articles that focused more on content would fall into the<br />
former category, while articles that focused on technology would fall into the latter,<br />
but there were a number of borderline cases. My hope is that this collection will be of<br />
interest to readers who have arrived at the interdisciplinary world of bioinformatics<br />
either from the biology side or the computational side (as well as those more distant<br />
migrants from literature, history, business, etc.). The database articles may be more<br />
intelligible to readers with more of a biology background, and the software articles to<br />
readers with more software engineering; hopefully there is something to interest (and<br />
confuse) just about everyone.<br />
The articles in the Database section represent some of the established (or in<br />
some cases recently disestablished!) citizens of the database world, as well as some<br />
promising new efforts. The first few articles describe systems focused on the<br />
molecular level; these are mostly multispecies, comparative systems. Karl Sirotkin of<br />
NCBI describes some of the software underpinnings of Entrez, the most widely used<br />
molecular biology resource. The article on HOVERGEN by Duret et al. describes an<br />
interesting new approach to integrating phylogenetic and coding sequence data into<br />
an organized whole. Several articles focus on the fast-developing area of metabolic<br />
and regulatory pathway databases, including those by Overbeek et al. on WIT,<br />
Kanehisa on KEGG, and Karp and Riley on EcoCyc.<br />
The remaining articles in this section describe primarily databases organized<br />
around organisms rather than molecules. Alan Scott describes the extensive<br />
literature-based curation process used to maintain the high quality standards of<br />
OMIM, the fundamental resource on human genetic disease. My own article looks at<br />