To create a database for a specific laboratory protocol, the main tasks are to give names to the Materials and Steps of interest, and to describe the data to be reported in each Step.
Steps are generally obvious, because they correspond to the actual work being done in the laboratory protocol. The main subtlety is ensuring that Steps correspond to useful points of contact between the laboratory and the computer. In our running example, possible Steps include ones reporting (i) that a library has been constructed, (ii) that a clone has been picked and plated, (iii) that sequence-template has been prepared from a clone, (iv) that sequence-template has been loaded onto a sequencing machine and run, (v) the results of a sequencing-run, e.g., base-calls, quality indicators, and chromatograms, (vi) the results of vector stripping and quality screening of sequencing results, (vii) the results of assembling sequences, and (viii) the results of analyzing sequence-assemblies.
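The candidate Steps (i) through (viii) can be written down as a small table of Step names and the data each one reports. The following is a minimal sketch in plain Python, not the definition language of our software; the class `StepKind`, the helper `reported_fields`, and every Step and field name are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepKind:
    """A kind of Step: its name plus the names of the data fields it reports."""
    name: str
    reports: tuple


# Hypothetical Step kinds for the running example; all names are illustrative.
STEP_KINDS = (
    StepKind("construct_library", ("vector", "source_dna")),
    StepKind("pick_and_plate_clone", ("plate", "well")),
    StepKind("prepare_template", ("prep_method",)),
    StepKind("load_and_run_sequencer", ("machine", "lane")),
    StepKind("report_sequencing_run",
             ("base_calls", "quality_indicators", "chromatogram")),
    StepKind("strip_vector_and_screen", ("trimmed_base_calls", "passed_screen")),
    StepKind("assemble_sequences", ("assembly_id",)),
    StepKind("analyze_assembly", ("analysis_summary",)),
)


def reported_fields(step_name):
    """Return the field names reported by the Step kind called step_name."""
    for kind in STEP_KINDS:
        if kind.name == step_name:
            return kind.reports
    raise KeyError(step_name)
```

Writing the Steps out this way makes it easy to check that each one really is a point of contact with the computer: a Step with nothing to report is a candidate for removal.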
Many Materials are equally obvious, because they correspond to the major reagents employed in the protocol, e.g., libraries and clones, or the major data produced by the protocol, e.g., sequence-reads and assemblies. As with Steps, the main danger is excess: Materials should only be defined for things that are really worth tracking. Limitations in our current software push strongly in the direction of parsimony. The mechanism mentioned above for connecting Step-data to Materials only works for Steps operating directly on a Material; it does not work transitively over related Materials. While it is easy to get the base-calls for a sequence-read, and a list of all sequence-reads for a given clone, and a list of all clones picked from a library, the software offers no special help for getting all base-calls for all sequence-reads for a given clone or library. A second limitation is that LabFlow (see later section) only supports workflows in which a single kind of Material marches through a protocol. The effect of these limitations is to encourage database designs in which multiple real-world materials are merged into a single database-Material. In our example, it would probably be best to represent libraries as Objects (not Materials), and to merge clones and sequence-reads into one Material; assemblies would probably remain as separate Materials. The end result is a database with just two kinds of Materials: sequence-reads and sequence-assemblies.
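The first limitation can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not our software's interface: the class `ToyLabStore`, its methods, and the function `base_calls_for_library` are all hypothetical. Direct lookup of Step-data on a Material is a single call, whereas collecting base-calls across a library requires a traversal that the caller must write by hand.

```python
from collections import defaultdict


class ToyLabStore:
    """Toy stand-in for a laboratory database (hypothetical, for illustration)."""

    def __init__(self):
        self._step_data = defaultdict(dict)  # material id -> {field: value}
        self._related = defaultdict(list)    # material id -> related material ids

    def record_step(self, material, **data):
        """Record the data reported by a Step operating directly on a Material."""
        self._step_data[material].update(data)

    def relate(self, parent, child):
        """Note that child is derived from parent (e.g., a clone from a library)."""
        self._related[parent].append(child)

    def get(self, material, field):
        """Direct lookup: sees only data from Steps on this very Material."""
        return self._step_data[material].get(field)

    def related(self, material):
        """List the Materials directly related to the given one."""
        return list(self._related[material])


def base_calls_for_library(store, library):
    """Walk library -> clones -> reads by hand; the store does not do this for us."""
    calls = []
    for clone in store.related(library):
        for read in store.related(clone):
            value = store.get(read, "base_calls")
            if value is not None:
                calls.append(value)
    return calls
```

In this model, `store.get(library, "base_calls")` returns `None` even when every read in the library has base-calls, because no Step operated on the library itself. Merging clones and sequence-reads into one Material shortens such traversals, which is the pressure toward parsimony described above.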
To recapitulate, the database for our running example would have just two kinds of Materials, sequence-reads and sequence-assemblies, and many kinds of Steps, each operating on one Material. One of the possible Steps listed earlier, namely, the one reporting on library construction, must fall by the wayside, since we have decided to represent libraries as Objects, not Materials; data on library construction would be stored as fields of these library Objects. The most obvious practical shortcoming of this example database is that without a clone Material, we lose the most natural means of coordinating multiple reads from the same clone. In the database as given, one would probably coordinate multiple reads per clone in the context of sequence-assemblies; this may be workable but is certainly not ideal.
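The resulting design, and the shortcoming just noted, can be sketched as follows. This is an illustration only: the classes `Library` and `SequenceRead`, their fields, and the function `reads_by_clone` are assumptions. In particular, the text does not say how a read records its clone; here we assume a plain `clone_id` field, which is a different device from coordinating reads through sequence-assemblies.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Library:
    """A library represented as an Object: construction data are ordinary fields."""
    name: str
    construction_notes: str


@dataclass
class SequenceRead:
    """The merged clone/sequence-read Material (field names are hypothetical)."""
    read_id: str
    clone_id: str  # assumed field; the clone is no longer a Material of its own
    library: Library


def reads_by_clone(reads):
    """Group read ids by clone id; with no clone Material, this is left to the user."""
    groups = defaultdict(list)
    for read in reads:
        groups[read.clone_id].append(read.read_id)
    return dict(groups)
```

Because the clone exists only as a field value, nothing in the database can carry Step-data about the clone as a whole, and every query that needs "all reads from one clone" must regroup the reads itself.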