sequence-assembly Worker in our cDNA-sequencing example might be reused in a<br />
genomic sequencing project. It is also reasonable to imagine that two or more<br />
different Workers might be implemented for the same job using different<br />
components, e.g., we might develop one sequence-assembly Worker based on phrap<br />
[19] and another based on TIGR Assembler [20]. It is also reasonable to expect that<br />
the same component might be used for several purposes, e.g., we might use a fast<br />
sequence alignment program, such as FASTA [21] or crossmatch [22], for both<br />
vector-stripping and contamination-checking. For these reasons, it makes sense to<br />
organize the collection of Workers as a class library with well-defined interfaces that<br />
are separate from their implementations, and to allow Workers to call each other. In<br />
the long run, success with our method (and probably any other modular approach to<br />
LIMS construction) depends on the accumulation of a well-designed library of good<br />
Workers.<br />
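
As an illustration of the class-library idea, the following is a minimal Python sketch, not the LabFlow implementation. All names (`Worker`, `run`, `VectorStripper`, `ContaminationChecker`, `ReadCleaner`, `toy_aligner`) are hypothetical, and the aligner is a toy substring test standing in for a real alignment program such as FASTA or crossmatch. It shows an interface kept separate from its implementations, one component reused for two purposes, and one Worker calling others.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

# Hypothetical component type: (query, target) -> True when target is found.
Aligner = Callable[[str, str], bool]


def toy_aligner(query: str, target: str) -> bool:
    """Toy stand-in for a fast alignment program: a plain substring test."""
    return target in query


class Worker(ABC):
    """Interface shared by all Workers, independent of how each is built."""

    @abstractmethod
    def run(self, fields: Dict[str, object]) -> Dict[str, object]:
        """Consume named input fields and return named output fields."""


class VectorStripper(Worker):
    """Removes a known vector sequence from a read."""

    def __init__(self, aligner: Aligner, vector: str) -> None:
        self.aligner = aligner
        self.vector = vector

    def run(self, fields: Dict[str, object]) -> Dict[str, object]:
        sequence = str(fields["sequence"])
        if self.aligner(sequence, self.vector):
            sequence = sequence.replace(self.vector, "")
        return {"sequence": sequence}


class ContaminationChecker(Worker):
    """Flags a read that matches any known contaminant."""

    def __init__(self, aligner: Aligner, contaminants: List[str]) -> None:
        self.aligner = aligner
        self.contaminants = contaminants

    def run(self, fields: Dict[str, object]) -> Dict[str, object]:
        sequence = str(fields["sequence"])
        hit = any(self.aligner(sequence, c) for c in self.contaminants)
        return {"contaminated": hit}


class ReadCleaner(Worker):
    """A Worker that delegates to two other Workers."""

    def __init__(self, stripper: Worker, checker: Worker) -> None:
        self.stripper = stripper
        self.checker = checker

    def run(self, fields: Dict[str, object]) -> Dict[str, object]:
        stripped = self.stripper.run(fields)
        checked = self.checker.run(stripped)
        return {**stripped, **checked}


# The same aligner component serves both purposes.
cleaner = ReadCleaner(
    VectorStripper(toy_aligner, vector="GGCC"),
    ContaminationChecker(toy_aligner, contaminants=["TTTT"]),
)
result = cleaner.run({"sequence": "ACGTGGCCACGT"})
```

Because callers depend only on `Worker.run`, a different implementation of the same job can be substituted without changing the code that uses it.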
After Workers are developed, what remains is to connect them together into a<br />
complete workflow. There are two main tasks: a Step must be created for each<br />
Worker, and Routers must be defined to connect the Steps together. The main work<br />
in defining a Step is to determine the mapping between the field-names used by the<br />
Worker and those used by the workflow as a whole. (These field-names may be<br />
different since Workers are written for reuse.) Routers are generally straightforward<br />
for success-cases, but can be tricky for failure-cases; often, in the early days of a<br />
project, all failures are sent to a catch-all Worker that reports the event to laboratory<br />
supervisors.<br />
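
The two tasks can be sketched in Python as follows. This is a hedged illustration only: the classes `Step` and `Router`, the helper `run_step`, the toy Worker `measure_length`, and every field-name and Step name shown are assumptions made for the example, not the LabFlow API. The Step translates between workflow and Worker field-names, and the Router sends every failure to a single catch-all Step.

```python
from typing import Callable, Dict

WorkerFn = Callable[[Dict[str, object]], Dict[str, object]]


def measure_length(fields: Dict[str, object]) -> Dict[str, object]:
    """Toy Worker with its own field-names: reads 'bases', writes 'n'."""
    bases = str(fields["bases"])
    if not bases:
        raise ValueError("empty read")
    return {"n": len(bases)}


class Step:
    """Wraps a Worker and maps its field-names onto the workflow's."""

    def __init__(self, worker: WorkerFn,
                 inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        self.worker = worker
        self.inputs = inputs    # Worker field-name -> workflow field-name
        self.outputs = outputs  # Worker field-name -> workflow field-name

    def execute(self, material: Dict[str, object]) -> Dict[str, object]:
        worker_in = {w: material[f] for w, f in self.inputs.items()}
        worker_out = self.worker(worker_in)
        return {self.outputs[w]: v for w, v in worker_out.items()
                if w in self.outputs}


class Router:
    """Chooses the next Step; all failures go to one catch-all Step."""

    def __init__(self, on_success: str,
                 on_failure: str = "report_to_supervisor") -> None:
        self.on_success = on_success
        self.on_failure = on_failure

    def next_step(self, succeeded: bool) -> str:
        return self.on_success if succeeded else self.on_failure


def run_step(step: Step, router: Router, material: Dict[str, object]) -> str:
    """Run one Step on a Material record and return the next Step's name."""
    try:
        material.update(step.execute(material))
        return router.next_step(True)
    except Exception as err:
        # Record why the Step failed so the catch-all can report it.
        material["failure_reason"] = str(err)
        return router.next_step(False)


step = Step(measure_length,
            inputs={"bases": "read_sequence"},
            outputs={"n": "read_length"})
router = Router(on_success="vector_clipping")

good = {"read_sequence": "ACGT"}
bad = {"read_sequence": ""}
next_for_good = run_step(step, router, good)
next_for_bad = run_step(step, router, bad)
```

Keeping the mapping in the Step rather than in the Worker is what lets the same Worker be reused in a workflow that names its fields differently.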
Let us apply these ideas to our running example. We will model the database as<br />
suggested in the previous section, i.e., with two kinds of Materials, namely,<br />
sequence-reads and sequence-assemblies. Since a given LabFlow can operate on<br />
only one kind of Material, the overall system will need two LabFlows. We will only<br />
describe the first. The sequence-read LabFlow needs Workers for robot-control,<br />
base-calling, vector-clipping, quality-screening, and review by laboratory personnel.<br />
Robot-control software generally comes with the machine and cannot be modified<br />
by the customer; often the software runs on a dedicated computer, and can only be<br />
operated by a person entering commands directly at that computer. The Worker, in<br />
such cases, helps the human operator coordinate the robot with the rest of the system.<br />
Assume, for purposes of the example, that the outputs of the robotic procedure are (i)<br />
a collection of plated clones, and (ii) a collection of plated sequencing-templates<br />
derived from those clones, and that these plates are bar-coded. The most important<br />
coordination task is to record the bar-codes of the plates in such a way that each<br />
clone-plate is associated with the corresponding template-plate (so that subsequent<br />
sequence data can be associated with the correct clone). The Worker software for<br />
doing this might be no more than a Web-based program that accepts bar-codes<br />
(entered by the operator using a bar-code wand) two-at-a-time, and passes each pair<br />
back to the Step.<br />
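
The pairing logic of such a Worker might look like the Python sketch below. It leaves out the Web front-end and makes two assumptions that the text does not state: the operator scans the clone-plate first and the template-plate second, and a plate scanned twice in a row is treated as an operator error. The function name `pair_barcodes` and the example bar-codes are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def pair_barcodes(scans: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (clone_plate, template_plate) pairs from a stream of scans.

    Assumes the clone-plate is always scanned before its template-plate.
    """
    pending: Optional[str] = None
    for raw in scans:
        code = raw.strip()
        if not code:
            continue  # ignore blank entries from the scanner
        if pending is None:
            pending = code  # first scan of the pair: the clone-plate
        elif code == pending:
            raise ValueError(f"plate scanned twice: {code}")
        else:
            yield (pending, code)  # second scan: the template-plate
            pending = None
    if pending is not None:
        raise ValueError(f"unpaired bar-code: {pending}")


pairs = list(pair_barcodes(["C001", "T001", "C002 ", "", "T002"]))
```

Each yielded pair corresponds to what the Worker would pass back to the Step, so that later sequence data can be traced from a template-plate to its clone-plate.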
Next comes base-calling. Assume that we use phred [23] for this purpose, and<br />
that we wish to run phred in real-time on the data stream generated by the sequencing