14.06.2013 Views

Databases and Systems



sequence-assembly Worker in our cDNA-sequencing example might be reused in a genomic sequencing project. It is also reasonable to imagine that two or more different Workers might be implemented for the same job using different components, e.g., we might develop one sequence-assembly Worker based on phrap [19] and another based on TIGR Assembler [20]. It is also reasonable to expect that the same component might be used for several purposes, e.g., we might use a fast sequence alignment program, such as FASTA [21] or crossmatch [22], for both vector-stripping and contamination-checking. For these reasons, it makes sense to organize the collection of Workers as a class library with well-defined interfaces that are separate from their implementations, and to allow Workers to call each other. In the long run, success with our method (and probably any other modular approach to LIMS construction) depends on the accumulation of a well-designed library of good Workers.
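As a minimal sketch of this library organization, the Python fragment below separates an alignment interface from its implementation and lets two Workers share one component. All names here (`Aligner`, `ExactMatchAligner`, `VectorStripper`, `ContaminationChecker`) are hypothetical and do not come from the system described in the text; the exact-match aligner is only a toy stand-in for a real alignment program such as FASTA or crossmatch.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Aligner(ABC):
    """Hypothetical interface for a sequence-alignment component."""

    @abstractmethod
    def find(self, query: str, target: str) -> Optional[int]:
        """Return the start index of query within target, or None."""


class ExactMatchAligner(Aligner):
    """Toy implementation (exact substring match), standing in for a real aligner."""

    def find(self, query: str, target: str) -> Optional[int]:
        pos = target.find(query)
        return pos if pos >= 0 else None


class VectorStripper:
    """Worker that removes the first occurrence of a vector sequence from a read."""

    def __init__(self, aligner: Aligner, vector: str) -> None:
        self.aligner = aligner
        self.vector = vector

    def run(self, read: str) -> str:
        pos = self.aligner.find(self.vector, read)
        if pos is None:
            return read
        return read[:pos] + read[pos + len(self.vector):]


class ContaminationChecker:
    """Worker that reports whether a read contains a known contaminant."""

    def __init__(self, aligner: Aligner, contaminant: str) -> None:
        self.aligner = aligner
        self.contaminant = contaminant

    def run(self, read: str) -> bool:
        return self.aligner.find(self.contaminant, read) is not None


# The same aligner component serves both Workers.
aligner = ExactMatchAligner()
stripped = VectorStripper(aligner, vector="GGATCC").run("GGATCCACGTACGT")
contaminated = ContaminationChecker(aligner, contaminant="TTTTTT").run(stripped)
```

Because both Workers depend only on the `Aligner` interface, a different implementation could be substituted without changing either Worker.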

After Workers are developed, what remains is to connect them together into a complete workflow. There are two main tasks: a Step must be created for each Worker, and Routers must be defined to connect the Steps together. The main work in defining a Step is to determine the mapping between the field-names used by the Worker and those used by the workflow as a whole. (These field-names may differ, since Workers are written for reuse.) Routers are generally straightforward for success-cases, but can be tricky for failure-cases; often, in the early days of a project, all failures are sent to a catch-all Worker that reports the event to laboratory supervisors.
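The two tasks above can be sketched roughly as follows. This is an illustration under stated assumptions, not the system's actual API: the `Step` and `Router` classes, their methods, the field-names, and the catch-all step name `report_to_supervisor` are all hypothetical.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, object]


class Step:
    """Wraps a Worker and translates between workflow and Worker field-names."""

    def __init__(self, name: str, worker: Callable[[Record], Record],
                 field_map: Dict[str, str]) -> None:
        self.name = name
        self.worker = worker
        self.field_map = field_map  # workflow field-name -> Worker field-name

    def run(self, material: Record) -> Record:
        # Rename workflow fields to the names the Worker expects.
        worker_input = {w: material[f] for f, w in self.field_map.items()
                        if f in material}
        worker_output = self.worker(worker_input)
        # Rename Worker results back to workflow field-names.
        reverse = {w: f for f, w in self.field_map.items()}
        result = dict(material)
        result.update({reverse.get(k, k): v for k, v in worker_output.items()})
        return result


class Router:
    """Chooses the next Step; every failure goes to one catch-all Step."""

    def __init__(self, success_routes: Dict[str, str],
                 failure_step: str = "report_to_supervisor") -> None:
        self.success_routes = success_routes
        self.failure_step = failure_step

    def next_step(self, step_name: str, succeeded: bool) -> Optional[str]:
        if succeeded:
            return self.success_routes.get(step_name)
        return self.failure_step


def quality_screen(fields: Record) -> Record:
    # Toy Worker (assumption): a read passes if it has at least 8 bases.
    return {"ok": len(str(fields["seq"])) >= 8}


step = Step("quality_screen", quality_screen,
            {"read_sequence": "seq", "passed_quality": "ok"})
router = Router({"quality_screen": "human_review"})

material = step.run({"read_id": "r1", "read_sequence": "ACGTACGTAC"})
next_name = router.next_step(step.name, bool(material["passed_quality"]))
```

Here the Worker knows only the short names `seq` and `ok`, while the workflow uses `read_sequence` and `passed_quality`; the Step carries the mapping, so the Worker stays reusable.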

Let us apply these ideas to our running example. We will model the database as suggested in the previous section, i.e., with two kinds of Materials, namely, sequence-reads and sequence-assemblies. Since a given LabFlow can operate on only one kind of Material, the overall system will need two LabFlows. We describe only the first. The sequence-read LabFlow needs Workers for robot-control, base-calling, vector-clipping, quality-screening, and review by laboratory personnel.

Robot-control software generally comes with the machine and cannot be modified by the customer; often the software runs on a dedicated computer, and can only be operated by a person entering commands directly at that computer. The Worker, in such cases, helps the human operator coordinate the robot with the rest of the system. Assume, for purposes of the example, that the outputs of the robotic procedure are (i) a collection of plated clones, and (ii) a collection of plated sequencing-templates derived from those clones, and that these plates are bar-coded. The most important coordination task is to record the bar-codes of the plates in such a way that each clone-plate is associated with the corresponding template-plate (so that subsequent sequence data can be associated with the correct clone). The Worker software for doing this might be no more than a Web-based program that accepts bar-codes (entered by the operator using a bar-code wand) two-at-a-time, and passes each pair back to the Step.
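The pairing logic of such a Worker might look like the sketch below. It omits the Web front end and assumes, purely for illustration, that the operator scans the clone-plate first and the template-plate second; the function name `pair_barcodes` and the sample bar-codes are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def pair_barcodes(scans: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (clone_plate, template_plate) pairs from scans taken two at a time.

    Assumption: within each pair, the clone-plate is scanned first.
    """
    pending: Optional[str] = None
    for raw in scans:
        code = raw.strip()
        if not code:
            continue  # skip empty scans
        if pending is None:
            pending = code  # first bar-code of a pair
        else:
            if code == pending:
                raise ValueError(f"same bar-code scanned twice: {code}")
            yield (pending, code)  # hand the completed pair back to the caller
            pending = None
    if pending is not None:
        # An odd number of scans leaves one plate without a partner.
        raise ValueError(f"unpaired bar-code: {pending}")


pairs = list(pair_barcodes(["CP0001", "TP0001", "CP0002", "TP0002"]))
```

Rejecting an unpaired or repeated scan at entry time keeps a mis-scan from silently associating sequence data with the wrong clone.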

Next comes base-calling. Assume that we use phred [23] for this purpose, and that we wish to run phred in real-time on the data stream generated by the sequencing
