14.06.2013 Views

Databases and Systems


The search can also be made against the KEGG pathway diagrams, which form a much larger data set than the ortholog group tables. In this case the search has to be made against a single reference organism, but the procedure is similar: the user specifies a set of sequences as a query. When EC numbers are preassigned to enzyme genes in the genome, the user can match the set of EC numbers against the KEGG reference metabolic pathways (not the organism-specific pathways). The matched enzymes are marked by color, so that the connectivity and completeness of the marked enzymes can be used to assess the correctness of functional assignment (EC number assignment) in the gene catalog. The existence of a missing element implies that either the gene function assignment is wrong or the biochemical knowledge of reaction pathways is incomplete [7].
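The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not KEGG's implementation: the function name `mark_pathway`, the representation of a reference pathway as an ordered list of EC numbers, and the EC numbers themselves are placeholders chosen for the example.

```python
def mark_pathway(reference_steps, genome_ecs):
    """Split the EC numbers of one reference pathway into matched and missing.

    reference_steps: ordered EC numbers of the reference pathway (assumed layout).
    genome_ecs: EC numbers preassigned to genes in the genome under study.
    """
    genome_ecs = set(genome_ecs)
    matched = [ec for ec in reference_steps if ec in genome_ecs]
    missing = [ec for ec in reference_steps if ec not in genome_ecs]
    return matched, missing


# Placeholder data for illustration only, not a KEGG pathway definition.
reference_steps = ["2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"]
genome_ecs = {"2.7.1.1", "5.3.1.9", "4.1.2.13", "1.1.1.1"}

matched, missing = mark_pathway(reference_steps, genome_ecs)
print("matched (would be colored):", matched)
print("missing elements:", missing)
```

A non-empty `missing` list corresponds to the situation in the text: either the EC number assignment is wrong or the reference knowledge is incomplete.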

Pathway reconstruction from binary relations<br />

The prediction above is a form of homology modeling, based on comparison against a well-defined reference. Perhaps the most challenging task in KEGG is to make predictions even when the reference is missing or incomplete. In the case of the metabolic pathways, if the reconstructed pathway contains a missing element and it is not due to an error in the EC number assignment, then the reference knowledge is incomplete: either an alternative reaction pathway exists or an alternative enzyme with wider specificity takes its place [7]. To investigate this possibility, KEGG provides a tool that computes, from a given list of enzymes, all possible pathways between two compounds while allowing changes in specificity. In our representation, because a list of enzymes is equivalent to a list of substrate-product binary relations, the procedure involves deduction from binary relations, which is like combining multiple links in LinkDB. Possible changes of substrate specificity can be incorporated by considering the hierarchy of EC numbers; namely, a group of enzymes is incorporated whenever any member of the group is identified in the genome, which effectively increases the number of substrate-product relations. Figure 4 shows the procedure of computing chemical reaction paths in terms of the deductive database.
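The deduction from binary relations can be illustrated with a small path search. The sketch below is hypothetical throughout: the reaction table, the compound names, the function names, and the choice of the first three EC digits as the "group" are assumptions made for the example, and the depth-first enumeration stands in for whatever deductive procedure KEGG actually uses.

```python
from collections import defaultdict

# Hypothetical toy table: EC number -> substrate-product binary relations.
REACTIONS = {
    "1.1.1.1": [("A", "B")],
    "1.1.1.2": [("B", "C")],
    "2.2.2.1": [("C", "D")],
    "2.2.2.5": [("B", "D")],
}


def ec_group(ec, level):
    """Return the first `level` fields of an EC number, e.g. '1.1.1'."""
    return ".".join(ec.split(".")[:level])


def expand_enzymes(identified, known, level=3):
    """Relax specificity: admit every known enzyme that shares an EC group
    (at the given hierarchy level) with an enzyme identified in the genome."""
    groups = {ec_group(ec, level) for ec in identified}
    return {ec for ec in known if ec_group(ec, level) in groups}


def build_relations(enzymes, reactions):
    """Turn a list of enzymes into a substrate -> [(product, EC)] map."""
    graph = defaultdict(list)
    for ec in sorted(enzymes):
        for substrate, product in reactions.get(ec, []):
            graph[substrate].append((product, ec))
    return graph


def all_paths(graph, start, goal):
    """Enumerate all paths from start to goal that never revisit a compound."""
    paths = []

    def walk(node, visited, steps):
        if node == goal:
            paths.append(list(steps))
            return
        for product, ec in graph.get(node, []):
            if product not in visited:
                walk(product, visited | {product}, steps + [(node, ec, product)])

    walk(start, {start}, [])
    return paths


identified = {"1.1.1.1", "2.2.2.1"}  # enzymes found in the genome (toy data)

strict_paths = all_paths(build_relations(identified, REACTIONS), "A", "D")
relaxed = expand_enzymes(identified, REACTIONS.keys(), level=3)
relaxed_paths = all_paths(build_relations(relaxed, REACTIONS), "A", "D")

print("paths with identified enzymes only:", strict_paths)
print("paths after relaxing specificity:", relaxed_paths)
```

In this toy case the identified enzymes alone leave a gap between B and C, so no path from A to D is found; admitting the other members of the same EC groups adds substrate-product relations and paths appear.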
