Data Organization in KEGG<br />
Biochemical pathways<br />
KEGG contains most of the known metabolic pathways, especially for the<br />
intermediary metabolism, that are represented by about 100 graphical diagrams<br />
(pathway maps). In addition, we are adding various types of regulatory pathways<br />
such as membrane transport, signal transduction, cell cycle, transcription, and<br />
translation, as well as information on molecular assemblies. Each pathway<br />
diagram is drawn and continuously updated manually. For metabolic pathways, the<br />
manually drawn diagrams are considered as references of biochemical knowledge<br />
containing all chemically identified reaction pathways. The organism-specific<br />
pathways are then automatically generated by matching the enzyme genes in the gene<br />
catalog with the enzymes on the reference pathway diagrams according to the EC<br />
number. The matched enzymes are colored green in the pathway diagrams. This<br />
matching process is possible because the intermediary metabolism is relatively well<br />
conserved among different organisms. In contrast, the regulatory pathways are too<br />
divergent to be represented in a single reference diagram; they are drawn separately<br />
for each organism.<br />
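
The EC-number matching described above can be illustrated with a minimal sketch. It is not KEGG's actual implementation: the function name `match_enzymes`, the data layout, and the gene identifiers `geneA` to `geneD` are hypothetical, chosen only to show how enzymes on a reference diagram might be marked as present in one organism.

```python
from typing import Dict, List, Set


def match_enzymes(
    reference_ecs: Set[str],
    gene_catalog: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each EC number on the reference diagram to the organism's genes.

    reference_ecs: EC numbers drawn on the (hypothetical) reference pathway.
    gene_catalog:  gene identifier -> EC numbers assigned to that gene.
    Only EC numbers with at least one matching gene appear in the result;
    these are the enzymes that would be colored on the diagram.
    """
    matched: Dict[str, List[str]] = {}
    for gene_id, ec_numbers in gene_catalog.items():
        for ec in ec_numbers:
            if ec in reference_ecs:
                matched.setdefault(ec, []).append(gene_id)
    # Sort gene lists so the result does not depend on input order.
    return {ec: sorted(genes) for ec, genes in matched.items()}


# Hypothetical example: three enzymes on a reference diagram.
reference = {"2.7.1.1", "5.3.1.9", "2.7.1.11"}
catalog = {
    "geneA": ["2.7.1.1"],
    "geneB": ["5.3.1.9"],
    "geneC": ["1.1.1.1"],  # EC number not on this diagram, so ignored
    "geneD": ["2.7.1.1"],  # second gene with the same EC number
}

colored = match_enzymes(reference, catalog)
missing = reference - set(colored)  # enzymes left uncolored
```

Enzymes left in `missing` would stay uncolored on the organism-specific diagram, which is what makes gaps in a pathway visible.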
Gene catalogs and genome maps<br />
Information on genes and genomes is taken from GenBank and organized as the<br />
gene catalog and the genome map. The gene catalog contains classifications of all<br />
known genes for each organism. Depending on how one views the function, genes<br />
may be classified in different ways. KEGG provides its own classification scheme<br />
according to the pathway classification, as well as another scheme by the original<br />
authors, which is often a variant of Riley's classification [6]. As mentioned, the<br />
functional assignment of genes is re-examined by KEGG. The genome map is<br />
presented to help understand the positional information of genes, such as operon<br />
structure, and its relationship with the pathways and assemblies. Genome maps are<br />
manipulated graphically by Java applets.<br />
In order to cope with an increasing number of complete genomes, we are trying to<br />
automate, as far as possible, both the EC number assignment, which is critical for generating<br />
organism-specific metabolic pathways, and the gene function assignment. Both<br />
assignments are based not only on sequence similarity, but also on additional<br />
information, including positional information in the genome and orthologous<br />
relations with other species. Since the operon structure is widespread in bacteria<br />
and archaea, the genome map browser has turned out to be an indispensable tool for<br />
gene function assignment.<br />
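
As one illustration of how positional information might be used, the sketch below groups neighboring genes on the same strand into candidate operons. This is a simple heuristic assumed for illustration, not the procedure used by KEGG; the `Gene` record, the function `group_neighbors`, and the `max_gap` threshold of 50 bases are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    gene_id: str
    start: int   # leftmost coordinate on the genome
    end: int     # rightmost coordinate on the genome
    strand: str  # "+" or "-"


def group_neighbors(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group adjacent genes into candidate operons.

    Two consecutive genes are placed in the same group when they lie on
    the same strand and the gap between them is at most max_gap bases.
    The threshold is an arbitrary illustrative value.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    clusters: List[List[Gene]] = []
    for gene in ordered:
        if clusters:
            previous = clusters[-1][-1]
            same_strand = gene.strand == previous.strand
            close_enough = gene.start - previous.end <= max_gap
            if same_strand and close_enough:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    return clusters


# Hypothetical genes: g1 and g2 are close neighbors on the same strand.
example = [
    Gene("g1", 100, 400, "+"),
    Gene("g2", 420, 900, "+"),
    Gene("g3", 2000, 2500, "+"),
    Gene("g4", 2510, 3000, "-"),
]
groups = [[g.gene_id for g in cluster] for cluster in group_neighbors(example)]
```

A gene of unknown function that falls in the same group as well-characterized genes becomes a candidate for a related function, which a curator can then check against the sequence similarity results.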
Ortholog group tables<br />
Sequence similarity search against the existing sequence databases often generates a<br />
long list of hits, which requires human effort to screen out orthologous relations that