14.06.2013 Views

Databases and Systems


Data Organization in KEGG

Biochemical pathways

KEGG contains most of the known metabolic pathways, especially for intermediary metabolism, represented by about 100 graphical diagrams (pathway maps). In addition, we are adding various types of regulatory pathways, such as membrane transport, signal transduction, cell cycle, transcription, and translation, as well as information on molecular assemblies. Each pathway diagram is drawn and continuously updated manually. For metabolic pathways, the manually drawn diagrams are regarded as references of biochemical knowledge containing all chemically identified reaction pathways. The organism-specific pathways are then generated automatically by matching the enzyme genes in the gene catalog with the enzymes on the reference pathway diagrams according to EC number. The matched enzymes are colored green in the pathway diagrams. This matching process is possible because intermediary metabolism is relatively well conserved among different organisms. In contrast, the regulatory pathways are too divergent to be represented in a single reference diagram; they are drawn separately for each organism.
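The EC-number matching described above can be illustrated with a minimal sketch. It is not KEGG's actual implementation: the function name `match_enzymes`, the gene identifiers, and the data layout (a list of EC numbers for the reference diagram, a dictionary from gene to EC numbers for the gene catalog) are all assumptions made for illustration. Only the EC numbers themselves are real (hexokinase, glucose-6-phosphate isomerase, 6-phosphofructokinase).

```python
from typing import Dict, List, Set


def match_enzymes(
    reference_ecs: List[str],
    gene_catalog: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, for each EC number on the reference diagram, the organism's
    genes annotated with that EC number. EC numbers with no matching gene
    are left out (they would stay uncolored on the diagram)."""
    matched: Dict[str, List[str]] = {}
    for ec in reference_ecs:
        genes = sorted(g for g, ecs in gene_catalog.items() if ec in ecs)
        if genes:
            matched[ec] = genes
    return matched


# Hypothetical reference diagram: three enzymes from glycolysis.
reference_ecs = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]

# Hypothetical gene catalog for one organism (gene id -> assigned EC numbers).
gene_catalog = {
    "geneA": {"5.3.1.9"},
    "geneB": {"2.7.1.11"},
    "geneC": {"1.1.1.1"},  # not on this reference diagram
}

matched = match_enzymes(reference_ecs, gene_catalog)
# EC numbers present in `matched` correspond to the boxes colored green.
for ec, genes in matched.items():
    print(ec, genes)
```

In this toy example the organism has no gene assigned to 2.7.1.1, so that enzyme would remain uncolored in the organism-specific view.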

Gene catalogs and genome maps

Information on genes and genomes is taken from GenBank and organized as the gene catalog and the genome map. The gene catalog contains classifications of all known genes for each organism. Depending on how one views function, genes may be classified in different ways. KEGG provides its own classification scheme based on the pathway classification, as well as another scheme by the original authors, which is often a variant of Riley's classification [6]. As mentioned, the functional assignment of genes is re-examined by KEGG. The genome map is presented to help understand the positional information of genes, such as operon structure, and its relationship with the pathways and assemblies. Genome maps are manipulated graphically by Java applets.

To cope with the increasing number of complete genomes, we are trying to automate as much as possible both the EC number assignment, which is critical for generating organism-specific metabolic pathways, and the gene function assignment. Both assignments are based not only on sequence similarity but also on additional information, including positional information in the genome and orthologous relations with different species. Since operon structure is widespread in bacteria and archaea, the genome map browser has turned out to be an indispensable tool for gene function assignment.
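One way positional information might be used is to group neighboring genes into candidate operons. The sketch below is a common textbook heuristic, not the procedure used by KEGG, which the text does not specify. It treats consecutive genes on the same strand, separated by at most a small intergenic gap, as one candidate cluster. The `Gene` record, the function `candidate_operons`, the 50-base gap threshold, and the coordinates are all assumptions for illustration.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # start coordinate on the genome
    end: int     # end coordinate on the genome
    strand: str  # "+" or "-"


def candidate_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group consecutive same-strand genes whose intergenic gap is at most
    max_gap into clusters; return only clusters with two or more genes."""
    ordered = sorted(genes, key=lambda g: g.start)
    clusters: List[List[Gene]] = []
    for gene in ordered:
        if clusters:
            prev = clusters[-1][-1]
            same_strand = gene.strand == prev.strand
            if same_strand and gene.start - prev.end <= max_gap:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    return [[g.name for g in c] for c in clusters if len(c) > 1]


# Hypothetical genome fragment.
genes = [
    Gene("g1", 100, 400, "+"),
    Gene("g2", 420, 900, "+"),
    Gene("g3", 930, 1500, "+"),
    Gene("g4", 3000, 3600, "-"),
    Gene("g5", 3620, 4000, "+"),
]

operons = candidate_operons(genes)
print(operons)
```

Genes that fall in the same candidate cluster as a gene of known function become natural candidates for a related function, which a curator could then check against sequence similarity and orthology evidence.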

Ortholog group tables<br />

A sequence similarity search against the existing sequence databases often generates a long list of hits, which requires human effort to screen out orthologous relations that
