28.02.2013 Views

Bio-medical Ontologies Maintenance and Change Management

Bio-medical Ontologies Maintenance and Change Management

Bio-medical Ontologies Maintenance and Change Management

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Substructure Analysis of Metabolic Pathways 245<br />

of the input graph after compression is DL(G|S). SUBDUE’s discovery algorithm<br />

tries to minimize DL(S)+DL(G|S) which represents the description<br />

length of the graph G given the substructure S. The compression of the graph<br />

can be calculated as<br />

Compression = DL(S)+DL(G|S)<br />

DL(G)<br />

where description length DL() is calculated as the number of bits in a<br />

minimal encoding of the graph. Cook <strong>and</strong> Holder describe the detailed computation<br />

of DL(G) in[5].<br />

The discovery algorithm of SUBDUE is computationally expensive as other<br />

graph-related algorithms. SUBDUE uses two heuristic constraints to maintain<br />

polynomial running time: Beam <strong>and</strong> Limit. Beam constrains the number<br />

of substructures by limiting the length of the Q ′ in figure 2. Limit is a userdefined<br />

bound on the number of substructures considered by the algorithm.<br />

4.3 Unsupervised Learning<br />

Once the best structure is discovered, the graph can be compressed using<br />

the best substructure. The compression procedure replaces all instances of<br />

the best substructure in the input graph with a pointer, a single vertex, to<br />

the discovered best substructure. The discovery algorithm can be repeated<br />

on this compressed graph for multiple iterations until the graph cannot be<br />

compressed any more or on reaching the user-defined number of iterations.<br />

Each iteration generates a node in a hierarchical, conceptual clustering of the<br />

input data. On the ith iteration, the best substructure Si is used to compress<br />

the input graph, introducing a new vertex labeled Si to the next iteration.<br />

Consequently, any subsequently discovered subgraph Sj canbedefinedin<br />

terms of one or more Si, wherei

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!