
Fig. 3: Two-dimensional mapping: (Left) Input pattern with 7 distinct clusters, (Middle) 8 centres are generated using Ncut, and (Right) 7 centres are generated using SONcut. Over-classification around the query (triangle) will result in erroneous classification of the relevant class.

nodes in the input pattern, and assoc(A, A) and assoc(B, B) measure the total intra-cluster similarity (association) in A and B; assoc(A, V) is the total connection from nodes in cluster A to all nodes in the graph, and assoc(B, V) is defined similarly.
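For reference, these association terms enter the normalized cut and normalized association criteria of Shi and Malik [2], which can be written as

\[
\mathrm{cut}(A,B)=\sum_{u\in A,\,v\in B} w(u,v), \qquad
\mathrm{assoc}(A,V)=\sum_{u\in A,\,t\in V} w(u,t),
\]
\[
\mathrm{Ncut}(A,B)=\frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}+\frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}, \qquad
\mathrm{Nassoc}(A,B)=\frac{\mathrm{assoc}(A,A)}{\mathrm{assoc}(A,V)}+\frac{\mathrm{assoc}(B,B)}{\mathrm{assoc}(B,V)}.
\]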

w is a nonnegative weight function measuring the degree of similarity between two samples of the input pattern and is defined as

\[
w(p,q) = e^{-d(p,q)^{2}/k} \tag{8}
\]

where d(p, q) is a pre-defined distance metric (i.e. the Euclidean distance) and k is a user-defined constant that controls the decreasing rate of the weight function; it is empirically set to 0.2. By using this function, the smallest eigenvector remains constant and Ncut can find reasonably correct partitions [2].
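As a concrete illustration, the following is a minimal sketch of how the affinity matrix W and the diagonal matrix D used later might be built from this weight function, assuming the exponential form of (8) given above with the empirical k = 0.2; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def affinity_matrices(X, k=0.2):
    """Build the affinity matrix W and the diagonal matrix D for a set of
    input vectors X (one sample per row), using an exponentially decaying
    weight of the pairwise Euclidean distance, as in (8)."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances d(p, q)^2
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)            # guard against negative round-off
    W = np.exp(-d2 / k)                 # weight function (8)
    D = np.diag(W.sum(axis=1))          # total connection of each node
    return W, D
```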

As Shi and Malik have also discussed, the optimal partitioning (the minimum possible Ncut) can be computed by solving the generalized eigenvalue system; the second smallest eigenvector of the generalized eigensystem is then used to partition the graph.
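A sketch of this step is given below, assuming SciPy's generalized symmetric eigensolver and a simple median-threshold split of the second smallest eigenvector; the split rule is a simplification of the splitting-point search discussed in [2].

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Bipartition a graph with affinity matrix W by solving the
    generalized eigensystem (D - W) x = lambda * D x and thresholding
    the eigenvector of the second smallest eigenvalue."""
    D = np.diag(W.sum(axis=1))
    # eigh solves the symmetric generalized problem and returns
    # eigenvalues in ascending order
    _, eigvecs = eigh(D - W, D)
    fiedler = eigvecs[:, 1]                 # second smallest eigenvalue
    return fiedler > np.median(fiedler)     # boolean membership of one partition
```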

In this paper we have used the Ncut algorithm [2] for the purpose of unsupervised data clustering. The Ncut partitioning method can be applied recursively to the input pattern to generate more than two clusters. Deciding on the maximum number of centres in the input pattern, at which the clustering process should stop, is a challenging problem. In this work, we have integrated the Ncut algorithm with the principles found in the DSOTM to automatically estimate the appropriate number of clusters in the input pattern and to set the maximum allowed Ncut accordingly. We call this Ncut algorithm with self-oriented centre detection the Self-Organizing Normalized cuts (SONcut).

The proposed SONcut algorithm is as follows:

Initialization: Choose a root node from the available set of input vectors, {x_k}, k = 1, ..., K, in a random manner. N is the maximum allowed Ncut (initially set to 1) and K the total number of inputs;

Similarity Measurement: Randomly select a new data point, x, and find the winning centroid, n*, by minimizing a predefined distance criterion in (1);

Maximum Allowed Ncut Estimation: If ||x(t) - n*|| > H(t), where H(t) is defined similarly to the hierarchy function used in the DSOTM algorithm, then increment the maximum allowed Ncut by 1;

Continuation: Continue with the Similarity Measurement step until no noticeable changes in the feature map are observed;

Graph Generation: Given the input pattern, set up a weighted graph, G = (V, E); compute the weight, w_m, on each edge, E_m, using (8), and create the affinity, W, and diagonal, D, matrices;

Eigensystem Transformation: Solve (D - W)x = λDx for the eigenvectors with the smallest eigenvalues;

Graph Bipartition: Use the eigenvector with the second smallest eigenvalue to bipartition the graph;

Partitioning Continuation: Consider the current partitions for further subdivision. Continue repartitioning until the Ncut value reaches the maximum allowed.
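Putting the steps together, the sketch below shows one way the maximum-allowed-Ncut estimation and the recursive partitioning might be wired up, reusing affinity_matrices and ncut_bipartition from the earlier sketches. The hierarchy function H(t), the learning rate, and the epoch count are placeholders rather than the values used in the paper.

```python
import numpy as np

def estimate_max_ncut(X, n_epochs=5, decay=0.95, lr=0.05, seed=0):
    """DSOTM-style estimation of the maximum allowed Ncut: whenever a
    randomly drawn sample lies farther than H(t) from its winning
    centroid, a new centre is spawned and the Ncut budget grows by one.
    H(t) is modelled here as an exponentially decaying threshold."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))].copy()]   # Initialization: random root node
    max_ncut = 1                                   # N, initially set to 1
    h = 2.0 * X.std()                              # assumed initial hierarchy level H(0)
    for _ in range(n_epochs * len(X)):
        x = X[rng.integers(len(X))]                # Similarity Measurement
        dists = [np.linalg.norm(x - c) for c in centroids]
        winner = int(np.argmin(dists))
        if dists[winner] > h:                      # Maximum Allowed Ncut Estimation
            centroids.append(x.copy())
            max_ncut += 1
        else:                                      # otherwise nudge the winning centroid
            centroids[winner] += lr * (x - centroids[winner])
        h *= decay                                 # H(t) shrinks over time (Continuation)
    return max_ncut

def soncut(X, k=0.2):
    """SONcut sketch: estimate the allowed number of centres, build the
    graph once (Graph Generation), then repeatedly bipartition the
    largest cluster (Eigensystem Transformation, Graph Bipartition,
    Partitioning Continuation) until that budget is reached."""
    max_centres = estimate_max_ncut(X)
    W, _ = affinity_matrices(X, k)
    clusters = [np.arange(len(X))]
    while len(clusters) < max_centres:
        clusters.sort(key=len)
        idx = clusters.pop()                       # split the largest cluster next
        if len(idx) < 2:
            clusters.append(idx)
            break
        mask = ncut_bipartition(W[np.ix_(idx, idx)])
        a, b = idx[mask], idx[~mask]
        if len(a) == 0 or len(b) == 0:             # degenerate split: stop subdividing
            clusters.append(idx)
            break
        clusters.extend([a, b])
    return clusters
```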

In summary, we have proposed an unsupervised hierarchical Ncut algorithm that is able to estimate the maximum number of allowed Ncuts by training the algorithm using the principles found in the DSOTM architecture. Thus, by dynamically adapting the Ncut algorithm to the nature of the input pattern, the problem of over-partitioning the relevant class can be prevented. Fig. 3 depicts the importance of such a predictive mechanism for the Ncut clustering algorithm and illustrates its effectiveness in avoiding over-classification around the query centre.


Previously in [7], we proposed an automatic CBIR engine that was structured around an unsupervised learning algorithm, the DSOTM. To reduce the gap between high-level concepts (semantics) and low-level statistical features, and to evolve the search process according to what the system believes to be the significant content within the query, that engine was integrated with a process of feature weight detection using genetic algorithms (GA), as illustrated in Fig. 4b. In this paper we use a relatively simpler CBIR architecture (see Fig. 4a and Fig. 5) solely to compare the abilities of the proposed hierarchical clustering algorithms for the purpose of data classification with three other techniques: SOTM, SOFM, and Ncut.
